Topic

crawler

Repositories (1232)

bitextor
bitextor bitextor Python

Bitextor generates translation memories from multilingual websites

255
Tumblr_Crawler
Tumblr_Crawler sparrow629 Python

A multi-threaded crawler for Tumblr.

251
FileSensor
FileSensor Xyntax Python

Crawler-based tool for dynamically detecting sensitive files.

250
chromium_for_spider
chromium_for_spider myvyang HTML

dynamic crawler for web vulnerability scanner

250
Sub
Sub Leon406 Kotlin

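Crawls and filters nodes; supports parsing Clash and base64 subscriptions and automatically generates usable ss, ssr, v2ray, and trojan nodes. Integrated with GitHub Actions for scheduled updates daily between 8:00 and 24:00.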

249
ZhihuSpider
ZhihuSpider kong36088 Python

Multi-threaded Zhihu user crawler, based on Python 3.

248
D4N155
D4N155 OWASP Shell

OWASP D4N155 - Intelligent and dynamic wordlist using OSINT

247
woid
woid vitorfs Python

Simple news aggregator displaying top stories in real time

246
web-page-monitor
web-page-monitor lgh06 JavaScript

Website page change monitor with update notifications.

242
4chan-downloader
4chan-downloader Exceen Python

Python3 script to continuously download all images/webms of multiple 4chan threads simultaneously - without installation

237
wscan
wscan chushuai Go

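An open-source security assessment tool that supports scanning for common web security issues and custom POCs. It also offers machine-learning-based vulnerability detection and automated testing.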

236
RuiJi.Net
RuiJi.Net zhupingqi C#

crawler framework, distributed crawler extractor

234
goose-parser
goose-parser redco JavaScript

Universal scraping tool, which allows you to extract data using multiple environments

229
js-reverse
js-reverse freedom-wy HTML

JavaScript reverse-engineering research

227
EmailFinder
EmailFinder Josue87 Python

Search emails from a domain through search engines

226
weibo-topic-spider
weibo-topic-spider czy1999 Python

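Weibo super-topic crawler with word-frequency statistics, sentiment analysis, and simple classification; newly added data crawled from the pneumonia super topic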

224
FooProxy
FooProxy 01ly Python

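A robust, efficient, score-based, targeted IP proxy pool plus API service. You can plug in your own collectors to crawl proxy IPs, and it builds a separate database of valid proxy IPs for each of your crawler's target sites; supports Mongo...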

222
WebVideoBot
WebVideoBot tim232385 Java

Web crawler.

221
black-widow
black-widow offensive-hub Python

GUI based offensive penetration testing tool (Open Source)

221
91porn-crawler
91porn-crawler blue-troy Java

91porn crawler. Automatically crawls and downloads the popular 91porn videos you want.

220
google-group-crawler
google-group-crawler icy Shell

[Deprecated] Get (almost) original messages from google group archives. Your data is yours.

219
Sitemap-Generator-Crawler
Sitemap-Generator-Crawler vezaynk PHP

PHP script to recursively crawl websites and generate a sitemap. Zero dependencies.

219
N2H4
N2H4 forkonlp R

A tool for collecting Naver news

217
gf-secrets
gf-secrets dwisiswant0 Shell

Secret and/or credential patterns used for gf.

213
InfinityCrawler
InfinityCrawler TurnerSoftware C#

A simple but powerful web crawler library for .NET

213
goribot
goribot zhshch2002 Go

[Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework.

211
ptt-alertor
ptt-alertor Ptt-Alertor Go

📢 Ptt article notification bot. Notifies you of Ptt articles in real time.

210
indonesian-NLP-resources
indonesian-NLP-resources kirralabs

Data resources for Indonesian-language NLP

209
AllNewsSpider
AllNewsSpider Python3Spiders Python

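The Paper, Sina News, Tencent News, Sohu News, Xinwen Lianbo, The Times, The New York Times, BBC News: aims to crawl news from all news portal sites. Commercial use of the collected data is prohibited!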

209
scrapedin-linkedin-crawler
scrapedin-linkedin-crawler linkedtales JavaScript

Crawler for LinkedIn full profiles 2019

207
news-crawl
news-crawl commoncrawl Java

News crawling with StormCrawler - stores content as WARC

207
laosj
laosj songtianyi Go

golang light-weight image crawler

206
crawlab-lite
crawlab-lite crawlab-team Vue

Lite version of Crawlab, a crawler management platform.

206
facebook-data-extraction
facebook-data-extraction 18520339 Python

Experience for effectively fetching Facebook data by Querying Graph API with Account-based Token and Operating undetectable scraping Bots to extract C...

205
KoreaNewsCrawler
KoreaNewsCrawler lumyjuwon Python

A news crawler built to collect large volumes of news data.

202
dorkscout
dorkscout R4yGM Go

DorkScout - Golang tool to automate Google dork scans against the entire internet or specific targets

201
VideoServer
VideoServer GF-Allen JavaScript

A video resource backend built with Node.js on Express and a crawler

200
weibo_wordcloud
weibo_wordcloud gaussic Python

Fetches Weibo data by keyword, then generates a word cloud

197
galer
galer dwisiswant0 Go

A fast tool to fetch URLs from HTML attributes by crawl-in.

197
robots-txt
robots-txt spatie PHP

Determine if a page may be crawled from robots.txt, robots meta tags and robot headers

196
NoSmoke
NoSmoke macacajs JavaScript

A cross-platform UI crawler that scans view trees, then generates and executes UI test cases.

195
web-bee
web-bee codesofun Java

🐝 Web vertical crawler framework for fun

194
JavPy
JavPy TheodoreKrypton JavaScript

Enjoy driving on a Javascriptive (originally Pythonic) way to Japanese AV!

190
crawler-js-hook-framework-public
crawler-js-hook-framework-public JSREI JavaScript

A JavaScript reverse-engineering hook toolkit; some of the tools are open-sourced here

190
awesome-python-primer
awesome-python-primer zkqiang Python

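An index of high-quality Chinese-language resources for self-taught Python beginners, covering books, documentation, and videos, for crawling, web, data analysis, and machine learning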

190
crawler_shopee_public
crawler_shopee_public hsuanchi Python

Asynchronous Shopee crawler + competitor seller analysis

189
instagram-crawler
instagram-crawler mgleon08 Ruby

Crawl instagram photos, posts and videos for download.

188
digger
digger hetianyi Go

Digger is a powerful and flexible web crawler implemented in pure Golang

187
zhihu_fun
zhihu_fun AnyISalIn JavaScript

Selenium-based Zhihu keyword crawler

186
CSharpCrawler
CSharpCrawler zhaotianff C#

Sample C# crawler programs for anyone who wants to learn the basics of crawling. More crawler-related material will be added over time.

186