Most popular crawler repositories and open-source projects

Python3-Spider

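Python crawlers in practice - simulated login for major websites, including but not limited to: slider CAPTCHA verification, Pinduoduo, Meituan...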

972   2491   2491  

weibo-crawler

Sina Weibo crawler: scrapes Sina Weibo data with Python and downloads Weibo images and videos

646   2479   2479  

lianjia-beike-spider

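Housing price crawler for Lianjia and Beike; collects housing price data for 21 major Chinese cities such as Beijing, Shanghai, Guangzhou, and Shenzhen...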

645   2459   2459  

work_crawler

Download comics and novels: a novel and comic download tool...

289   2451   2451  

owllook

owllook - a novel search engine

740   2426   2426  

grab

Web Scraping Framework

274   2403   2403  

crawler

An easy to use, powerful crawler implemented in PHP. Can execute Java...

342   2362   2362  

abot

Cross Platform C# web crawler framework built for speed and flexibilit...

561   2273   2273  

gain

Web crawling framework based on asyncio.

212   2022   2022  

gocrawl

Polite, slim and concurrent web crawler.

196   2015   2015  

DXY-COVID-19-Crawler

Real-time crawler and API for the 2019 novel coronavirus outbreak | COVID-19/2019-nCoV Realtime Infect...

403   2012   2012  

google-play-scraper

Node.js scraper to get data from Google Play

583   2009   2009  

rendora

Dynamic server-side rendering using headless Chrome

106   1994   1994  

vulnx

vulnx 🕷️ an intelligent Bot, Shell can achieve automatic injection, an...

343   1920   1920  

feapder

🚀🚀🚀feapder is an easy to use, powerful crawler framework | feapder...

350   1899   1899  

go_spider

[Crawler framework (golang)] An awesome Go concurrent Crawler(spider) framework...

470   1827   1827  

FinalRecon

The Last Web Recon Tool You'll Need

381   1815   1815  

Crawler-Detect

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via...

234   1785   1785  

PSpider

A simple, easy-to-use Python crawler framework; QQ discussion group: 597510560

516   1780   1780  

xalpha

Fund investment management and backtesting engine

478   1756   1756  

bilix

⚡️Lightning-fast async download tool for bilibili and more

174   1741   1741  

cariddi

Take a list of domains, crawl urls and scan for endpoints, secrets, ap...

175   1727   1727  

x-crawl

Flexible Node.js AI-assisted crawler library

108   1720   1720  

skycaiji

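skycaiji (Blue Sky Collector) is a free, open-source crawler system; data can be collected just by pointing and clicking to edit rules, and it can run...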

556   1694   1694  

ruia

Async Python 3.6+ web scraping micro-framework based on asyncio

180   1688   1688  

news-please

news-please - an integrated web crawler and information extractor for...

379   1655   1655  

spider

A web crawler and scraper for Rust

135   1611   1611  

CatVodTVSpider

930   1587   1587  

scrapoxy

Scrapoxy hides your scraper behind a cloud. It starts a pool of proxie...

222   1570   1570  

lightcrawler

Crawl a website and run it through Google lighthouse

165   1474   1474  

dirhunt

Find web directories without bruteforce

211   1462   1462  

goclone

Website Cloner - Utilizes powerful Go routines to clone websites to y...

301   1456   1456  

AutoCrawler

Google, Naver multiprocess image web crawler (Selenium)

405   1410   1410  

SwiftLinkPreview

It makes a preview from a URL, grabbing all the information such as t...

200   1378   1378  

fscrawler

Elasticsearch File System Crawler (FS Crawler)

299   1375   1375  

mlscraper

🤖 Scrape data from HTML websites automatically by just providing exam...

90   1349   1349  

diskover-community

Diskover Community Edition - Open source file indexer, file search eng...

152   1303   1303  

wombat

Lightweight Ruby web crawler/scraper with an elegant DSL which extract...

131   1293   1293  

OpenWPM

A web privacy measurement framework

316   1281   1281  

jd-autobuy

Python crawler for JD.com: automatic login and online flash purchasing of products

607   1270   1270  

fakebrowser

🤖 Fake fingerprints to bypass anti-bot systems. Simulate mouse and ke...

213   1224   1224  

Beanbun

Beanbun is a multi-process web crawler framework written in PHP, with good openness, high extensibility...

252   1211   1211  

tumblr-crawler

Easily download all the photos/videos from tumblr blogs. Download the specified Tu...

353   1144   1144  

AppCrawler

Appium-based automatic app traversal tool

458   1128   1128  

instagram-profilecrawl

📝 quickly crawl the information (e.g. followers, tags etc...) of an i...

239   1040   1040  

grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dy...

113   1036   1036  

sqliv

massive SQL injection vulnerability scanner

382   1029   1029  

crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

117   1019   1019  

mzitu

👧 Crawler for photo-set galleries of women (part 2)

346   1018   1018  

lxSpider

A collection of crawler examples, including but not limited to: Taobao, JD.com, Tmall, Douban, Douyin, Kuaishou, Weibo, ...

331   993   993