Most popular crawler repositories and open source projects

weibo-crawler

Sina Weibo crawler: scrapes Sina Weibo data with Python and downloads Weibo images and videos

646   2479   2479  

lianjia-beike-spider

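Lianjia and Beike housing price crawler covering 21 major Chinese cities such as Beijing, Shanghai, Guangzhou, and Shenzhen; collects housing price...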

645   2459   2459  

work_crawler

Download comics and novels: novel and comic download tool...

289   2451   2451  

owllook

owllook: a novel search engine

740   2426   2426  

crawler

An easy to use, powerful crawler implemented in PHP. Can execute Java...

342   2362   2362  

grab

Web Scraping Framework

278   2292   2292  

abot

Cross Platform C# web crawler framework built for speed and flexibilit...

544   2107   2107  

gain

Web crawling framework based on asyncio.

212   2022   2022  

gocrawl

Polite, slim and concurrent web crawler.

196   2015   2015  

DXY-COVID-19-Crawler

Real-time crawler and API for the 2019 novel coronavirus outbreak | COVID-19/2019-nCoV Realtime Infect...

403   2012   2012  

google-play-scraper

Node.js scraper to get data from Google Play

583   2009   2009  

rendora

Dynamic server-side rendering using headless Chrome

108   2000   2000  

vulnx

vulnx 🕷️ an intelligent Bot, Shell can achieve automatic injection, an...

343   1920   1920  

feapder

🚀🚀🚀feapder is an easy to use, powerful crawler framework | feapder...

350   1899   1899  

go_spider

[Crawler framework (golang)] An awesome Go concurrent crawler (spider) framework...

470   1827   1827  

FinalRecon

The Last Web Recon Tool You'll Need

381   1815   1815  

Crawler-Detect

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via...

234   1785   1785  

PSpider

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

516   1780   1780  

xalpha

Fund investment management and backtesting engine

478   1756   1756  

skycaiji

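skycaiji (Blue Sky Collector) is a free, open-source crawler system; data can be collected just by point-and-click rule editing, and it can...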

556   1694   1694  

ruia

Async Python 3.6+ web scraping micro-framework based on asyncio

180   1688   1688  

news-please

news-please - an integrated web crawler and information extractor for...

379   1655   1655  

cariddi

Take a list of domains, crawl urls and scan for endpoints, secrets, ap...

167   1621   1621  

CatVodTVSpider

930   1587   1587  

scrapoxy

Scrapoxy hides your scraper behind a cloud. It starts a pool of proxie...

222   1570   1570  

lightcrawler

Crawl a website and run it through Google lighthouse

165   1474   1474  

dirhunt

Find web directories without bruteforce

211   1462   1462  

goclone

Website Cloner - Utilizes powerful Go routines to clone websites to y...

301   1456   1456  

AutoCrawler

Google, Naver multiprocess image web crawler (Selenium)

405   1410   1410  

SwiftLinkPreview

It makes a preview from a URL, grabbing all the information such as t...

200   1378   1378  

fscrawler

Elasticsearch File System Crawler (FS Crawler)

299   1375   1375  

diskover-community

Diskover Community Edition - Open source file indexer, file search eng...

152   1303   1303  

wombat

Lightweight Ruby web crawler/scraper with an elegant DSL which extract...

131   1293   1293  

OpenWPM

A web privacy measurement framework

316   1281   1281  

jd-autobuy

Python crawler for JD.com: automatic login and online flash-sale purchasing

607   1270   1270  

fakebrowser

🤖 Fake fingerprints to bypass anti-bot systems. Simulate mouse and ke...

213   1224   1224  

Beanbun

Beanbun is a multi-process web crawler framework written in PHP, featuring good openness and high extensibility...

252   1211   1211  

trafilatura

Python & command-line tool to gather text on the Web: web crawling/scr...

130   1206   1206  

bilix

⚡️Lightning-fast async download tool for bilibili and more | Fast as lightning...

119   1205   1205  

tumblr-crawler

Easily download all the photos/videos from tumblr blogs. Download the specified Tu...

353   1144   1144  

AppCrawler

Automated app traversal tool based on Appium

458   1128   1128  

instagram-profilecrawl

📝 quickly crawl the information (e.g. followers, tags etc...) of an i...

239   1040   1040  

grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dy...

113   1036   1036  

sqliv

massive SQL injection vulnerability scanner

382   1029   1029  

mzitu

👧 Crawler for model photo-set galleries (part 2)

346   1018   1018  

crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

115   1009   1009  

mlscraper

🤖 Scrape data from HTML websites automatically by just providing exam...

67   1001   1001  

lxSpider

A collection of crawler examples, including but not limited to Taobao, JD.com, Tmall, Douban, Douyin, Kuaishou, Weibo, ...

331   993   993  

article-extractor

To extract main article from given URL with Node.js

100   990   990  

kimuraframework

Kimurai is a modern web scraping framework written in Ruby which works...

143   971   971