Most popular crawler repositories and open-source projects

instagram-scraper

Scrapes media, likes, followers, tags and all metadata. Inspired by i...

398   2495   2495  

Python3-Spider

Python crawlers in practice - simulated login to major websites, including but not limited to: slider verification, Pinduoduo, Meituan...

972   2491   2491  

weibo-crawler

Sina Weibo crawler: uses Python to crawl Sina Weibo data and download Weibo images and videos

646   2479   2479  

lianjia-beike-spider

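Lianjia and Beike housing price crawler, collecting housing prices for 21 major Chinese cities including Beijing, Shanghai, Guangzhou and Shenzhen...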

645   2459   2459  

work_crawler

Download comics and novels. Novel and comic download tool...

289   2451   2451  

grab

Web Scraping Framework

275   2405   2405  

crawler

An easy to use, powerful crawler implemented in PHP. Can execute Java...

342   2362   2362  

abot

Cross Platform C# web crawler framework built for speed and flexibilit...

561   2287   2287  

gain

Web crawling framework based on asyncio.

212   2022   2022  

skycaiji

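SkyCaiji (Blue Sky Collector) is a free, open-source crawler system; data can be collected just by pointing and clicking to edit rules, and it can...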

596   2016   2016  

gocrawl

Polite, slim and concurrent web crawler.

196   2015   2015  

DXY-COVID-19-Crawler

Real-time crawler and API for the 2019 novel coronavirus outbreak | COVID-19/2019-nCoV Realtime Infect...

403   2012   2012  

google-play-scraper

Node.js scraper to get data from Google Play

583   2009   2009  

rendora

Dynamic server-side rendering using headless Chrome

106   1994   1994  

vulnx

vulnx 🕷️ an intelligent Bot, Shell can achieve automatic injection, an...

343   1920   1920  

feapder

🚀🚀🚀feapder is an easy to use, powerful crawler framework | feapder...

350   1899   1899  

spider

Web crawler and scraper for Rust

153   1898   1898  

go_spider

[Crawler framework (golang)] An awesome Go concurrent Crawler(spider) framework...

470   1827   1827  

FinalRecon

The Last Web Recon Tool You'll Need

381   1815   1815  

cariddi

Take a list of domains, crawl urls and scan for endpoints, secrets, ap...

184   1793   1793  

Crawler-Detect

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via...

234   1785   1785  

PSpider

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

516   1780   1780  

xalpha

Fund investment management and backtesting engine

478   1756   1756  

bilix

⚡️Lightning-fast async download tool for bilibili and more

174   1741   1741  

x-crawl

Flexible Node.js AI-assisted crawler library

108   1720   1720  

ruia

Async Python 3.6+ web scraping micro-framework based on asyncio

180   1688   1688  

SCrawler

🏳️‍🌈 Media downloader from any sites, including Twitter, Reddit, Inst...

113   1684   1684  

AutoCrawler

Google, Naver multiprocess image web crawler (Selenium)

425   1670   1670  

news-please

news-please - an integrated web crawler and information extractor for...

379   1655   1655  

CatVodTVSpider

930   1587   1587  

NewPipeExtractor

NewPipe's core library for extracting data from streaming sites

487   1573   1573  

scrapoxy

Scrapoxy hides your scraper behind a cloud. It starts a pool of proxie...

222   1570   1570  

lightcrawler

Crawl a website and run it through Google lighthouse

165   1474   1474  

dirhunt

Find web directories without bruteforce

211   1462   1462  

goclone

Website Cloner - Utilizes powerful Go routines to clone websites to y...

301   1456   1456  

SwiftLinkPreview

It makes a preview from a URL, grabbing all the information such as t...

200   1385   1385  

fscrawler

Elasticsearch File System Crawler (FS Crawler)

299   1375   1375  

mlscraper

🤖 Scrape data from HTML websites automatically by just providing exam...

91   1359   1359  

diskover-community

Diskover Community Edition - Open source file indexer, file search eng...

152   1303   1303  

wombat

Lightweight Ruby web crawler/scraper with an elegant DSL which extract...

131   1293   1293  

OpenWPM

A web privacy measurement framework

316   1281   1281  

jd-autobuy

Python crawler: automatic JD.com login and online flash-purchasing of products

607   1270   1270  

fakebrowser

🤖 Fake fingerprints to bypass anti-bot systems. Simulate mouse and ke...

213   1224   1224  

Beanbun

Beanbun is a multi-process web crawler framework written in PHP, with good openness, highly extensible...

252   1211   1211  

tumblr-crawler

Easily download all the photos/videos from tumblr blogs. Download the specified Tu...

353   1144   1144  

AppCrawler

Appium-based tool for automatic app traversal

458   1128   1128  

instagram-profilecrawl

📝 quickly crawl the information (e.g. followers, tags etc...) of an i...

239   1040   1040  

crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

120   1037   1037  

grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dy...

113   1036   1036  

sqliv

massive SQL injection vulnerability scanner

382   1029   1029