Most popular crawler repositories and open source projects

crawler

An easy to use, powerful crawler implemented in PHP. Can execute Java...

342   2362   2362  

grab

Web Scraping Framework

278   2292   2292  

geziyor

Geziyor, blazing fast web crawling & scraping framework for Go. Suppor...

129   2130   2130  

abot

Cross Platform C# web crawler framework built for speed and flexibilit...

544   2107   2107  

TorBot

Dark Web OSINT Tool

435   2029   2029  

gain

Web crawling framework based on asyncio.

212   2022   2022  

gospider

Gospider - Fast web spider written in Go

273   2022   2022  

gocrawl

Polite, slim and concurrent web crawler.

196   2015   2015  

DXY-COVID-19-Crawler

Real-time crawler and API for the 2019 novel coronavirus outbreak | COVID-19/2019-nCoV Realtime Infect...

403   2012   2012  

google-play-scraper

Node.js scraper to get data from Google Play

583   2009   2009  

rendora

dynamic server-side rendering using headless Chrome to effortlessly so...

108   1965   1965  

feapder

🚀🚀🚀feapder is an easy to use, powerful crawler framework | feapder is a...

350   1899   1899  

FinalRecon

The Last Web Recon Tool You'll Need

381   1815   1815  

go_spider

[Crawler framework (golang)] An awesome Go concurrent Crawler(spider) framework...

483   1808   1808  

Crawler-Detect

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via...

234   1785   1785  

PSpider

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

516   1780   1780  

xalpha

Fund investment management and backtesting engine

478   1756   1756  

vulnx

vulnx 🕷️ an intelligent Bot, Shell can achieve automatic injection, an...

334   1705   1705  

skycaiji

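蓝天采集器 (skycaiji) is a free, open source crawler system; data can be collected simply by pointing and clicking to edit rules, and it can run...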

556   1694   1694  

ruia

Async Python 3.6+ web scraping micro-framework based on asyncio

180   1688   1688  

news-please

news-please - an integrated web crawler and information extractor for...

379   1655   1655  

CatVodTVSpider

930   1587   1587  

scrapoxy

Scrapoxy hides your scraper behind a cloud. It starts a pool of proxie...

222   1570   1570  

lightcrawler

Crawl a website and run it through Google lighthouse

165   1474   1474  

dirhunt

Find web directories without bruteforce

211   1462   1462  

AutoCrawler

Google, Naver multiprocess image web crawler (Selenium)

405   1410   1410  

SwiftLinkPreview

It makes a preview from an URL, grabbing all the information such as t...

191   1337   1337  

diskover-community

Diskover Community Edition - Open source file indexer, file search eng...

152   1303   1303  

wombat

Lightweight Ruby web crawler/scraper with an elegant DSL which extract...

131   1293   1293  

OpenWPM

A web privacy measurement framework

316   1281   1281  

jd-autobuy

Python crawler for JD.com (京东): automatic login and online flash purchasing of products

607   1270   1270  

Beanbun

Beanbun is a multi-process web crawler framework written in PHP, with good openness, high extensib...

252   1211   1211  

fscrawler

Elasticsearch File System Crawler (FS Crawler)

275   1210   1210  

trafilatura

Python & command-line tool to gather text on the Web: web crawling/scr...

130   1206   1206  

bilix

⚡️Lightning-fast async download tool for bilibili and more | Fast as lightning...

119   1205   1205  

tumblr-crawler

Easily download all the photos/videos from tumblr blogs. Download the specified Tu...

353   1144   1144  

AppCrawler

Appium-based tool for automatically traversing apps

458   1128   1128  

instagram-profilecrawl

📝 quickly crawl the information (e.g. followers, tags etc...) of an in...

239   1040   1040  

grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dy...

113   1036   1036  

sqliv

massive SQL injection vulnerability scanner

382   1029   1029  

mzitu

👧 Crawler for model photo sets (part 2)

346   1018   1018  

mlscraper

🤖 Scrape data from HTML websites automatically by just providing examp...

67   1001   1001  

lxSpider

A collection of crawler examples, including but not limited to Taobao, JD.com, Tmall, Douban, Douyin, Kuaishou, Weibo, ...

331   993   993  

article-extractor

To extract main article from given URL with Node.js

100   990   990  

kimuraframework

Kimurai is a modern web scraping framework written in Ruby which works...

143   971   971  

ast-hook-for-js-RE

Browser memory roaming solution (exploration in progress...)

309   967   967  

Pxer

A tool for pixiv.net. A pixiv crawler that anyone can use

111   959   959  

cariddi

Take a list of domains, crawl urls and scan for endpoints, secrets, ap...

111   949   949  

NewPipeExtractor

NewPipe's core library for extracting data from streaming sites

347   930   930  

BT-btt

Introduction to the magnet-link site U3C3, with domain name updates

84   929   929