Most popular crawler repositories and open source projects

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python...

10050   47723   47723  

lux

👾 Fast and simple video download library and CLI tool written in Go

2521   21391   21391  

colly

Elegant Scraper and Crawler Framework for Golang

1617   19881   19881  

proxy_pool

Proxy IP pool for Python crawlers (proxy pool)

4702   18255   18255  

pyspider

A Powerful Spider (Web Crawler) System in Python.

3681   15945   15945  

EasySpider

A visual no-code/code-free web crawler/spider. EasySpider: a visual web crawler...

1689   14540   14540  

newspaper

News, full-text, and article metadata extraction in Python 3. Advanced...

2044   12913   12913  

examples-of-web-crawlers

Some very interesting Python crawler examples, fairly beginner-friendly, mainly crawling Taobao, Tmall, WeChat, ...

3645   12168   12168  

webmagic

A scalable web crawler framework for Java.

4138   10876   10876  

crawlab

Distributed web crawler admin platform for spiders management regardle...

1637   9955   9955  

Photon

Incredibly fast crawler designed for OSINT.

1409   9785   9785  

avbook

AV movie management system; crawlers for avmoo, javbus, javlibrary; online AV film library...

2024   8965   8965  

crawlee

Crawlee—A web scraping and browser automation library for Node.js that...

374   8610   8610  

Python

Python scripts: simulated Zhihu login, crawlers, Excel operations, WeChat official accounts, remote power-on

4176   8463   8463  

spider-flow

A new-generation crawler platform: define crawler workflows graphically and build crawlers without writing code.

1565   8149   8149  

katana

A next-generation crawling and spidering framework.

346   6770   6770  

node-crawler

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

891   6462   6462  

pholcus

[Crawler for Golang] Pholcus is a distributed, high concurrency and po...

1502   6197   6197  

awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing...

761   5870   5870  

awesome-crawler

A collection of awesome web crawlers and spiders in different languages

656   5626   5626  

WechatSogou

WeChat official account crawler interface based on Sogou WeChat search

1630   5541   5541  

ferret

Declarative web scraping

304   5408   5408  

headless-chrome-crawler

Distributed crawler powered by Headless Chrome

433   5384   5384  

scrapy-redis

Redis-based components for Scrapy.

1581   5307   5307  

haipproxy

💖 Highly available distributed IP proxy pool, powered by...

945   5286   5286  

autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

574   5278   5278  

myGPTReader

A community-driven way to read and chat with AI bots - powered by chat...

421   4095   4095  

ECommerceCrawlers

Hands-on 🐍 crawlers 🕷 for various websites and e-commerce data. Includes 🕸: Taobao products, WeChat official accounts, Dianping, ...

1219   3969   3969  

dom-crawler

Eases DOM navigation for HTML and XML documents

123   3804   3804  

scylla

Intelligent proxy pool for Humans™

465   3746   3746  

DotnetSpider

DotnetSpider, a .NET standard web crawling library. It is lightweight,...

1002   3673   3673  

browser-fingerprinting

Analysis of Bot Protection systems with available countermeasures 🚿. H...

201   3544   3544  

proxypool

Automatically scrapes ss, ssr, vmess, and trojan node information from Telegram channels, subscription URLs, and the public internet...

2558   3454   3454  

ProxyBroker

Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭

951   3408   3408  

arachni

Web Application Security Scanner Framework

732   3405   3405  

toapi

Every web site provides APIs.

247   3393   3393  

Douyin_TikTok_Download_API

🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous Douyin|TikTok data...

612   3161   3161  

Crawler_Illegal_Cases_In_China

Collection of illegal web crawler cases in China. This project is used to compile all...

250   3101   3101  

DecryptLogin

DecryptLogin: APIs for logging in to some websites using requests.

737   2669   2669  

QueryList

🕷 The progressive PHP crawler framework! An elegant, progressive PHP scraping...

435   2573   2573  

RED_HAWK

All in one tool for Information Gathering, Vulnerability Scanning and...

823   2532   2532  

GoogleScraper

A Python module to scrape several search engines (like Google, Yandex,...

761   2504   2504  

crawlergo

A powerful browser crawler for web vulnerability scanners

446   2499   2499  

instagram-scraper

Scrapes media, likes, followers, tags and all metadata. Inspired by i...

398   2495   2495  

Python3-Spider

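Python crawlers in practice: simulated login to major websites, including but not limited to: slider CAPTCHA verification, Pinduoduo, Meituan...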

972   2491   2491  

weibo-crawler

Sina Weibo crawler: uses Python to crawl Sina Weibo data and download Weibo images and videos

646   2479   2479  

lianjia-beike-spider

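Housing price crawler for Lianjia and Beike; collects housing price data for 21 major Chinese cities including Beijing, Shanghai, Guangzhou, and Shenzhen...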

645   2459   2459  

gecco

Easy-to-use lightweight web crawler

898   2452   2452  

work_crawler

Download comics and novels: a novel and comic download tool...

289   2451   2451  

owllook

owllook: a novel search engine

740   2426   2426