Most popular crawler repositories and open source projects

firecrawl firecrawl TypeScript

The API to search, scrape, and interact with the web at scale. 🔥

152.9k 8.7k 403

Scrapling D4Vinci Python

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

70.6k 7k 259

scrapy scrapy Python

Scrapy, a fast high-level web crawling & scraping framework for Python.

63.3k 11.8k 1.8k

EasySpider NaiboWang JavaScript

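A visual no-code/code-free web crawler/spider. EasySpider: a visual browser automation testing, data collection, and web crawling application for designing and running crawler tasks graphically, without code. Alias: Ser...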

44.3k 5.4k 234

lux iawia002 Go

👾 Fast and simple video download library and CLI tool written in Go

31.5k 3.3k 379

Scrapegraph-ai ScrapeGraphAI Python

Python scraper based on AI

28.5k 2.8k 160

colly gocolly Go

Elegant Scraper and Crawler Framework for Golang

25.4k 1.9k 315

crawlee apify TypeScript

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs,...

24.9k 1.6k 131

proxy_pool jhao104 Python

Python ProxyPool for web spider

23.5k 5.4k 436

Douyin_TikTok_Download_API Evil0ctal Python

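🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili, supporting API calls and online batch parsing and downloading.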

18.9k 2.7k 105

pyspider binux Python

A powerful spider (web crawler) system in Python.

16.8k 3.6k 1

katana projectdiscovery Go

A next-generation crawling and spidering framework.

16.6k 1.1k 16.6k

maxun getmaxun TypeScript

🔥 The open-source no-code platform for web scraping, crawling, search and AI data extraction • Turn websites into structured APIs in minutes 🔥

16.6k 1.4k 83

newspaper codelucas Python

newspaper3k is a news, full-text, and article metadata extraction library for Python 3.

15.1k 2.1k 373

examples-of-web-crawlers shengqiangzhang HTML

Some interesting, beginner-friendly Python crawler examples, mainly scraping sites such as Taobao, Tmall, WeChat, WeChat Read, Douban, and QQ.

14.7k 3.8k 340

Photon s0md3v Python

Incredibly fast crawler designed for OSINT.

13.1k 1.7k 333

crawlab crawlab-team Go

Distributed web crawler admin platform for managing spiders, regardless of language or framework.

12.2k 1.9k 12.2k

webmagic code4craft Java

A scalable web crawler framework for Java.

11.7k 4.1k 11.7k

spider-flow ssssssss-team Java

A next-generation crawler platform that defines crawling workflows graphically, so crawlers can be built without writing code.

11.3k 2.2k 11.3k

Python injetlee Python

Python scripts: simulated Zhihu login, crawlers, Excel operations, WeChat official accounts, remote power-on.

10.7k 4.3k 724

avbook guyueyingmu PHP

AV movie management system; crawlers for avmoo, javbus, and javlibrary; online AV video library; AV magnet-link database. Japanese Adult Video Library, Adult Video Magnet Links - Jap...

10k 2k 10k

crawlee-python apify Python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, P...

9.4k 778 46

wiseflow TeamWiseFlow TypeScript

A team of "workhorses in the cloud" that makes money for you online 24/7

8.2k 1.4k 8.2k

awesome-web-scraping lorien Makefile

List of libraries, tools and APIs for web scraping and data processing.

8k 918 230

autoscraper alirezamika Python

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

7.7k 788 118

awesome-crawler BruceDone

A collection of awesome web crawlers and spiders in different languages

7.2k 749 7.2k

pydoll autoscrape-labs Python

Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions.

7k 390 34

node-crawler bda-research TypeScript

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

6.8k 866 246

JMComic-Crawler-Python hect0x7 Python

Python API for JMComic | Provides a Python API for accessing JMComic, supporting both the web and mobile clients | JMComic GitHub Actions downloader 🚀

6.7k 11.4k 14

WechatSogou chyroc Python

A crawler interface for WeChat official accounts, based on Sogou WeChat search

6.3k 1.7k 275

pholcus henrylee2cn Go

[Crawler for Golang] Pholcus is a distributed, high-concurrency, and powerful web crawler.

6.2k 1.5k 6.2k

ferret MontFerret Go

Declarative data automation language and Go runtime for structured extraction workflows.

6k 326 92

Bili23-Downloader ScottSloan Python

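An open-source, free, cross-platform Bilibili video download tool supporting multi-threaded acceleration, audio/video separation, danmaku (bullet comment) and metadata retrieval, custom naming and categorization, and more.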

5.9k 412 32

trafilatura adbar Python

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

5.8k 359 5.8k

scrapy-redis rmax Python

Redis-based components for Scrapy.

5.6k 1.6k 263

headless-chrome-crawler yujiosaka JavaScript

Distributed crawler powered by Headless Chrome

5.6k 404 112

ECommerceCrawlers DropsDevopsOrg Python

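Hands-on 🐍 crawlers for a variety of websites and e-commerce data 🕷. Includes 🕸: Taobao products, WeChat official accounts, Dianping, Qichacha, job sites, Xianyu, Ali tasks, Cnblogs, Weibo, Baidu Tieba, Douban Movies, ibaotu, Quanjing...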

5.6k 1.4k 149

haipproxy SpiderClub Python

💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis

5.5k 899 203

browser-fingerprinting niespodd JavaScript

Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵...

5k 273 5k