Topic

crawler

Repositories (1427)

firecrawl
firecrawl firecrawl TypeScript

🔥 The Web Data API for AI - Power AI agents with clean web data

108.8k
scrapy
scrapy scrapy Python

Scrapy, a fast high-level web crawling & scraping framework for Python.

61.3k
EasySpider
EasySpider NaiboWang JavaScript

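A visual no-code/code-free web crawler/spider. EasySpider: a visual browser automation testing, data collection, and crawler tool that lets you design and run crawling tasks graphically without writing code. Alias: Service...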

43.8k
Scrapling
Scrapling D4Vinci Python

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

36.2k
lux
lux iawia002 Go

👾 Fast and simple video download library and CLI tool written in Go

31k
colly
colly gocolly Go

Elegant Scraper and Crawler Framework for Golang

25.2k
Scrapegraph-ai
Scrapegraph-ai ScrapeGraphAI Python

Python scraper based on AI

23.3k
proxy_pool
proxy_pool jhao104 Python

Python ProxyPool for web spider

23.3k
crawlee
crawlee apify TypeScript

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs,...

22.7k
Douyin_TikTok_Download_API
Douyin_TikTok_Download_API Evil0ctal Python

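🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili, supporting API calls and online batch parsing and downloading.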

17.1k
pyspider
pyspider binux Python

A powerful spider (web crawler) system in Python.

16.9k
katana
katana projectdiscovery Go

A next-generation crawling and spidering framework.

16.5k
maxun
maxun getmaxun TypeScript

🔥 The open-source no-code platform for web scraping, crawling, search and AI data extraction • Turn websites into structured APIs in minutes 🔥

15.3k
newspaper
newspaper codelucas Python

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:

15k
examples-of-web-crawlers
examples-of-web-crawlers shengqiangzhang HTML

Some very interesting Python crawler examples, fairly beginner-friendly, mainly crawling Taobao, Tmall, WeChat, WeChat Read, Douban, QQ, and other sites.

14.6k
Photon
Photon s0md3v Python

Incredibly fast crawler designed for OSINT.

12.8k
crawlab
crawlab crawlab-team Go

Distributed web crawler admin platform for managing spiders regardless of language or framework.

12.2k
webmagic
webmagic code4craft Java

A scalable web crawler framework for Java.

11.7k
spider-flow
spider-flow ssssssss-team Java

A next-generation crawler platform: define crawler workflows graphically and build crawlers without writing code.

11.3k
Python
Python injetlee Python

Python scripts: simulated Zhihu login, crawlers, Excel operations, WeChat official accounts, remote power-on.

10.6k
avbook
avbook guyueyingmu PHP

AV movie management system; crawlers for avmoo, javbus, and javlibrary; online AV video library; AV magnet link database; Japanese Adult Video Library, Adult Video Magnet Links - Jap...

9.9k
crawlee-python
crawlee-python apify Python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, P...

8.8k
wiseflow
wiseflow TeamWiseFlow TypeScript

A team of "cloud workhorses" that earns money for you online 24/7

8.2k
awesome-web-scraping
awesome-web-scraping lorien Makefile

List of libraries, tools and APIs for web scraping and data processing.

7.8k
awesome-crawler
awesome-crawler BruceDone

A collection of awesome web crawlers and spiders in different languages

7.2k
autoscraper
autoscraper alirezamika Python

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

7.1k
node-crawler
node-crawler bda-research TypeScript

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

6.8k
pydoll
pydoll autoscrape-labs Python

Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions.

6.7k
WechatSogou
WechatSogou chyroc Python

A WeChat official account crawler interface based on Sogou WeChat search

6.2k
pholcus
pholcus henrylee2cn Go

[Crawler for Golang] Pholcus is a distributed, high-concurrency, and powerful web crawler.

6.2k
ferret
ferret MontFerret Go

Declarative web scraping

6k
trafilatura
trafilatura adbar Python

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

5.7k
headless-chrome-crawler
headless-chrome-crawler yujiosaka JavaScript

Distributed crawler powered by Headless Chrome

5.7k
scrapy-redis
scrapy-redis rmax Python

Redis-based components for Scrapy.

5.6k
JMComic-Crawler-Python
JMComic-Crawler-Python hect0x7 Python

Python API for JMComic | Provides a Python API for accessing JMComic (禁漫天堂), supporting both the web and mobile clients | JMComic GitHub Actions downloader 🚀

5.6k
haipproxy
haipproxy SpiderClub Python

💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis

5.6k
ECommerceCrawlers
ECommerceCrawlers DropsDevopsOrg Python

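Hands-on 🐍 crawlers for a variety of websites and e-commerce data 🕷. Includes 🕸: Taobao products, WeChat official accounts, Dianping, Qichacha, job sites, Xianyu, Ali tasks, Cnblogs, Weibo, Baidu Tieba, Douban Movies, Ibaotu, Quanjing...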

5.5k
browser-fingerprinting
browser-fingerprinting niespodd JavaScript

Analysis of bot protection systems with available countermeasures 🚿. How to defeat anti-bot systems 👻 and get around browser fingerprinting scripts 🕵...

5k
Crawler_Illegal_Cases_In_China
Crawler_Illegal_Cases_In_China hiddendevj HTML

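A collection of legal cases in China involving web crawlers. This project compiles news, materials, and laws and regulations related to lawsuits and violations involving crawler developers in mainland China. It aims to help those working in mainland China...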

4.6k
weibo-crawler
weibo-crawler dataabc Python

Sina Weibo crawler: uses Python to crawl Sina Weibo data and download Weibo images and videos

4.4k
myGPTReader
myGPTReader myreader-io Python

A community-driven way to read and chat with AI bots - powered by ChatGPT.

4.4k
DotnetSpider
DotnetSpider dotnetcore C#

DotnetSpider, a .NET Standard web crawling library. It is a lightweight, efficient, and fast high-level web crawling & scraping framework

4.1k
ProxyBroker
ProxyBroker constverum Python

Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭

4.1k
dom-crawler
dom-crawler symfony PHP

Eases DOM navigation for HTML and XML documents

4k
scylla
scylla MikeChongCan Python

Intelligent proxy pool for Humans™ to extract content from the internet and build your own Large Language Models in this new AI era

4k
arachni
arachni Arachni Ruby

Web Application Security Scanner Framework

4k
proxypool
proxypool zu1k Go

Automatically crawls proxy nodes on the public internet, de-duplicates them, tests their usability, and then provides a list of nodes

4k
work_crawler
work_crawler kanasimi JavaScript

Comic and novel download tool: 腾讯漫画 大角虫漫画 有妖气 咪咕 SF漫画 哦漫画 看漫画 漫画柜 汗汗酷漫 動漫...

4k
TorBot
TorBot DedSecInside Python

Dark Web OSINT Tool

4k
puppeteer-sharp
puppeteer-sharp hardkoded C#

Headless Chrome .NET API

3.9k