Topic

crawler

Repositories (1232)

scrapy
scrapy scrapy Python

Scrapy, a fast high-level web crawling & scraping framework for Python.

57.7k
EasySpider
EasySpider NaiboWang JavaScript

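A visual no-code/code-free web crawler/spider. EasySpider: a visual browser automation testing, data collection, and crawler tool that lets you design and run crawler tasks graphically without writing code. Alias: Service...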

41.8k
lux
lux iawia002 Go

👾 Fast and simple video download library and CLI tool written in Go

30.3k
colly
colly gocolly Go

Elegant Scraper and Crawler Framework for Golang

24.5k
proxy_pool
proxy_pool jhao104 Python

Python ProxyPool for web spider

22.8k
crawlee
crawlee apify TypeScript

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs,...

18.5k
pyspider
pyspider binux Python

A Powerful Spider(Web Crawler) System in Python.

15.9k
newspaper
newspaper codelucas HTML

newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:

14.7k
examples-of-web-crawlers
examples-of-web-crawlers shengqiangzhang HTML

Some very interesting, beginner-friendly Python crawler examples, mainly scraping sites such as Taobao, Tmall, WeChat, WeChat Read, Douban, and QQ...

14.3k
Douyin_TikTok_Download_API
Douyin_TikTok_Download_API Evil0ctal Python

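🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili, supporting API calls and online batch parsing and downloading.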

14.1k
katana
katana projectdiscovery Go

A next-generation crawling and spidering framework.

13.8k
crawlab
crawlab crawlab-team Go

Distributed web crawler admin platform for managing spiders regardless of language or framework.

11.8k
webmagic
webmagic code4craft Java

A scalable web crawler framework for Java.

11.5k
Photon
Photon s0md3v Python

Incredibly fast crawler designed for OSINT.

9.8k
avbook
avbook guyueyingmu PHP

AV movie management system; crawler for avmoo, javbus, and javlibrary; online AV video library; AV magnet link database; Japanese Adult Video Library, Adult Video Magnet Links - Jap...

9.7k
Python
Python injetlee Python

Python scripts: simulated Zhihu login, crawlers, Excel operations, WeChat official accounts, remote power-on.

8.5k
spider-flow
spider-flow ssssssss-team Java

A next-generation crawler platform that defines crawler workflows graphically, so crawlers can be built without writing code.

8.1k
awesome-web-scraping
awesome-web-scraping lorien Makefile

List of libraries, tools and APIs for web scraping and data processing.

7.1k
node-crawler
node-crawler bda-research TypeScript

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

6.8k
autoscraper
autoscraper alirezamika Python

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

6.7k
awesome-crawler
awesome-crawler BruceDone

A collection of awesome web crawlers and spiders in different languages

6.6k
pholcus
pholcus henrylee2cn Go

[Crawler for Golang] Pholcus is a distributed, high concurrency and powerful web crawler software.

6.2k
ferret
ferret MontFerret Go

Declarative web scraping

5.8k
scrapy-redis
scrapy-redis rmax Python

Redis-based components for Scrapy.

5.6k
headless-chrome-crawler
headless-chrome-crawler yujiosaka JavaScript

Distributed crawler powered by Headless Chrome

5.6k
WechatSogou
WechatSogou chyroc Python

A WeChat official account crawler interface based on Sogou WeChat search

5.5k
haipproxy
haipproxy SpiderClub Python

💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis

5.5k
browser-fingerprinting
browser-fingerprinting niespodd JavaScript

Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵...

4.2k
DotnetSpider
DotnetSpider dotnetcore C#

DotnetSpider, a .NET Standard web crawling library. It is a lightweight, efficient, and fast high-level web crawling & scraping framework

4.1k
trafilatura
trafilatura adbar Python

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

4.1k
myGPTReader
myGPTReader madawei2699 Python

A community-driven way to read and chat with AI bots - powered by chatGPT.

4.1k
ECommerceCrawlers
ECommerceCrawlers DropsDevopsOrg Python

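Hands-on 🐍 crawlers for many websites and e-commerce data 🕷. Includes 🕸: Taobao products, WeChat official accounts, Dianping, Qichacha, job sites, Xianyu, Ali tasks, Cnblogs, Weibo, Baidu Tieba, Douban Movies, ibaotu, Quanjing...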

4k
arachni
arachni Arachni Ruby

Web Application Security Scanner Framework

3.9k
dom-crawler
dom-crawler symfony PHP

Eases DOM navigation for HTML and XML documents

3.8k
scylla
scylla imWildCat Python

Intelligent proxy pool for Humans™

3.7k
toapi
toapi elliotgao2 Python

Every web site provides APIs.

3.5k
TorBot
TorBot DedSecInside Python

Dark Web OSINT Tool

3.5k
proxypool
proxypool zu1k Go

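Automatically scrapes ss, ssr, vmess, and trojan node information from Telegram channels, subscription URLs, and the public internet, then aggregates, deduplicates, and tests availability before providing a node list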

3.5k
ProxyBroker
ProxyBroker constverum Python

Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭

3.4k
NGCBot
NGCBot ngc660sec

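A WeChat bot based on the ✨HOOK mechanism, supporting 🌱 scheduled security news pushes [FreeBuf, Xianzhi, Anquanke, QiAnXin Attack and Defense Community], 👯 KFC copywriting, ⚡ vulnerability lookup, ⚡ phone number location lookup, ⚡ knowledge base...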

3.4k
Crawler_Illegal_Cases_In_China
Crawler_Illegal_Cases_In_China HiddenStrawberry HTML

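A collection of legal cases in China involving web crawlers. This project gathers news, materials, and laws and regulations related to lawsuits and violations involving crawler developers in mainland China. It aims to help those working in mainland China...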

3.1k
crawlergo
crawlergo Qianlitp Go

A powerful browser crawler for web vulnerability scanners

2.9k
DecryptLogin
DecryptLogin CharlesPikachu Python

DecryptLogin: APIs for logging in to some websites using requests.

2.9k
owllook
owllook howie6879 Python

owllook: a novel search engine

2.8k
cariddi
cariddi edoardottt Go

Take a list of domains, crawl urls and scan for endpoints, secrets, api keys, file extensions, tokens and more

2.7k
geziyor
geziyor geziyor Go

Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.

2.7k
GoogleScraper
GoogleScraper NikolaiT HTML

A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.

2.7k
gospider
gospider jaeles-project Go

Gospider - Fast web spider written in Go

2.7k
QueryList
QueryList jae-jae PHP

🕷 The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.

2.6k
google-play-scraper
google-play-scraper facundoolano JavaScript

Node.js scraper to get data from Google Play

2.6k