Most popular crawler repositories and open source projects

auto-lighthouse TGiles HTML

A utility package for automating Lighthouse reporting

142 19 142

npm-search algolia TypeScript

🗿 npm ↔️ Algolia replication tool ⛷ 🐌 🛰

142 23 142

SpiderBOX WuKongSecurity CSS

SpiderBox (虫盒): a navigation site for web crawler and reverse-engineering resources

142 26 142

PHPCreeper blogdaren PHP

A new generation of multi-process, async, event-driven spider engine based on workerman. Supports headless browsers. 🌿 Based on workerman, a multi-process asynchronous event...

141 15 141

Ceiba-Downloader jameshwc Python

A course downloader that helps NTU students download course data from NTU Ceiba.

140 9 140

WeiboSpider CharesFang Python

Sina Weibo Spider: a lightweight Weibo crawler built on the Scrapy framework

139 28 139

agent-line-bot Lin-jun-xiang Python

🤖Free Agent Line Bot with Web Search, Google Image Search, Image Generator, Video Generator...

139 147 139

poopak teal33t Python

POOPAK - TOR Hidden Service Crawler

139 34 139

pinscrape iamatulsingh Python

A simple library to scrape Pinterest images.

138 26 138

docs zhangslob

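Source code for the book "Data Collection: From Getting Started to Giving Up". Contents: introduction to crawlers, the job market, crawler engineer interview questions; introduction to the HTTP protocol; using Requests; introduction to the XPath parser; MongoDB and MySQL; multithreading...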

137 30 137

moe-copy-ai yusixian TypeScript

✨ A cute AI assistant for extracting web page data ✨

136 12 136

pricetrack duyet JavaScript

Price tracker that monitors products and alerts you when prices drop. Supports tiki.vn, shopee, lotte.vn, ... Built with Firebase https://pricetrack.we...

136 48 136

php-crawler hedii PHP

A PHP crawler that finds email addresses on the internet

136 62 136

blinkist-m4a-downloader luckylittle Go

Grabs all of the audio files from all of the Blinkist books

136 28 136

picacomic_downloader muyoou Python

Downloader for Pica Comic (哔咔漫画) favorites

135 14 135

PatentCrawler will4906 Python

Scrapy-based patent crawler (no longer maintained)

135 71 135

WebReaper pavlovtech C#

Web scraper, crawler and parser in C#. Designed as simple, declarative and scalable web scraping solution.

135 33 135

wget-lua ArchiveTeam C

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

135 18 135

leetcode-ranking-search chiehmin Vue

Leetcode Contest Ranking Searcher

134 21 134

GoodBots AnTheMaker

Updated lists of IP addresses/whitelists of good bots and crawlers. Includes GoogleBot, BingBot, DuckDuckBot, etc.

133 26 133

convertible-bond-crawler jackluson HTML

Convertible bond data and strategy analysis for Ningwen (宁稳网, formerly Futou 富投网) and Jisilu (集思录)

133 51 133

proxy-pool XiaomingX Python

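Python ProxyPool for web spiders. ProxyPool is a lightweight tool for collecting, validating, and managing proxy IPs, designed to help users automatically maintain a high-quality proxy pool for flexible use in crawlers and network requests...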

133 29 133

sitemapper seantomburke TypeScript

Parse through any sitemap in Node.js

133 81 133

feaplat Boris-code

A crawler management system with cluster support and elastic scaling. Runs feapder, scrapy, selenium, playwright, and other frameworks and scripts

132 30 132

Terpene-Profile-Parser-for-Cannabis-Strains MaxValue Python

Parser and database to index the terpene profile of different strains of Cannabis from online databases

132 18 132

pdf-crawler SimFin Python

SimFin's open source PDF crawler

130 46 130

news-crawler LuChang-CS Python

A news crawler for BBC News, Reuters and New York Times.

130 37 130

Sina-Weibo-Album-Downloader lincanbin Python

Multithreaded download of all HD photos from a user's Sina Weibo album.

129 43 129

scraply alash3al Go

Scraply: a simple DOM scraper that fetches information from any HTML-based website

129 13 129

memex-explorer nasa-jpl-memex Python

Viewers for statistics and dashboarding of Domain Search Engine data

128 62 128

damai-tickets Jxpro Python

Example script for grabbing tickets on Damai (大麦)

128 12 128

onegram pauloromeira Python

This repository is no longer maintained.

128 5 128

web-scout-mcp pinkpixel-dev JavaScript

A powerful MCP server extension providing web search and content extraction capabilities. Integrates DuckDuckGo search functionality and URL content e...

128 13 128

node-crawler ethereum Go

Attempts to crawl the Ethereum network of valid Ethereum execution nodes and visualizes them in a nice web dashboard.

128 64 128

lumberjack JakePartusch JavaScript

An automated website accessibility scanner and CLI

127 7 127

dyer hominee Rust

Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.

127 7 127

GoogleImagesDownloader WuLC Python

Enlarge a training dataset by searching Google for images matching specified keywords and downloading the results

126 65 126

Skill-Share-Crawler---DL tharyckgusmao JavaScript

Download Skillshare videos by ID or by class

126 33 126

instagram-profilecrawl nacimgoura JavaScript

💻 Quickly crawl the information (e.g. followers, tags) of an Instagram profile. No login required!

125 30 125

prerender-java greengerong Java

Java framework for Prerender

125 48 125

TiebaArchiver Sorceresssis Python

Saves Baidu Tieba posts locally, with support for images, video, voice, and other content. Companion reader for this project: TiebaReader (https://github.com/Sorceresssis/TiebaReader)

124 11 124

graphquery storyicon Go

GraphQuery is a query language and execution engine tied to any backend service.

124 17 124

spiderbuf hhuayuan Python

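Spiderbuf is a website focused on Python crawler practice. It offers a wide range of crawler tutorials, case analyses, and exercises. Intensive practice in Python crawler development: steadily improve your skills in the contest between spear and shield, ...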

124 12 124

sentinel-crawler wx-chevalier JavaScript

Xenomorph Crawler, a concise, declarative, and observable distributed crawler (Node / Go / Java / Rust) for Web, RDB, OS; can also act as a monitor (with...

124 26 124