Topic

crawler

Repositories (1431)

auto-lighthouse
auto-lighthouse TGiles HTML

A utility package for automating lighthouse reporting

142
npm-search
npm-search algolia TypeScript

🗿 npm ↔️ Algolia replication tool ⛷️ 🐌 🛰️

142
SpiderBOX
SpiderBOX WuKongSecurity CSS

SpiderBox (虫盒) - a navigation site for web crawler and reverse-engineering resources

142
PHPCreeper
PHPCreeper blogdaren PHP

A new-generation multi-process, async, event-driven spider engine based on workerman. Supports headless browsers. 🌿 Built on workerman: multi-process asynchronous event...

141
Ceiba-Downloader
Ceiba-Downloader jameshwc Python

A course downloader that helps NTU students download course data from NTU Ceiba.

140
WeiboSpider
WeiboSpider CharesFang Python

Weibo crawler: a lightweight Weibo crawler built on the Scrapy framework (Sina Weibo Spider)

139
agent-line-bot
agent-line-bot Lin-jun-xiang Python

🤖Free Agent Line Bot with Web Search, Google Image Search, Image Generator, Video Generator...

139
poopak
poopak teal33t Python

POOPAK - TOR Hidden Service Crawler

139
pinscrape
pinscrape iamatulsingh Python

A simple library to scrape Pinterest images.

138
docs
docs zhangslob

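Source code for "Data Collection: From Getting Started to Giving Up" (《数据采集从入门到放弃》). Contents: introduction to crawlers, the job market, crawler engineer interview questions; introduction to the HTTP protocol; using Requests; introduction to the XPath parser; MongoDB and MySQL; multithreading...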

137
moe-copy-ai
moe-copy-ai yusixian TypeScript

✨ A cute AI assistant for extracting web page data ✨

136
pricetrack
pricetrack duyet JavaScript

Price tracker that monitors products and alerts you when prices drop. Supports tiki.vn, shopee, lotte.vn, ... Built with firebase https://pricetrack.we...

136
php-crawler
php-crawler hedii PHP

A PHP crawler that finds emails on the internet

136
blinkist-m4a-downloader
blinkist-m4a-downloader luckylittle Go

Grabs all of the audio files from all of the Blinkist books

136
picacomic_downloader
picacomic_downloader muyoou Python

Downloader for PicaComic (哔咔漫画) favorites

135
PatentCrawler
PatentCrawler will4906 Python

Scrapy patent crawler (no longer maintained)

135
WebReaper
WebReaper pavlovtech C#

Web scraper, crawler and parser in C#. Designed as a simple, declarative and scalable web scraping solution.

135
wget-lua
wget-lua ArchiveTeam C

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

135
leetcode-ranking-search
leetcode-ranking-search chiehmin Vue

Leetcode Contest Ranking Searcher

134
GoodBots
GoodBots AnTheMaker

Updated lists of IP addresses/whitelists of good bots and crawlers. Includes GoogleBot, BingBot, DuckDuckBot, etc.

133
convertible-bond-crawler
convertible-bond-crawler jackluson HTML

Convertible bond data and strategy analysis from Ningwen (宁稳网, formerly 富投网) and Jisilu (集思录)

133
proxy-pool
proxy-pool XiaomingX Python

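Python ProxyPool for web spiders. ProxyPool is a lightweight tool for collecting, validating, and managing proxy IPs. It aims to help users automatically maintain a high-quality proxy pool that crawlers and network requests can flexibly...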

133
sitemapper
sitemapper seantomburke TypeScript

Parse through any sitemap in Node.js

133
feaplat
feaplat Boris-code

Crawler management system with cluster support and elastic scaling. Runs feapder, scrapy, selenium, playwright, and other frameworks and scripts

132
Terpene-Profile-Parser-for-Cannabis-Strains
Terpene-Profile-Parser-for-Cannabis-Strains MaxValue Python

Parser and database to index the terpene profile of different strains of Cannabis from online databases

132
pdf-crawler
pdf-crawler SimFin Python

SimFin's open source PDF crawler

130
news-crawler
news-crawler LuChang-CS Python

A news crawler for BBC News, Reuters and New York Times.

130
Sina-Weibo-Album-Downloader
Sina-Weibo-Album-Downloader lincanbin Python

Multithreaded download of all HD photos / pictures from someone's Sina Weibo album.

129
scraply
scraply alash3al Go

Scraply: a simple DOM scraper to fetch information from any HTML-based website

129
memex-explorer
memex-explorer nasa-jpl-memex Python

Viewers for statistics and dashboarding of Domain Search Engine data

128
damai-tickets
damai-tickets Jxpro Python

Example ticket-grabbing script for Damai (大麦)

128
onegram
onegram pauloromeira Python

This repository is no longer maintained.

128
web-scout-mcp
web-scout-mcp pinkpixel-dev JavaScript

A powerful MCP server extension providing web search and content extraction capabilities. Integrates DuckDuckGo search functionality and URL content e...

128
node-crawler
node-crawler ethereum Go

Attempts to crawl the Ethereum network of valid Ethereum execution nodes and visualizes them in a nice web dashboard.

128
lumberjack
lumberjack JakePartusch JavaScript

An automated website accessibility scanner and cli

127
dyer
dyer hominee Rust

Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.

127
GoogleImagesDownloader
GoogleImagesDownloader WuLC Python

Enlarge a training dataset by searching Google for images with specified keywords and downloading the presented images

126
Skill-Share-Crawler---DL
Skill-Share-Crawler---DL tharyckgusmao JavaScript

Download Skill Share videos by ID or by class

126
instagram-profilecrawl
instagram-profilecrawl nacimgoura JavaScript

💻 Quickly crawl the information (e.g. followers, tags, etc.) of an Instagram profile. No login required!

125
prerender-java
prerender-java greengerong Java

java framework for prerender

125
TiebaArchiver
TiebaArchiver Sorceresssis Python

Saves Baidu Tieba posts locally, with support for images, video, audio, and other content. Companion reader for this project: TiebaReader (https://github.com/Sorceresssis/TiebaReader)

124
graphquery
graphquery storyicon Go

GraphQuery is a query language and execution engine tied to any backend service.

124
spiderbuf
spiderbuf hhuayuan Python

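Spiderbuf is a website focused on Python crawler practice. It offers a rich set of crawler tutorials, crawler case analyses, and crawler exercises. Intensive practice in Python crawler development: keep raising your skill level through the attack and defense of spear and shield,...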

124
sentinel-crawler
sentinel-crawler wx-chevalier JavaScript

Xenomorph Crawler, a concise, declarative and observable distributed crawler (Node / Go / Java / Rust) for Web, RDB, OS; can also act as a monitor (with...

124
Price-monitor
Price-monitor qqxx6661 Python

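Product price monitoring for "某东" (an unnamed e-commerce site): set custom target prices and get email/WeChat alerts on price drops. Tech: Python crawler / IP proxy pool / JS API crawling / Selenium page crawling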

123
andvaranaut
andvaranaut glouw C

A dungeon crawler

123
WeiBoCrawler
WeiBoCrawler zhouyi207 Rust

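Weibo data collection; content from other mainstream platforms such as Zhihu, Tieba, Xiaohongshu, Douyin, and Kuaishou will be added later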

122
aiotieba
aiotieba Starry-OvO Python

Baidu Tieba moderation manager ✨ post-deletion bot ✨ wraps a large number of core Tieba APIs using aiohttp

122
KamiYomu
KamiYomu KamiYomu C#

A self-hosted, extensible manga reader and download tool with plug-in support.

121
TiebaManager
TiebaManager xfgryujk C++

(Abandoned) Baidu Tieba moderation tool that automatically scans posts and handles rule-violating ones

121