Topic

crawler

Repositories (1431)

WebCrawler
WebCrawler Misterhex C#

A simple web crawler that returns crawled links as an IObservable, using Reactive Extensions and async/await.

60
WebSpider
WebSpider xdoer JavaScript

An online web crawler project based on Node.js, superagent, and cheerio, with support for generating APIs

60
findopendata
findopendata findopendata Python

A search engine for Open Data

60
custom-crawler
custom-crawler rollrat C#

🌌 High productivity semi-automatic crawler generator 🛠️🧰

60
scrapy-distributed
scrapy-distributed Insutanto Python

A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy...

60
crawler-userscript
crawler-userscript zjh1943 JavaScript

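A crawler built on the Tampermonkey extension platform. Its main goal is to simulate a real user environment as closely as possible and avoid detection by anti-crawler systems.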

60
google_news_scraper_and_sentiment_analyzer
google_news_scraper_and_sentiment_analyzer pratikpv Python

Downloads news articles from Google news and uses pre-trained NLP models to perform sentiment analysis

60
AfdianToMarkdown
AfdianToMarkdown PhiFever Go

Crawler for Afdian (afdian.com)

60
pomp
pomp estin Python

Screen scraping and web crawling framework

59
Daily-code
Daily-code rui7157 Python

Everyday code: crawlers, small GUI tools, and more

59
lyrics-crawler
lyrics-crawler willamesoares Python

Get the lyrics for the song currently playing on Spotify

59
snapcrawl
snapcrawl DannyBen Ruby

Crawl a website and take screenshots

59
unfx-proxy-parser
unfx-proxy-parser openproxyspace JavaScript

Unfx Proxy Parser - Next-generation proxy parser with a deep link crawler. Follows internal and third-party links. Sorts results by country.

59
Web-Iota
Web-Iota SatinWukerORIG Python

Iota is a web scraper that can find all of the images and links/suburls on a webpage

59
DDoM
DDoM Endermanch Python

A simple, open-source, easy to use, and free download manager for malware samples.

59
spider-nodejs
spider-nodejs spider-rs Rust

Spider ported to Node.js

59
phpcrawl
phpcrawl mmerian PHP

Copy of http://phpcrawl.cuab.de/ for using with composer

58
tool-gin
tool-gin bajins Go

A project built on the go-gin framework to reduce repetitive tasks, such as downloading tools

58
proxycrawl-python
proxycrawl-python crawlbase Python

ProxyCrawl Python library for scraping and crawling

58
wishlist
wishlist Jaymon Python

Read an Amazon wishlist programmatically with Python

58
rolling-news
rolling-news Jacen789 Python

Fetch rolling news

58
gscholar-citations-crawler
gscholar-citations-crawler thu-pacman Python

Crawl all your citations from Google Scholar

58
crawler
crawler tomasnorre PHP

Libraries and scripts for crawling the TYPO3 page tree. Used for re-caching, re-indexing, publishing applications etc.

58
wechat_biz
wechat_biz yusp998 Go

WeChat official account crawler that provides article retrieval through an API, including read counts, likes, and more

58
app-crawler
app-crawler timschneeb Python

Python script that searches GitHub, F-Droid and IzzySoft's F-Droid repo for apps with Shizuku support. Updated daily.

58
crw
crw us Rust

Fast, lightweight Firecrawl alternative in Rust. Web scraper, crawler & search API with MCP server for AI agents. Drop-in Firecrawl-compatible API (/v...

58
talospider
talospider howie6879 Python

talospider - A simple, lightweight scraping micro-framework

57
All-IT-eBooks-Spider
All-IT-eBooks-Spider Kulbear Python

[Updated] A simple python crawler for my tutorial blog at http://www.jianshu.com/p/8fb5bc33c78e

57
SearchEngineScrapy
SearchEngineScrapy naqushab Python

Scrape data from Google.com, Bing.com, Baidu.com, Ask.com, Yahoo.com, Yandex.com

57
billboard-json
billboard-json KoreanThinker TypeScript

🎧 Get the Billboard Hot 100 chart as JSON

57
python
python joaopauloaramuni HTML

Repo Python

57
facebook-page-info-scraper
facebook-page-info-scraper wael-sudo2 Python

Free Facebook pages MetaData Scraping Library - Unlimited Calls

57
Tapestry
Tapestry NatsuFox Python

Tapestry - A lightweight bookmark knowledge base built on Agent Skill Bundle

57
Visual_MediaCrawler
Visual_MediaCrawler persist-1 Python

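Visual crawler (supports Bilibili | Douyin | Xiaohongshu | Tieba | Weibo | Zhihu | Kuaishou): an integrated front-end and back-end project for collecting media data from major Chinese platforms asynchronously, efficiently, and intuitively (Based on "Medi...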

57
instagram-hashtag-crawler
instagram-hashtag-crawler simonseo Python

Crawl Instagram hashtags

56
price-monitoring
price-monitoring roccomuso JavaScript

Node.js price monitoring library, leveraging the power of x-ray and nightmare.

56
actor-facebook-scraper
actor-facebook-scraper pocesar TypeScript

Scrape public Facebook pages, posts, reviews and comments

56
crawler
crawler a11ywatch Rust

gRPC web crawler turbocharged for performance

56
ClaudeChrome
ClaudeChrome NatsuFox JavaScript

ClaudeChrome - Native browser context awareness for agents.

56
PicCrawler
PicCrawler fengzhizi715 Java

An image crawler developed with RxJava2 and Java 8 features

55
MyCrawler
MyCrawler netcan Python

My collection of crawlers

55
devsearch
devsearch nicholaskajoh Python

A web search engine built with Python which uses TF-IDF and PageRank to sort search results.

55
simple_bank_korea
simple_bank_korea Beomi Python

Simple crawler for Korean bank transactions

55
NewsCrap
NewsCrap odaysec Python

NewsCrap is a Command Line Interface (CLI) based Google News scraping tool designed for research, investigation, and OSINT data collection....

55
kalel
kalel noobscode Python

Kal El Network Stress Test and Penetration Testing Toolkit

54
crawler_shopee
crawler_shopee charlie0227 Python

Shopee coin getter is a script that collects daily Shopee coins.

54
chan-downloader
chan-downloader mariot Rust

CLI to download all images/webms in a 4chan thread

54
rarbgcli
rarbgcli FarisHijazi Python

RARBG command line interface for scraping the rarbg.to torrent search engine

54
kabegame
kabegame kabegame Rust

Kabegame — An anime image crawler client with pluggable crawlers (from a GitHub plugin repo), wallpaper rotation by custom rules, and Wallpaper Engine...

54
rag-crawler
rag-crawler sigoden TypeScript

Crawl a website to generate knowledge file for RAG

54