Most popular crawler repositories and open source projects

WebCrawler Misterhex C#

Just a simple web crawler that returns crawled links as an IObservable using Reactive Extensions and async/await.

60 31 60

WebSpider xdoer JavaScript

An online web crawler project based on Node.js, superagent, and cheerio, with support for generating APIs

60 18 60

findopendata findopendata Python

A search engine for Open Data

60 8 60

custom-crawler rollrat C#

🌌 High productivity semi-automatic crawler generator 🛠️🧰

60 4 60

scrapy-distributed Insutanto Python

A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy...

60 11 60

crawler-userscript zjh1943 JavaScript

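A crawler built on the Tampermonkey extension platform. Its main goal is to simulate a real user environment as closely as possible and avoid detection by anti-crawler systems.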

60 18 60

google_news_scraper_and_sentiment_analyzer pratikpv Python

Downloads news articles from Google news and uses pre-trained NLP models to perform sentiment analysis

60 15 60

AfdianToMarkdown PhiFever Go

Crawler for Afdian (afdian.com)

60 9 60

pomp estin Python

Screen scraping and web crawling framework

59 10 59

Daily-code rui7157 Python

Everyday code: crawlers, small GUI tools, and more

59 5 59

lyrics-crawler willamesoares Python

Get the lyrics for the song currently playing on Spotify

59 16 59

snapcrawl DannyBen Ruby

Crawl a website and take screenshots

59 13 59

unfx-proxy-parser openproxyspace JavaScript

Unfx Proxy Parser - Nextgen proxy parser with a deep link crawler. Follows internal and third-party links. Sorts results by country.

59 23 59

Web-Iota SatinWukerORIG Python

Iota is a web scraper that can find all of the images and links/suburls on a webpage

59 5 59

DDoM Endermanch Python

A simple, open-source, easy to use, and free download manager for malware samples.

59 5 59

spider-nodejs spider-rs Rust

Spider ported to Node.js

59 10 59

phpcrawl mmerian PHP

Copy of http://phpcrawl.cuab.de/ for using with composer

58 33 58

tool-gin bajins Go

A project built on the go-gin framework to cut down on repetitive tasks, such as downloading certain tools

58 20 58

proxycrawl-python crawlbase Python

ProxyCrawl Python library for scraping and crawling

58 19 58

wishlist Jaymon Python

Read an Amazon wishlist programmatically with Python

58 11 58

rolling-news Jacen789 Python

Fetch rolling news

58 14 58

gscholar-citations-crawler thu-pacman Python

Crawl all your citations from Google Scholar

58 11 58

crawler tomasnorre PHP

Libraries and scripts for crawling the TYPO3 page tree. Used for re-caching, re-indexing, publishing applications etc.

58 88 58

wechat_biz yusp998 Go

WeChat Official Account crawler that provides article retrieval through an API, including read counts, likes, and more

58 1 58

app-crawler timschneeb Python

Python script that searches GitHub, F-Droid and IzzySoft's F-Droid repo for apps with Shizuku support. Updated daily.

58 4 58

crw us Rust

Fast, lightweight Firecrawl alternative in Rust. Web scraper, crawler & search API with MCP server for AI agents. Drop-in Firecrawl-compatible API (/v...

58 5 58

talospider howie6879 Python

talospider - A simple, lightweight scraping micro-framework

57 4 57

All-IT-eBooks-Spider Kulbear Python

[Updated] A simple python crawler for my tutorial blog at http://www.jianshu.com/p/8fb5bc33c78e

57 34 57

SearchEngineScrapy naqushab Python

Scrape data from Google.com, Bing.com, Baidu.com, Ask.com, Yahoo.com, Yandex.com

57 16 57

billboard-json KoreanThinker TypeScript

🎧 Get the Billboard Hot 100 chart as JSON

57 8 57

python joaopauloaramuni HTML

Repo Python

57 2 57

facebook-page-info-scraper wael-sudo2 Python

Free Facebook pages MetaData Scraping Library - Unlimited Calls

57 9 57

Tapestry NatsuFox Python

Tapestry - A lightweight bookmark knowledge base built on Agent Skill Bundle

57 6 57

Visual_MediaCrawler persist-1 Python

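Visual crawler (supports Bilibili | Douyin | Xiaohongshu | Tieba | Weibo | Zhihu | Kuaishou): an integrated front-end and back-end project for asynchronous, efficient, and intuitive collection of media data from mainstream Chinese platforms (Based on "Medi...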

57 9 57

instagram-hashtag-crawler simonseo Python

Crawl Instagram hashtags

56 19 56

price-monitoring roccomuso JavaScript

Node.js price monitoring library, leveraging the power of x-ray and nightmare.

56 7 56

actor-facebook-scraper pocesar TypeScript

Scrape public Facebook pages, posts, reviews and comments

56 32 56

crawler a11ywatch Rust

gRPC web crawler turbo charged for performance

56 4 56

ClaudeChrome NatsuFox JavaScript

ClaudeChrome - Native browser context awareness for agents.

56 8 56

PicCrawler fengzhizi715 Java

An image crawler built with RxJava2 and Java 8 features

55 14 55

MyCrawler netcan Python

My collection of crawlers

55 3 55

devsearch nicholaskajoh Python

A web search engine built with Python which uses TF-IDF and PageRank to sort search results.

55 13 55

simple_bank_korea Beomi Python

simple crawler for Korean banks with Transactions

55 12 55

NewsCrap odaysec Python

NewsCrap is a Command Line Interface (CLI) based Google News scraping tool designed for research, investigation, and OSINT data collection....

55 13 55

kalel noobscode Python

Kal El Network Stress Test and Penetration Testing Toolkit

54 16 54

crawler_shopee charlie0227 Python

Shopee coin getter is a script to collect daily shopee coins.

54 13 54

chan-downloader mariot Rust

CLI to download all images/webms in a 4chan thread

54 7 54

rarbgcli FarisHijazi Python

RARBG command line interface for scraping the rarbg.to torrent search engine

54 8 54

kabegame kabegame Rust

Kabegame — An anime image crawler client with pluggable crawlers (from a GitHub plugin repo), wallpaper rotation by custom rules, and Wallpaper Engine...

54 1 54

rag-crawler sigoden TypeScript

Crawl a website to generate knowledge file for RAG

54 11 54
