Most popular crawler repositories and open source projects

scrapedin linkedtales JavaScript

LinkedIn Scraper (currently working 2020)

611 171 611

fredy orangecoding HTML

❤️ Fredy - [F]ind [R]eal [E]state [D]amn Eas[y] - Fredy keeps searching for new apartments, houses, and flats in Germany on platforms like ImmoScout24...

610 148 610

Moodle-DL C0D3D3V Python

Moodle-DL downloads course content fast from Moodle (e.g. lecture PDFs)

601 75 601

newcrawler speed JavaScript

Free Web Scraping Tool with Java

587 112 587

pryingdeep iudicium Go

Prying Deep - An OSINT tool to collect intelligence on the dark web.

582 50 582

python-automation-scripts avidLearnerInProgress Python

Simple yet powerful automation scripts.

564 163 564

webster zhuyingda JavaScript

a reliable high-level web crawling & scraping framework for Node.js.

561 52 561

mmjpg chenjiandongx Python

👩 Crawler for photo sets of beautiful women (part 1)

556 251 556

webclaw 0xMassi Rust

Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.

551 67 551

vault abhisharma404 Python

swiss army knife for hackers

549 98 549

crawljax crawljax Java

Crawljax

540 225 540

freshonions-torscraper dirtyfilthy Python

Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion

535 145 535

Python3Webcrawler mochazi Python

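🌈 Python 3 web crawling in practice: QQ Music songs, JD.com product information, Fang.com, cracking Youdao Translate, building a proxy pool, Douban Books, Baidu Images, cracking NetEase login, Bilibili simulated QR-code login, Xiaoe-tech, Lizhi Weike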

531 122 531

nintendo-switch-eshop lmmfranco TypeScript

Crawler for Nintendo Switch eShop

523 85 523

Krawl BlessedRebuS Python

Krawl is a customizable, lightweight, cloud-native web deception server and anti-crawler that creates fake web applications with low-hanging vulnerabi...

521 38 521

opensearchserver jaeksoft Java

Open-source Enterprise Grade Search Engine Software

515 192 515

Scan-T nanshihui C

A new Python-based crawler with additional features, including network fingerprint search

504 213 504

TorCrawl.py MikeMeliz Python

Crawl and extract (regular or onion) webpages through TOR network

504 89 504

scrapple AlexMathew Python

A framework for creating semi-automatic web content extractors

502 41 502

reader vakra-dev TypeScript

Open-source, production-grade web scraping engine built for LLMs. Scrape and crawl the entire web, clean markdown, ready for your agents.

499 33 499

Html2Article stanzhai C#

HTML web page main-content extraction

496 170 496

Fast-Powerful-Whisper-AI-Services-API Evil0ctal Python

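⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. No need to purchase the Whisper API: it runs inference with a locally hosted Whisper model, supports multi-GPU concurrency, and is designed for distributed deployment...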

464 58 464

ICLR2020-OpenReviewData shaohua0116 Jupyter Notebook

Script that crawls metadata from the ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

460 41 460

tsrtc Asoul JavaScript

Taiwan Stock Exchange Real Time Crawler

458 148 458

sitemap-generator lgraubner JavaScript

Easily create XML sitemaps for your website.

452 136 452

fundus flairNLP Python

A very simple news crawler with a funny name

450 108 450

AllNewsSpider Python3Spiders Python

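The Paper, Sina News, Tencent News, Sohu News, Xinwen Lianbo, The Times, The New York Times, BBC News: aims to crawl news from all news portal sites. Commercial use of the collected data is prohibited!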

449 75 449

Youtube-Projects ayushi7rawat Python

This repository contains all the code I use in my YouTube tutorials.

432 218 432

Pinkerton 0xdsm Python

🕵️ Python project to crawl for JavaScript files and search for secrets like API keys, authorization tokens, hardcoded credentials, etc.

431 56 431

media-scraper elvisyjlin Python

Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok

428 52 428

signature_algorithm gadfly0x Python

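Request-signing or encryption algorithms for various apps, mini programs, and websites. Currently covers: Ziroom, Xiaohongshu, Danke Apartment, luckin coffee, bangkokair (Bangkok Airways)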

427 68 427

coom-dl notFaad Dart

Coomer / Kemono (.party or .su) downloader

427 40 427

dude roniemartinez Python

dude (uncomplicated data extraction): A simple framework for writing web scrapers using Python decorators

425 19 425

EmailFinder Josue87 Python

Search emails from a domain through search engines

418 92 418

gospider nange Go

A crawler framework implemented in Go: users only need to define page rules, and a web management interface is provided. Built on colly.

417 109 417

music-recover heqin-zhu Python

🎵 Convert cache files to MP3 files

409 119 409

sosse biolds Python

Selenium Open Source Search Engine & crawler

409 24 409

CrawlerForReader smuyyh Java

Local web-novel crawler for Android, based on jsoup and XPath

406 137 406

magic_google howie6879 Python

Google search results crawler: get the Google search results you need

404 110 404

jivesearch jivesearch JavaScript

A search engine that doesn't track you.

402 53 402

second-order mhmdiaa Go

Second-order subdomain takeover scanner

402 67 402

ghcrawler microsoft JavaScript

Crawl GitHub APIs and store the discovered orgs, repos, commits, ...

390 93 390

scraperai scraperai HTML

ScraperAI is an open-source, AI-powered tool designed to simplify web scraping for users of all skill levels.

390 53 390

ICLR2019-OpenReviewData shaohua0116 Jupyter Notebook

Script that crawls metadata from the ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

387 30 387

InstagramCrawler tzuhsial Python

A non-API Python program to crawl public photos, posts, or followers

384 103 384

supercrawler brendonboshell JavaScript

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limi...

382 63 382