Most popular crawler repositories and open source projects

ruia howie6879 Python

Async Python 3.6+ web scraping micro-framework based on asyncio

1.7k 186 39

AutoCrawler YoongiKim Python

Google, Naver multiprocess image web crawler (Selenium)

1.7k 423 44

selectolax rushter Cython

Python binding to Modest and Lexbor engines. Fast HTML5 parser with CSS selectors for Python.

1.7k 91 14

spider_collection srx-2000 Python

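Python crawlers. Currently included: NetEase Cloud Music song crawling, Bilibili video crawling, Zhihu Q&A crawling, wallpaper crawling, xvideos video crawling, audiobook crawling, Weibo crawler, Anjuke listing crawling plus data visualization, Bili...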

1.6k 241 19

grab-site ArchiveTeam Python

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

1.6k 157 39

douyin erma0

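Douyin crawler: collects public data such as account home pages, likes, favorites, original music, topics, searches, collections, posts, following, and followers.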

1.6k 297 13

CatVodTVSpider CatVodTVOfficial Java

1.6k 930 1.6k

scrapoxy fabienvauchelles JavaScript

Scrapoxy hides your scraper behind a cloud. It starts a pool of proxies to send your requests. Now, you can crawl without thinking about blacklisting!

1.6k 222 1.6k

lightcrawler github JavaScript

Crawl a website and run it through Google Lighthouse

1.6k 143 0

ScopeSentry Autumn-27 Go

ScopeSentry-Cyberspace mapping, subdomain enumeration, port scanning, sensitive information discovery, vulnerability scanning, distributed nodes

1.5k 214 17

single-file-cli gildas-lormeau JavaScript

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

1.5k 134 12

fscrawler dadoonet Java

Elasticsearch File System Crawler (FS Crawler)

1.4k 306 68

XHS-Spider xisuo67

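Xiaohongshu data collection tool with batch downloading of site images and video resources; a very good-looking data collection tool (batch download, video extraction, images). Telegram: https://t.me/+ZtLSwuIKTo44MDY1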

1.4k 84 10

OpenWPM openwpm Python

A web privacy measurement framework

1.4k 334 65

wreq-python 0x676e67 Rust

An ergonomic, privacy-aware Python HTTP Client

1.4k 113 23

sperm darbra

A collection of excellent reverse-engineering articles the author has read, well worth a look

1.4k 392 34

rebrowser-patches rebrowser JavaScript

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy...

1.4k 78 28

SwiftLinkPreview leonardocardoso Swift

It makes a preview from a URL, grabbing all the information such as title, relevant texts and images.

1.4k 196 1

mlscraper lorey Python

🤖 Scrape data from HTML websites automatically by just providing examples

1.4k 93 15

crawler kgspider JavaScript

Brother K's crawler code sharing: JS reverse engineering and advanced crawling. Follow the WeChat official account: K哥爬虫

1.4k 342 15

wombat felipecsl Ruby

Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.

1.4k 128 48

jd-autobuy adyzng Python

Python crawler: automatic JD.com login and online flash purchasing of products

1.3k 569 103

XSRFProbe 0xInfection Python

The Prime Cross Site Request Forgery (CSRF) Audit and Exploitation Toolkit.

1.3k 219 36

go-dork dwisiswant0 Go

The fastest dork scanner written in Go.

1.3k 139 19

instagram-profilecrawl InstaPy Python

📝 Quickly crawl the information (e.g. followers, tags, etc.) of an Instagram profile.

1.3k 265 57

fredy orangecoding JavaScript

❤️ Fredy - [F]ind [R]eal [E]state [D]amn Eas[y] - Fredy keeps searching for new apartments, houses, and flats in Germany on platforms like ImmoScout24...

1.3k 193 8

python-fxxk-spider ityard

A collection of free Python crawler projects

1.3k 201 17

Beanbun kiddyuchina PHP

Beanbun is a multi-process web crawler framework written in PHP. It is open and highly extensible, and is built on Workerman.

1.3k 249 76

BiliBili-Manga-Downloader Zeal-L Python

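An easy-to-use Bilibili manga downloader with a graphical interface. Supports keyword search for manga and QR code login, downloading of locked chapters, multithreaded downloads, multiple save formats, local manga management, one-click update checks...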

1.2k 56 7

AppCrawler seveniruby Scala

An automated app traversal tool based on Appium

1.2k 473 80

sqliv the-robot Python

massive SQL injection vulnerability scanner

1.2k 378 1

fakebrowser kkoooqq JavaScript

🤖 Fake fingerprints to bypass anti-bot systems. Simulate mouse and keyboard operations to behave like a real person.

1.2k 213 1.2k

mzitu chenjiandongx Python

👧 Crawler for beauty photo sets (part 2)

1.2k 339 45

bilili yutto-dev Python

🍻 bilibili video (including bangumi) and danmaku downloader

1.2k 89 2

Pandora-Box snakem982 Vue

A simple Mihomo GUI (desktop client).

1.2k 125 12

tumblr-crawler dixudx Python

Easily download all the photos/videos from tumblr blogs.

1.2k 336 78

newspaper4k AndyTheFactory Python

📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.

1.1k 113 11

fess codelibs Java

Open-source, self-hosted enterprise & site search server built on OpenSearch. Crawls web / file / DB / cloud sources, 20+ languages, REST API, and AI/...

1.1k 173 61

crawly elixir-crawly Elixir

Crawly, a high-level web crawling & scraping framework for Elixir.

1.1k 122 17

kimuraframework vifreefly Ruby

Write web scrapers in Ruby using a clean, AI-assisted DSL. Kimurai uses AI to figure out where the data lives, then caches the selectors and scrapes w...

1.1k 162 27