Most popular crawler repositories and open source projects

weibo-topic-spider czy1999 Python

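Weibo super-topic crawler with Weibo word-frequency statistics, sentiment analysis, and simple classification; now includes data crawled from the pneumonia super topic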

300 63 6

bitextor bitextor Python

Bitextor generates translation memories from multilingual websites

299 41 28

xvideos rodrigogs TypeScript

xvideos API library

299 84 27

site-audit-seo viasite JavaScript

Web service and CLI tool for SEO site audit: crawl site, lighthouse all pages, view public reports in browser. Also output to console, json, csv, xlsx

299 46 9

line-bot-tutorial twtrubiks Python

LINE bot tutorial using Python and Flask

297 154 16

Sasila da2vin Python

A flexible, friendly crawler framework

295 69 21

pychromeless jairovadillo Python

Python Lambda Chrome Automation (naming pending)

292 115 9

ComicCrawler eight04 Python

An image crawler written in Python.

291 53 17

PixivCrawler cwher Python

Pixiv Utils implemented in Python, including Pixiv Crawler and Mosaic Puzzles, support for rankings, personal bookmarks, artist works and keyword sear...

290 40 1

PulsarRPA platonai Kotlin

Automate webpages at scale, scrape web data completely and accurately with high performance, distributed RPA.

287 59 287

Gorecon devanshbatham Go

Gorecon is an all-in-one reconnaissance tool, a.k.a. a Swiss Army knife for reconnaissance; a tool that every pentester/bug hunter might wanna consider into...

286 45 0

Instagram-Bot mustafadalga Python

An Instagram bot developed using the Selenium Framework

285 85 1

th-music-video-generator Jasonnor JavaScript

Touhou Project random music video generator/player, crawling image and video from websites to generate MV.

281 45 8

SpotifyScraper AliAkhtari78 Python

Extract public Spotify data — tracks, albums, artists, playlists, podcasts & lyrics — without the official API. Sync + async, typed models, one depend...

278 32 11

LinkedIn-Scraper TufayelLUS Python

A LinkedIn scraper that scrapes up to 1k LinkedIn profiles (due to LinkedIn's limit) from company profile links and saves their e-mail addresses if available...

278 62 2

Strong-Web-Crawler microfisher C#

An advanced web crawler built on C#/.NET, PhantomJS, and Selenium. It can execute JavaScript code, trigger all kinds of events, and manipulate the page's DOM structure.

276 145 38

go-movies hezhizheng Go

Golang spider/crawler for movies

274 83 9

onecomic hardwarecode Python

A comic book

273 40 6

weiboPicDownloader yAnXImIN Java

Crawler that downloads Weibo images without logging in

272 50 6

laravel-seo-scanner backstagephp PHP

Scan your Laravel application routes for SEO improvement suggestions.

271 29 4

D4N155 OWASP Shell

OWASP D4N155 - Intelligent and dynamic wordlist using OSINT

270 50 19

antch antchfx Go

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

266 40 14

crypto-crawler-rs crypto-crawler Rust

A rock-solid cryptocurrency crawler library.

266 82 10

RuiJi.Net zhupingqi C#

crawler framework, distributed crawler extractor

265 38 9

algoliasearch-netlify algolia TypeScript

Official Algolia Plugin for Netlify. Index your website to Algolia when deploying your project to Netlify with the Algolia Crawler

265 10 48

rotating-tor-http-proxy zhaow-de Shell

A multi-arch image that provides one HTTP proxy endpoint with many concurrent tunnels to the Tor network.

264 50 9

Github-spider chenjiandongx Python

Crawler for analyzing GitHub repositories and users

263 89 14

ptt-alertor Ptt-Alertor Go

📢 Ptt article notification bot! Notifies you of Ptt articles in real time

263 70 8

galer dwisiswant0 Go

A fast tool to fetch URLs from HTML attributes by crawl-in.

263 38 5

weibo_terminator_workflow lucasjinreal Python

Updated version of weibo_terminator; this workflow version aims to get the job done!

258 71 26

Selenops zntfdr Swift

A Swift Web Crawler 🕷

258 17 4

robots-txt spatie PHP

Determine if a page may be crawled from robots.txt, robots meta tags and robot headers

258 44 9

arachnid zrashwani PHP

Crawl all unique internal links found on a given website and extract SEO-related information; supports JavaScript-based sites

254 59 20

ok_ip_proxy_pool cwjokaka Python

🍿 Crawler proxy IP pool in Python 🍟 A decent IP proxy pool

254 65 5

InfinityCrawler TurnerSoftware C#

A simple but powerful web crawler library for .NET

254 37 9

FileSensor Xyntax Python

Crawler-based dynamic detection tool for sensitive files

253 74 9

FindJobs-Agent he-yufeng Python

LLM-powered toolkit for skill analysis, AI interviews, resume scoring, and job structuring. Automates professional skill taxonomy and interview proces...

253 27 15

chromium_for_spider myvyang HTML

dynamic crawler for web vulnerability scanner

251 41 3

CSharpCrawler zhaotianff C#

C# crawler sample programs for anyone who wants to learn crawler basics. More crawler-related material will be added over time.

251 59 5

Sub Leon406 Kotlin

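Crawls and filters proxy nodes; supports parsing Clash and base64 subscriptions and automatically generates usable ss, ssr, v2ray, and trojan nodes. Integrated with GitHub Actions for scheduled updates every day between 8:00 and 24:00.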

249 99 249

ZhihuSpider kong36088 Python

Multithreaded Zhihu user crawler, based on Python 3

249 82 13

Sitemap-Generator-Crawler vezaynk PHP

PHP script to recursively crawl websites and generate a sitemap. Zero dependencies.

247 94 21

Tumblr_Crawler sparrow629 Python

A multithreaded crawler for Tumblr.

246 72 28

woid vitorfs Python

Simple news aggregator displaying top stories in real time

245 119 16

gf-secrets dwisiswant0 Shell

Secret and/or credential patterns used for gf.

245 53 4

dorkscout R4yGM Go

DorkScout - Golang tool to automate Google dork scans against the entire internet or specific targets

244 29 9

siteone-crawler-gui janreges Svelte

SiteOne Crawler GUI is a cross-platform website crawler and analyzer for SEO, security, accessibility, and performance optimization—ideal for develope...

242 22 4

creatorhub 3441293738 Python

Multi-platform content monitoring, collection, and reposting in pure Python (FastAPI + Playwright); one web panel to manage Douyin / Xiaohongshu / Kuaishou

242 55 0

SpideyX RevoltSecurities Python

SpideyX, a multipurpose web penetration testing tool with asynchronous, concurrent performance and multiple modes and configurations.

240 39 3

web-page-monitor lgh06 JavaScript

Website page change monitor with update notifications.

240 38 2

crawler

Repositories (1456)