Most popular crawler repositories and open source projects

x-kit xiaoxiunique TypeScript

A tool for scraping and analyzing X (Twitter) user data and tweets.

914 141 914

BaiduImageSpider kong36088 Python

A super lightweight Baidu image crawler.

913 391 913

TumblThree johanneszab C#

A Tumblr Blog Backup Application

911 126 911

chatWeb SkywalkerDarren Python

ChatWeb can crawl web pages, read PDF, DOCX, TXT, and extract the main content, then answer your questions based on the content, or summarize the key...

911 136 911

parse-video wujunwei928 Go

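Golang short-video watermark remover: Douyin, Pipixia, Huoshan, Weishi, Zuiyou, Kuaishou, Quanmin Xiaoshipin, Pipi Gaoxiao, Xigua Video, Huya, Pear Video, AcFun, Haokan Video...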

904 246 904

scrapyrt scrapinghub Python

HTTP API for Scrapy spiders

880 162 880

skrape.it skrapeit Kotlin

A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places...

869 69 869

bookcorpus soskek Python

Crawl BookCorpus

855 113 855

spider_reverse 0xAllenChen Python

849 193 849

ArrowDL setvisible C++

ArrowDL (Arrow Downloader) is a download manager for Windows, MacOS and Linux

844 43 844

Weibo-Analyst KimMeen Python

Social media (Weibo) comment analysis toolbox in Chinese. Features: 1. Weibo comment data crawling; 2. word segmentation and keyword extraction; 3. word clouds and word-frequency statistics; 4. 情...

840 185 840

spidr postmodern Ruby

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to...

834 107 834

easy-scraping-tutorial MorvanZhou Jupyter Notebook

Simple but useful Python web scraping tutorial code.

817 545 817

till DataHenHQ Go

DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code...

815 23 815

course-crawler Foair Python

🎓 Course downloader for the China University MOOC, XuetangX, NetEase Cloud Classroom, CNMOOC, and iCourse MOOC platforms.

808 193 808

jvppeteer fanyong920 Java

Java API For Chrome and Firefox

807 170 807

Lulu iawia002 Python

[Unmaintained] A simple and clean video/music/image downloader 👾

806 140 806

pic-gather Licoy

🛑 image collector, which supports custom acquisition source configuration and is compatible with MacOS and Windows operating systems.

801 212 801

fetchbot PuerkitoBio Go

A simple and flexible web crawler that follows the robots.txt policies and crawl delays.

791 91 791

xeHentai fffonion Python

Doujinshi downloader

788 90 788

seo-audits-toolkit StanGirard Python

SEO & Security Audit for Websites. Lighthouse & Security Headers crawler, Sitemap/Keywords/Images Extractor, Summarizer, etc ...

786 144 786

creeper wspl Go

🐾 Creeper - The Next Generation Crawler Framework (Go)

778 59 778

wreq 0x676e67 Rust

An ergonomic Rust HTTP Client with TLS fingerprint

777 100 777

Scavenger rndinfosecguy Python

Crawler (Bot) searching for credential leaks on paste sites.

774 131 774

js-cookie-monitor-debugger-hook JSREI TypeScript

A JS cookie reverse-engineering tool: visual monitoring of JS cookie changes, plus JS cookie hooks for setting conditional breakpoints

772 119 772

BaiduSpider BaiduSpider Python

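BaiduSpider, a crawler for Baidu search results; currently supports Baidu web search, Baidu image search, Baidu Zhidao search, Baidu video search, Baidu news search, Baidu Wenku search, Baidu Jingyan search, and ...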

765 186 765

lxBook lixi5338619 JavaScript

Code repository for the book 《爬虫逆向进阶实战》 (Advanced Crawler Reverse Engineering in Practice)

761 201 761

xxl-crawler xuxueli Java

A lightweight Java web crawler framework.

754 317 754

linkedin-profile-scraper-api josephlimtech TypeScript

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON.

753 177 753

hacker-news-digest polyrabbit Python

📰 Let ChatGPT Summarize Hacker News for You

752 95 752

siteone-crawler janreges Rust

SiteOne Crawler is a cross-platform website crawler and analyzer for SEO, security, accessibility, and performance optimization—ideal for developers,...

728 62 728

PyPtt PyPtt Python

The best PTT library

722 104 722

TumblThree TumblThreeApp C#

A Tumblr and Twitter Blog Backup Application

717 87 717

wscan chushuai Go

Wscan is a web security scanner that focuses on web security, dedicated to making web security accessible to everyone.

704 81 704

seonaut StJudeWasHere Go

Open source SEO audit tool.

690 112 690

fbcrawl rugantio Python

A Facebook crawler

685 222 685

FileMasta ohhsodead C#

A search application to explore, discover and share online files

669 71 669

Search-Engines-Scraper tasos-py Python

Search Google, Bing, Yahoo, and other search engines with Python

667 170 667

gOSINT Nhoya Go

OSINT Swiss Army Knife

666 81 666

Free_Proxy_Website cyubuchen Python

A collection of websites for obtaining free SOCKS/HTTPS/HTTP proxies

664 130 664

Craw4LLM cxcscmu Python

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

654 60 654

DouYin Python3WebSpider Python

API of DouYin for Humans used to Crawl Popular Videos and Musics

651 258 651

NetDiscovery fengzhizi715 Java

NetDiscovery is a general-purpose crawler framework/middleware built on Vert.x, RxJava 2, and other frameworks.

648 151 648

learnPython rieuse Python

Basic Python practice code and assorted crawler code

644 297 644

TikHub-API-Python-SDK TikHub Python

High-performance asynchronous Douyin(抖音) TikTok Xiaohongshu(小红书) Kuaishou(快手) Weibo(微博) Instagram YouTube(油管) Twitter(X) Captcha Solver(验...

643 75 643

pywebcopy rajatomar788 Python

Locally saves webpages to your hard disk with images, css, js & links as is.

640 117 640

runoob-PDF- gagayuan Python

Crawls the Runoob (菜鸟教程) tutorial site and converts it to PDF__python_crawer_by_chrome

636 370 636

go_jobs go-crawler Go

A look at the Golang job market

624 122 624

dotcommon Kharacternyk Python

What do people have in their dotfiles?

620 31 620

Jie yhy0 Go

Jie stands out as a comprehensive security assessment and exploitation tool meticulously crafted for web applications. Its robust suite of features en...

615 122 615
