Topic

crawler

Repositories (1431)

x-kit
x-kit xiaoxiunique TypeScript

A tool for scraping and analyzing X (Twitter) user data and tweets.

914
BaiduImageSpider
BaiduImageSpider kong36088 Python

A super lightweight Baidu image crawler.

913
TumblThree
TumblThree johanneszab C#

A Tumblr Blog Backup Application

911
chatWeb
chatWeb SkywalkerDarren Python

ChatWeb can crawl web pages, read PDF, DOCX, TXT, and extract the main content, then answer your questions based on the content, or summarize the key...

911
parse-video
parse-video wujunwei928 Go

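Short-video watermark removal in Golang: Douyin, Pipixia, Huoshan, Weishi, Zuiyou, Kuaishou, Quanmin Xiaoshipin, Pipi Gaoxiao, Xigua Video, Huya, Pear Video, AcFun, Haokan Video...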

904
scrapyrt
scrapyrt scrapinghub Python

HTTP API for Scrapy spiders

880
skrape.it
skrape.it skrapeit Kotlin

A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places...

869
bookcorpus
bookcorpus soskek Python

Crawl BookCorpus

855
spider_reverse
spider_reverse 0xAllenChen Python

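Crawler reverse-engineering case studies. Completed: TLS fingerprinting | Ruishu (瑞数) | ZKH (震坤行) | NetEase Yidun (网易易盾) | WeChat mini-program decompilation and reverse engineering (百达星系) | Tonghuashun (同花顺) | RPC decryption | Jiasule (加速乐) | GeeTest slider CAPTCHA | 巨量算数 | Boss...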

849
ArrowDL
ArrowDL setvisible C++

ArrowDL (Arrow Downloader) is a download manager for Windows, macOS, and Linux

844
Weibo-Analyst
Weibo-Analyst KimMeen Python

Toolbox for analyzing Chinese-language social media (Weibo) comments. Features: 1. Weibo comment crawling; 2. word segmentation and keyword extraction; 3. word clouds and word-frequency statistics; 4. 情...

840
spidr
spidr postmodern Ruby

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to...

834
easy-scraping-tutorial
easy-scraping-tutorial MorvanZhou Jupyter Notebook

Simple but useful Python web scraping tutorial code.

817
till
till DataHenHQ Go

DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code...

815
course-crawler
course-crawler Foair Python

🎓 MOOC course downloader for China University MOOC, XuetangX, NetEase Cloud Classroom, CNMOOC, and iCourse.

808
jvppeteer
jvppeteer fanyong920 Java

Java API For Chrome and Firefox

807
Lulu
Lulu iawia002 Python

[Unmaintained] A simple and clean video/music/image downloader 👾

806
pic-gather
pic-gather Licoy

🛑 An image collector that supports custom source configuration and runs on macOS and Windows.

801
fetchbot
fetchbot PuerkitoBio Go

A simple and flexible web crawler that follows the robots.txt policies and crawl delays.

791
xeHentai
xeHentai fffonion Python

Doujinshi downloader

788
seo-audits-toolkit
seo-audits-toolkit StanGirard Python

SEO & Security Audit for Websites. Lighthouse & Security Headers crawler, Sitemap/Keywords/Images Extractor, Summarizer, etc.

786
creeper
creeper wspl Go

🐾 Creeper - The Next Generation Crawler Framework (Go)

778
wreq
wreq 0x676e67 Rust

An ergonomic Rust HTTP Client with TLS fingerprint

777
Scavenger
Scavenger rndinfosecguy Python

Crawler (Bot) searching for credential leaks on paste sites.

774
js-cookie-monitor-debugger-hook
js-cookie-monitor-debugger-hook JSREI TypeScript

A handy tool for reverse-engineering JS cookies: a visual monitor for JS cookie changes, plus JS cookie hooks for setting conditional breakpoints

772
BaiduSpider
BaiduSpider BaiduSpider Python

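BaiduSpider, a crawler for Baidu search results. It currently supports Baidu web search, image search, Zhidao search, video search, news search, Wenku search, Jingyan search, and 百...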

765
lxBook
lxBook lixi5338619 JavaScript

Code repository for the book 《爬虫逆向进阶实战》 (Advanced Crawler Reverse Engineering in Practice)

761
xxl-crawler
xxl-crawler xuxueli Java

A lightweight Java web crawler framework.

754
linkedin-profile-scraper-api
linkedin-profile-scraper-api josephlimtech TypeScript

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON.

753
hacker-news-digest
hacker-news-digest polyrabbit Python

📰 Let ChatGPT Summarize Hacker News for You

752
siteone-crawler
siteone-crawler janreges Rust

SiteOne Crawler is a cross-platform website crawler and analyzer for SEO, security, accessibility, and performance optimization—ideal for developers,...

728
PyPtt
PyPtt PyPtt Python

The best PTT library

722
TumblThree
TumblThree TumblThreeApp C#

A Tumblr and Twitter Blog Backup Application

717
wscan
wscan chushuai Go

Wscan is a web security scanner that focuses on web security, dedicated to making web security accessible to everyone.

704
seonaut
seonaut StJudeWasHere Go

Open source SEO audit tool.

690
fbcrawl
fbcrawl rugantio Python

A Facebook crawler

685
FileMasta
FileMasta ohhsodead C#

A search application to explore, discover and share online files

669
Search-Engines-Scraper
Search-Engines-Scraper tasos-py Python

Search Google, Bing, Yahoo, and other search engines with Python

667
gOSINT
gOSINT Nhoya Go

OSINT Swiss Army Knife

666
Free_Proxy_Website
Free_Proxy_Website cyubuchen Python

A collection of websites offering free SOCKS/HTTPS/HTTP proxies

664
Craw4LLM
Craw4LLM cxcscmu Python

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

654
DouYin
DouYin Python3WebSpider Python

A DouYin API for humans, used to crawl popular videos and music

651
NetDiscovery
NetDiscovery fengzhizi715 Java

NetDiscovery is a general-purpose crawler framework/middleware built on Vert.x, RxJava 2, and other frameworks.

648
learnPython
learnPython rieuse Python

Basic Python practice code and assorted crawler code

644
TikHub-API-Python-SDK
TikHub-API-Python-SDK TikHub Python

High-performance asynchronous Douyin, TikTok, Xiaohongshu, Kuaishou, Weibo, Instagram, YouTube, Twitter (X), Captcha Solver (验...

643
pywebcopy
pywebcopy rajatomar788 Python

Locally saves webpages to your hard disk with images, css, js & links as is.

640
runoob-PDF-
runoob-PDF- gagayuan Python

Crawls the Runoob (菜鸟教程) tutorial site and converts it to PDF__python_crawer_by_chrome

636
go_jobs
go_jobs go-crawler Go

A look at the Golang job market

624
dotcommon
dotcommon Kharacternyk Python

What do people have in their dotfiles?

620
Jie
Jie yhy0 Go

Jie stands out as a comprehensive security assessment and exploitation tool meticulously crafted for web applications. Its robust suite of features en...

615