Topic

crawler

Repositories (1431)

selectolax
selectolax rushter Cython

Python binding to Modest and Lexbor engines. Fast HTML5 parser with CSS selectors for Python.

1.6k
spider_collection
spider_collection srx-2000 Python

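Python crawlers. Current collection: NetEase Cloud Music song scraping, Bilibili video scraping, Zhihu Q&A scraping, wallpaper scraping, xvideos video scraping, audiobook scraping, Weibo crawler, Anjuke listing scraping plus data visualization, Bili...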

1.6k
CatVodTVSpider
CatVodTVSpider CatVodTVOfficial Java
1.6k
scrapoxy
scrapoxy fabienvauchelles JavaScript

Scrapoxy hides your scraper behind a cloud. It starts a pool of proxies to send your requests. Now, you can crawl without thinking about blacklisting!

1.6k
grab-site
grab-site ArchiveTeam Python

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

1.6k
lightcrawler
lightcrawler github JavaScript

Crawl a website and run it through Google lighthouse

1.6k
videodl
videodl CharlesPikachu Python

Videodl: A lightweight video downloader written in pure Python. (Lightweight video downloader that prefers HD, watermark-free sources; supports Douyin, Kuaishou, Xiaohongshu, Bilibili, TikTok, YouTube, FIFA+...

1.5k
ScopeSentry
ScopeSentry Autumn-27 Go

ScopeSentry-Cyberspace mapping, subdomain enumeration, port scanning, sensitive information discovery, vulnerability scanning, distributed nodes

1.5k
douyin
douyin erma0 TypeScript

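Douyin crawler: collects public data such as account profiles, likes, favorites, original sounds, topics, search results, collections, posts, followings, and followers.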

1.5k
fscrawler
fscrawler dadoonet Java

Elasticsearch File System Crawler (FS Crawler)

1.4k
OpenWPM
OpenWPM openwpm Python

A web privacy measurement framework

1.4k
sperm
sperm darbra

A roundup of excellent reverse-engineering articles I have read; worth a look.

1.4k
XHS-Spider
XHS-Spider xisuo67

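Xiaohongshu data collection and bulk download tool for site images and video resources; a great-looking data collection tool (bulk download, video extraction, images). Telegram: https://t.me/+ZtLSwuIKTo44MDY1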

1.4k
mlscraper
mlscraper lorey Python

🤖 Scrape data from HTML websites automatically by just providing examples

1.4k
SwiftLinkPreview
SwiftLinkPreview leonardocardoso Swift

It makes a preview from a URL, grabbing all the information such as title, relevant texts and images.

1.4k
crawler
crawler kgspider JavaScript

Crawler code shared by K哥 (K Ge): JS reverse engineering and advanced crawling. Follow the WeChat official account: K哥爬虫

1.4k
wombat
wombat felipecsl Ruby

Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.

1.4k
single-file-cli
single-file-cli gildas-lormeau JavaScript

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

1.3k
rebrowser-patches
rebrowser-patches rebrowser JavaScript

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy...

1.3k
wreq-python
wreq-python 0x676e67 Rust

An ergonomic Python HTTP Client with TLS fingerprint

1.3k
jd-autobuy
jd-autobuy adyzng Python

Python crawler: automatic JD.com login and online flash purchasing of products.

1.3k
XSRFProbe
XSRFProbe 0xInfection Python

The Prime Cross Site Request Forgery (CSRF) Audit and Exploitation Toolkit.

1.3k
go-dork
go-dork dwisiswant0 Go

The fastest dork scanner written in Go.

1.3k
Beanbun
Beanbun kiddyuchina PHP

Beanbun is a multi-process web crawler framework written in PHP, open and highly extensible, built on Workerman.

1.3k
instagram-profilecrawl
instagram-profilecrawl InstaPy Python

📝 quickly crawl the information (e.g. followers, tags etc...) of an instagram profile.

1.3k
python-fxxk-spider
python-fxxk-spider ityard

A collection of free Python crawler projects.

1.3k
BiliBili-Manga-Downloader
BiliBili-Manga-Downloader Zeal-L Python

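An easy-to-use Bilibili manga downloader with a graphical interface; supports keyword manga search and QR-code login, downloading of not-yet-unlocked chapters, multi-threaded downloads, multiple save formats, local manga management, one-click update checks...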

1.2k
AppCrawler
AppCrawler seveniruby Scala

An Appium-based tool for automatically traversing apps.

1.2k
sqliv
sqliv the-robot Python

massive SQL injection vulnerability scanner

1.2k
fakebrowser
fakebrowser kkoooqq JavaScript

🤖 Fake fingerprints to bypass anti-bot systems. Simulate mouse and keyboard operations to make behavior like a real person.

1.2k
mzitu
mzitu chenjiandongx Python

👧 Crawler for photo-set galleries of female models (part 2)

1.2k
bilili
bilili yutto-dev Python

🍺 bilibili video (including bangumi) and danmaku downloader

1.2k
tumblr-crawler
tumblr-crawler dixudx Python

Easily download all the photos/videos from specified Tumblr blogs.

1.2k
Pandora-Box
Pandora-Box snakem982 Vue

A simple Mihomo GUI (desktop client).

1.1k
fess
fess codelibs Java

Fess is very powerful and easily deployable Enterprise Search Server.

1.1k
kimuraframework
kimuraframework vifreefly Ruby

Write web scrapers in Ruby using a clean, AI-assisted DSL. Kimurai uses AI to figure out where the data lives, then caches the selectors and scrapes w...

1.1k
newspaper4k
newspaper4k AndyTheFactory Python

📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.

1.1k
crawly
crawly elixir-crawly Elixir

Crawly, a high-level web crawling & scraping framework for Elixir.

1.1k
browsertrix-crawler
browsertrix-crawler webrecorder TypeScript

Run a high-fidelity browser-based web archiving crawler in a single Docker container

1k
Pxer
Pxer pea3nut JavaScript

A tool for pixiv.net. A pixiv crawler anyone can use.

1k
stormcrawler
stormcrawler apache Java

A scalable, mature and versatile web crawler based on Apache Storm

976
magnet-dht
magnet-dht chenjiandongx Python

✌️ Python3 BitTorrent DHT crawler

974
google-play-scraper
google-play-scraper JoMingyu Python

Google play scraper for Python inspired by <facundoolano/google-play-scraper>

963
crawler
crawler fredwu Elixir

A high performance web crawler / scraper in Elixir.

958
SpiderSuite
SpiderSuite spidersuite

SpiderSuite releases, wiki and roadmap

951
scrapfly-scrapers
scrapfly-scrapers scrapfly Python

Scalable Python web scraping scripts for 40+ popular domains

947
FictionDown
FictionDown ma6254 Go

Novel download | novel scraping | Qidian | Biquge | Markdown export | txt export | epub conversion | ad filtering | automatic proofreading

939
SecCrawler
SecCrawler Le0nsec Go

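A crawler and push-notification program that helps security researchers get daily security news. Sources currently crawled include Xianzhi Community (先知社区), Anquanke (安全客), Seebug Paper, Tiaotiaotang (跳跳糖), the Qi-Anxin Attack and Defense Community (奇安信攻防社区), Lengjiao Community (棱角社区), NSFOCUS (绿盟), Tencent Xuan...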

932
zhihu-crawler
zhihu-crawler wycm Java

zhihu-crawler is a high-performance, distributed, Java-based crawler project that supports a free HTTP proxy pool and horizontal scaling.

920
icrawler
icrawler hellock Python

A multi-thread crawler framework with many builtin image crawlers provided.

920