Topic

crawler

Repositories (1431)

Instagram-Bot
Instagram-Bot mustafadalga Python

An Instagram bot developed using the Selenium Framework

284
PixivCrawler
PixivCrawler cwher Python

Pixiv Utils implemented in Python, including Pixiv Crawler and Mosaic Puzzles, support for rankings, personal bookmarks, artist works and keyword sear...

284
aliexpress-product-scraper
aliexpress-product-scraper sudheer-ranga JavaScript

Get Aliexpress product details as a JSON response, including feedback, variants, shipping info, description, images, etc.

281
th-music-video-generator
th-music-video-generator Jasonnor JavaScript

Touhou Project random music video generator/player that crawls images and videos from websites to generate MVs.

280
gplay-scraper
gplay-scraper Mohammedcha Python

GPlay Scraper is a powerful Python Google Play scraper library for extracting comprehensive app data from the Google Play Store. Scrape Google Play St...

279
Antibot-Detector
Antibot-Detector scrapfly JavaScript

Real-time detection of anti-bot systems, CAPTCHAs & fingerprinting techniques. Identifies Cloudflare, Akamai, DataDome, reCAPTCHA, hCaptcha, Shape Se...

277
Strong-Web-Crawler
Strong-Web-Crawler microfisher C#

An advanced web crawler built on C#/.NET, PhantomJS, and Selenium. It can execute JavaScript code, trigger various events, and manipulate the page DOM structure.

276
go-movies
go-movies hezhizheng Go

Golang spider/crawler for movies

273
weiboPicDownloader
weiboPicDownloader yAnXImIN Java

Crawler that downloads Weibo images without logging in

271
LinkedIn-Scraper
LinkedIn-Scraper TufayelLUS Python

A LinkedIn scraper to scrape up to 1k LinkedIn profiles (due to LinkedIn limit) from company profile links and save their e-mail addresses if available...

270
onecomic
onecomic hardwarecode Python

A comic book

270
D4N155
D4N155 OWASP Shell

OWASP D4N155 - Intelligent and dynamic wordlist using OSINT

269
RuiJi.Net
RuiJi.Net zhupingqi C#

crawler framework, distributed crawler extractor

267
antch
antch antchfx Go

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

265
algoliasearch-netlify
algoliasearch-netlify algolia TypeScript

Official Algolia Plugin for Netlify. Index your website to Algolia when deploying your project to Netlify with the Algolia Crawler

264
Github-spider
Github-spider chenjiandongx Python

A crawler for analyzing GitHub repositories and users

264
laravel-seo-scanner
laravel-seo-scanner backstagephp PHP

Scan your Laravel application routes for SEO improvement suggestions.

264
ptt-alertor
ptt-alertor Ptt-Alertor Go

📢 Ptt article notification bot! Get notified of Ptt articles in real time

262
crypto-crawler-rs
crypto-crawler-rs crypto-crawler Rust

A rock-solid cryptocurrency crawler library.

261
galer
galer dwisiswant0 Go

A fast tool to fetch URLs from HTML attributes by crawl-in.

260
weibo_terminator_workflow
weibo_terminator_workflow lucasjinreal Python

Updated version of weibo_terminator. This is the workflow version, aimed at getting the job done!

259
Selenops
Selenops zntfdr Swift

A Swift Web Crawler 🕷

259
rotating-tor-http-proxy
rotating-tor-http-proxy zhaow-de Shell

A multi-arch image that provides one HTTP proxy endpoint with many concurrent tunnels to the Tor network.

259
ok_ip_proxy_pool
ok_ip_proxy_pool cwjokaka Python

🍿 Proxy IP pool for crawlers, written in Python 🍟 A pretty OK IP proxy pool

255
robots-txt
robots-txt spatie PHP

Determine if a page may be crawled from robots.txt, robots meta tags and robot headers

255
arachnid
arachnid zrashwani PHP

Crawl all unique internal links found on a given website and extract SEO-related information - supports JavaScript-based sites

254
FileSensor
FileSensor Xyntax Python

Crawler-based dynamic detection tool for sensitive files

253
xvideos
xvideos rodrigogs JavaScript

xvideos API library

253
InfinityCrawler
InfinityCrawler TurnerSoftware C#

A simple but powerful web crawler library for .NET

252
CSharpCrawler
CSharpCrawler zhaotianff C#

C# crawler sample programs for anyone who wants to learn the basics of web crawling. More crawler-related material will be added gradually.

252
chromium_for_spider
chromium_for_spider myvyang HTML

dynamic crawler for web vulnerability scanner

251
Sub
Sub Leon406 Kotlin

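Crawls and filters proxy nodes; supports parsing Clash and base64 subscriptions and automatically generates usable ss, ssr, v2ray, and trojan nodes. Integrated with GitHub Actions for scheduled updates every day from 8:00 to 24:00.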

249
ZhihuSpider
ZhihuSpider kong36088 Python

A multithreaded Zhihu user crawler based on Python 3

249
SpotifyScraper
SpotifyScraper AliAkhtari78 Makefile

Spotify scraper that extracts all the information from Spotify and downloads MP3s with the song's cover art

249
Tumblr_Crawler
Tumblr_Crawler sparrow629 Python

A multithreaded crawler for Tumblr.

248
woid
woid vitorfs Python

Simple news aggregator displaying top stories in real time

245
Sitemap-Generator-Crawler
Sitemap-Generator-Crawler vezaynk PHP

PHP script to recursively crawl websites and generate a sitemap. Zero dependencies.

245
gf-secrets
gf-secrets dwisiswant0 Shell

Secret and/or credential patterns used for gf.

243
dorkscout
dorkscout R4yGM Go

DorkScout - Golang tool to automate Google dork scans against the entire internet or specific targets

242
web-page-monitor
web-page-monitor lgh06 JavaScript

Website page change monitor with update notifications.

241
siteone-crawler-gui
siteone-crawler-gui janreges Svelte

SiteOne Crawler GUI is a cross-platform website crawler and analyzer for SEO, security, accessibility, and performance optimization—ideal for develope...

237
indonesian-NLP-resources
indonesian-NLP-resources kirralabs

Data resources for Indonesian-language NLP

230
crawlab-lite
crawlab-lite crawlab-team Vue

Lite version of Crawlab, a crawler management platform.

230
goose-parser
goose-parser redco JavaScript

Universal scraping tool, which allows you to extract data using multiple environments

229
facebook-data-extraction
facebook-data-extraction 18520339 Python

Experience for effectively fetching Facebook data by Querying Graph API with Account-based Token and Operating undetectable scraping Bots to extract C...

226
KoreaNewsCrawler
KoreaNewsCrawler lumyjuwon Python

A Korean news crawler built to ingest large amounts of news data.

225
WebVideoBot
WebVideoBot tim232385 Java

Web crawler.

224
black-widow
black-widow offensive-hub Python

GUI-based offensive penetration testing tool (open source)

223
FooProxy
FooProxy 01ly Python

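A robust, efficient, score-based, targeted IP proxy pool plus API service. You can plug in your own collectors to crawl proxy IPs, and it builds a separate database of valid proxy IPs for each of your crawler's target websites. Supports Mongo...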

222
weibo_wordcloud
weibo_wordcloud gaussic Python

Scrapes Weibo data by keyword, then generates a word cloud

221