Topic

crawler

Repositories (1232)

taki
taki egoist TypeScript

Take a snapshot of any website.

140
HotNewsAnalysis
HotNewsAnalysis Jacen789 Python

Analysis of trending news topics using text-mining techniques

139
nebula
nebula dennis-tra Jupyter Notebook

🌌 A libp2p DHT crawler, monitor, and measurement tool that exposes timely information about DHT networks.

137
spider
spider Winniekun Python

:star2::octocat: powered by python3 (simple learning of spider): Baidu Wenku; NetEase Cloud Music songs; Douban Movies; GitHub; JD.com; QQ Zone; weather; VIP parsing assistant; TED...

136
not-your-average-web-crawler
not-your-average-web-crawler tijme Python

A web crawler (for bug hunting) that gathers more than you can imagine.

136
CrawlBox
CrawlBox abaykan Python

An easy way to brute-force web directories.

136
Zhihu-Spider
Zhihu-Spider moranzcw Python

A multithreaded Python crawler that retrieves Zhihu user profile page information.

135
site-audit-seo
site-audit-seo viasite JavaScript

Web service and CLI tool for SEO site audits: crawl a site, run Lighthouse on all pages, view public reports in the browser. Also outputs to console, json, csv, xlsx...

134
Skill-Share-Crawler---DL
Skill-Share-Crawler---DL tharyckgusmao JavaScript

Download Skillshare videos by ID or by class.

133
blinkist-m4a-downloader
blinkist-m4a-downloader luckylittle Go

Grabs all of the audio files from all of the Blinkist books

133
leetcode-ranking-search
leetcode-ranking-search chiehmin Vue

Leetcode Contest Ranking Searcher

133
ComputerStudent
ComputerStudent sfvsfv HTML

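Systematic study materials for computer science majors (Python, C, C++, computer organization, computer networks, compiler principles, circuits, Chrome extensions, web crawlers)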

132
dyer
dyer hominee Rust

Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.

131
onegram
onegram pauloromeira Python

This repository is no longer maintained.

130
php-crawler
php-crawler hedii PHP

A PHP crawler that finds email addresses on the internet.

130
pricetrack
pricetrack duyet JavaScript

Price tracker that monitors products and alerts you when prices drop. Supports tiki.vn, shopee, lotte.vn, ... Built with Firebase https://pricetrack.we...

129
picacomic_downloader
picacomic_downloader muyoou Python

Downloader for PicaComic (哔咔漫画) favorites

129
scraply
scraply alash3al Go

Scraply: a simple DOM scraper to fetch information from any HTML-based website.

129
acm-statistics
acm-statistics Liu233w C#

An online tool (crawler) to analyze users' performance in online judges (coding competition websites). Supported OJ: POJ, HDU, HYSBZ, CodeForces, UVA,...

128
proxifier
proxifier rookmoot Go

A fast, modern and intelligent proxy rotator perfect for crawling and scraping public data.

128
Sina-Weibo-Album-Downloader
Sina-Weibo-Album-Downloader lincanbin Python

Multithreaded download of all HD photos from someone's Sina Weibo album.

127
docs
docs zhangslob

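Source code for 《数据采集从入门到放弃》 ("Data Collection: From Getting Started to Giving Up"). Contents: introduction to crawlers, the job market, crawler engineer interview questions; introduction to the HTTP protocol; using Requests; introduction to the XPath parser; MongoDB and MySQL; multithreading...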

127
Ceiba-Downloader
Ceiba-Downloader jameshwc Python

A course downloader that helps NTU students download course data from NTU Ceiba.

127
lumberjack
lumberjack JakePartusch JavaScript

An automated website accessibility scanner and CLI

126
s3recon
s3recon clarketm Python

Amazon S3 bucket finder and crawler.

125
wget-lua
wget-lua ArchiveTeam C

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

125
sentinel-crawler
sentinel-crawler wx-chevalier JavaScript

Xenomorph Crawler, a concise, declarative and observable distributed crawler (Node / Go / Java / Rust) for Web, RDB, OS; it can also act as a monitor (with...

124
graphquery
graphquery storyicon Go

GraphQuery is a query language and execution engine tied to any backend service.

124
memex-explorer
memex-explorer nasa-jpl-memex Python

Viewers for statistics and dashboarding of Domain Search Engine data

124
pdf-crawler
pdf-crawler SimFin Python

SimFin's open source PDF crawler

124
PatentCrawler
PatentCrawler will4906 Python

Scrapy-based patent crawler (no longer maintained)

123
instagram-profilecrawl
instagram-profilecrawl nacimgoura JavaScript

:computer: Quickly crawl the information (e.g. followers, tags, etc...) of an instagram profile. No login required!

123
smarter-encryption
smarter-encryption duckduckgo Perl
123
Terpene-Profile-Parser-for-Cannabis-Strains
Terpene-Profile-Parser-for-Cannabis-Strains MaxValue Python

Parser and database to index the terpene profile of different strains of Cannabis from online databases

123
aiotieba
aiotieba Starry-OvO Python

Baidu Tieba forum moderation manager ✨ post-deletion bot ✨ wraps a large number of core Tieba APIs with aiohttp

122
GoogleImagesDownloader
GoogleImagesDownloader WuLC Python

Enlarge a training dataset by searching Google for images with specified keywords and downloading the images returned

121
prerender-java
prerender-java greengerong Java

Java framework for Prerender

120
findpapers
findpapers jonatasgrosman Python

Findpapers: A tool for helping researchers who are looking for related works

120
google-news-scraper
google-news-scraper lewisdonovan JavaScript

Lightweight scraper for Google News

120
TiebaManager
TiebaManager xfgryujk C++

(Abandoned) Baidu Tieba moderation tool that automatically scans posts and handles rule-violating ones

119
poopak
poopak teal33t Python

POOPAK - TOR Hidden Service Crawler

119
eyes
eyes r05323028 Python

Public Opinion Mining System of Taiwanese Forums

119
WebReaper
WebReaper pavlovtech C#

Web scraper, crawler and parser in C#. Designed as simple, declarative and scalable web scraping solution.

119
BaiduCrawler
BaiduCrawler mazzzystar Python

Example of using proxies to crawl Baidu search results.

118
auto-lighthouse
auto-lighthouse TGiles HTML

A utility package for automating lighthouse reporting

117
spidy
spidy twiny Go

Domain names collector - Crawl websites and collect domain names along with their availability status.

117
andvaranaut
andvaranaut glouw C

A dungeon crawler

116
AmazonRobot
AmazonRobot WuLC Python

A Python crawler for driving traffic to Amazon product listings

116
npm-search
npm-search algolia TypeScript

🗿 npm ↔️ Algolia replication tool :skier: :snail: :artificial_satellite:

116
bots-zoo
bots-zoo antoinevastel JavaScript
115