Topic

crawler

Repositories (1232)

AmazonRobot
AmazonRobot WuLC Python

A Python crawler for driving traffic to Amazon product listings

116
bots-zoo
bots-zoo antoinevastel JavaScript
115
Lcrawl
Lcrawl lndj PHP

An elegant crawler for the Zhengfang (正方) academic affairs management system.

114
ungoliant
ungoliant oscar-project Rust

🕷️ The pipeline for the OSCAR corpus

113
linkcrawler
linkcrawler schollz Go

Cross-platform persistent and distributed web crawler 🔗

112
ThesaurusSpider
ThesaurusSpider WuLC Python

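A Python crawler that downloads dictionary files from the Sogou, Baidu and QQ input methods; it can be used to build vocabularies for different industries.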

112
APSoft-Web-Scanner-v2
APSoft-Web-Scanner-v2 ph09nix C#

Powerful dork searcher and vulnerability scanner for the Windows platform

112
bee-university
bee-university beecost Python

A project that collects university admission cutoff scores for 2014-2018 and analyzes the data

111
scrapy-puppeteer
scrapy-puppeteer clemfromspace Python

Scrapy + Puppeteer

111
pagser
pagser foolin Go

Pagser is a simple, extensible, configurable library that parses HTML pages and deserializes them into structs, based on goquery and struct tags, for Go crawlers

111
WeiboCrawler
WeiboCrawler XWang20 Python

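A cookie-free Weibo crawler that can continuously crawl one or more Sina Weibo users' profile information, their posts, and the comments and reposts on those posts.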

110
proxy-pool
proxy-pool denghuichao Java

A proxy IP pool service for crawlers; other crawler programs can obtain proxies from it through a REST API

110
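The proxy-pool description above only says that crawlers fetch proxies from the pool over a REST API. A minimal sketch of the bookkeeping such a pool might keep behind that API, assuming a simple policy of evicting a proxy after a fixed number of consecutive failures; the `ProxyPool` class, its method names, and the eviction rule are hypothetical and not taken from the project.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ProxyPool:
    """Hypothetical in-memory proxy pool that retires proxies which keep failing."""

    max_failures: int = 3
    # Maps proxy URL -> consecutive failure count.
    _failures: dict = field(default_factory=dict, init=False, repr=False)

    def add(self, proxy: str) -> None:
        # New proxies start with a clean failure count.
        self._failures.setdefault(proxy, 0)

    def get(self) -> str:
        # Hand out a random live proxy; raise if none are left.
        if not self._failures:
            raise LookupError("proxy pool is empty")
        return random.choice(list(self._failures))

    def report_failure(self, proxy: str) -> None:
        # Count a failure and evict the proxy once it reaches max_failures.
        if proxy in self._failures:
            self._failures[proxy] += 1
            if self._failures[proxy] >= self.max_failures:
                del self._failures[proxy]

    def report_success(self, proxy: str) -> None:
        # A successful request resets the consecutive failure count.
        if proxy in self._failures:
            self._failures[proxy] = 0

    def __len__(self) -> int:
        return len(self._failures)
```

A REST layer would then only need to expose `get`, `report_failure` and `report_success` as HTTP endpoints; how the real service validates proxies is not described in the listing.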
gflare-tk
gflare-tk beb7 Python

Open-Source Python Based SEO Web Crawler

110
starfish-ql
starfish-ql SeaQL Rust

✴️ An experimental graph database

110
goscraper
goscraper badoux Go

Golang pkg to quickly return a preview of a webpage (title/description/images)

109
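goscraper is a Go package, but the idea behind a page preview (title, description, images) can be sketched with Python's standard `html.parser`. This sketch assumes the preview is built from the `<title>` element, a `description`/`og:description` meta tag, the `og:image` meta tag and `<img src>` attributes; it does not fetch pages or resolve relative URLs, and it does not reflect goscraper's actual API.

```python
from html.parser import HTMLParser


class PreviewParser(HTMLParser):
    """Collects a page's title, meta description and image URLs."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self.images: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Standard meta tags use name=, Open Graph tags use property=.
            name = (attrs.get("name") or attrs.get("property") or "").lower()
            if name in ("description", "og:description") and not self.description:
                self.description = attrs.get("content") or ""
            elif name == "og:image" and attrs.get("content"):
                self.images.append(attrs["content"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def preview(html: str) -> dict:
    """Return a {title, description, images} preview for an HTML string."""
    parser = PreviewParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "images": parser.images,
    }
```

Self-closing tags such as `<img ... />` also reach `handle_starttag`, because `HTMLParser.handle_startendtag` calls it by default.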
crawler
crawler brantou Python

Crawlers, HTTP proxies, simulated login!

108
tracker-radar-collector
tracker-radar-collector duckduckgo JavaScript

🕸 Modular, multithreaded, puppeteer-based crawler

108
zyte-smartproxy-headless-proxy
zyte-smartproxy-headless-proxy zytedata Go

A complementary proxy that helps you use SPM (Zyte Smart Proxy Manager) with headless browsers

108
bose
bose omkarcloud Python

✨ BOSE IS SWISS ARMY KNIFE 🔪 FOR BOT DEVELOPMENT. THE ULTIMATE BOT DEVELOPMENT FRAMEWORK. 🤖

107
collector
collector thenurhabib Python

Collects XSS-vulnerable parameters from an entire domain.

106
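A first step for a tool like collector is gathering every query parameter a domain exposes, since those are the inputs later tested for reflection. A minimal sketch using the standard `urllib.parse`, assuming the URL list already comes from a crawl or an archive; `collect_parameters` is a hypothetical helper, and it only finds candidate parameters, it does not test anything for XSS.

```python
from urllib.parse import parse_qsl, urlsplit


def collect_parameters(urls, domain):
    """Map each query-parameter name to the set of paths where it appears on `domain`."""
    found: dict[str, set[str]] = {}
    for url in urls:
        parts = urlsplit(url)
        host = parts.hostname or ""
        # Keep the target domain and its subdomains only.
        if host != domain and not host.endswith("." + domain):
            continue
        # keep_blank_values=True so "?q=" still registers the parameter name.
        for name, _value in parse_qsl(parts.query, keep_blank_values=True):
            found.setdefault(name, set()).add(parts.path or "/")
    return found
```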
images-web-crawler
images-web-crawler amineHorseman Python

This package is a complete tool for creating a large dataset of images (specially designed, but not only, for machine learning enthusiasts). It can cr...

105
CrawlerPack
CrawlerPack abola Java

A Java package for crawling web data

104
antispider
antispider dytttf JavaScript
104
WeiboSpider
WeiboSpider CharesFang Python

Weibo crawler: a lightweight Weibo crawler built on the Scrapy framework (Sina Weibo Spider)

104
crawler_detect
crawler_detect loadkpi Ruby

Ruby gem to detect bots and crawlers via the user agent

104
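crawler_detect is a Ruby gem, but the underlying technique (matching the User-Agent header against known crawler signatures) is easy to sketch in Python. The pattern below is a tiny illustrative subset chosen for this example, not the gem's signature list, and treating a missing User-Agent as a crawler is an assumed policy.

```python
import re
from typing import Optional

# Illustrative subset of crawler signatures; real detectors ship far larger lists.
BOT_PATTERN = re.compile(
    r"bot|crawl|spider|slurp|curl|wget|python-requests|headless",
    re.IGNORECASE,
)


def is_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent string looks like a crawler."""
    if not user_agent:
        # Assumption: an empty or missing User-Agent is treated as a bot.
        return True
    return BOT_PATTERN.search(user_agent) is not None
```

Short substrings such as `bot` can produce false positives on unusual device names, which is one reason dedicated libraries maintain curated lists and exclusions.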
4scanner
4scanner pboardman Python

Continuously searches imageboard threads for images/WebMs and downloads them

103
webb
webb hardikvasa Python

Python: An all-in-one Web Crawler, Web Parser and Web Scraping library!

102
PHPCreeper
PHPCreeper blogdaren PHP

A new generation of multi-process asynchronous event-driven spider engine based on Workerman. http://www.phpcreeper.com

102
google-arts-crawler
google-arts-crawler piotrantosz Python

Google Arts & Culture high quality image downloader

102
Scrapy_IPProxyPool
Scrapy_IPProxyPool monkey-soft Python

Free IP proxy pool; a plugin for the Scrapy crawler framework

101
Weibo-Album-Crawler
Weibo-Album-Crawler Lodour Python

A multiprocessing crawler for weibo albums.

99
LinkedIn-Scraper
LinkedIn-Scraper TufayelLUS Python

A LinkedIn Scraper to scrape up to 10k LinkedIn profiles from company profile links and save their e-mail addresses if available!

98
AyugeSpiderTools
AyugeSpiderTools shengchenyang Python

Frees Scrapy developers from writing modules such as items, pipelines and middleware for common scenarios.

98
google-maps-scraper
google-maps-scraper omkarcloud Python

👋 HOLA! ENJOY OUR GOOGLE MAPS SCRAPER 🚀 TO EFFORTLESSLY EXTRACT DATA SUCH AS NAMES, ADDRESSES, PHONE NUMBERS, WEBSITES, AND RATINGS FROM GOOGLE MAPS...

98
pappet
pappet patrickschur JavaScript

A command-line tool to crawl websites using puppeteer.

98
scaleable-crawler-with-docker-cluster
scaleable-crawler-with-docker-cluster tonywangcn Python

A scalable and efficient crawler using a Docker cluster; crawls a million pages in 2 hours on a single machine

97
asyncpy
asyncpy lixi5338619 Python

A lightweight asynchronous, coroutine-based web crawler framework built with asyncio and aiohttp

97
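asyncpy is described as an asyncio/aiohttp coroutine crawler. A minimal sketch of the asyncio side of such a design: a shared `asyncio.Queue` frontier, a fixed number of worker coroutines bounding concurrency, and a `seen` set for deduplication. The page fetcher is injected as a coroutine so the sketch runs without network access; in practice it would wrap an aiohttp request plus link extraction. The `crawl` function and its parameters are hypothetical, not asyncpy's API.

```python
import asyncio
from typing import Awaitable, Callable, Iterable

# fetch(url) -> URLs linked from that page (assumed interface for this sketch).
Fetch = Callable[[str], Awaitable[Iterable[str]]]


async def crawl(start: str, fetch: Fetch, max_pages: int = 100,
                concurrency: int = 5) -> set[str]:
    """Breadth-first crawl from `start` with `concurrency` worker coroutines."""
    seen = {start}
    queue: asyncio.Queue = asyncio.Queue()
    queue.put_nowait(start)

    async def worker() -> None:
        while True:
            url = await queue.get()
            try:
                links = await fetch(url)
            except Exception:
                links = ()  # skip pages that fail to load
            for link in links:
                # No await between check and add, so no race between workers.
                if link not in seen and len(seen) < max_pages:
                    seen.add(link)
                    queue.put_nowait(link)
            queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(concurrency)]
    await queue.join()  # returns once every queued URL has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return seen
```

Because the worker count caps how many `fetch` calls run at once, no separate semaphore is needed here.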
copymanga-downloader
copymanga-downloader misaka10843 Python

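Downloads manga from copymanga (拷贝漫画) via a Python-compiled exe, bash, or command-line arguments; supports batch downloads with chapter selection, and fetching and downloading the manga in your favorites! (Windows and Linux supported; macOS supported when running from source.)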

96
Taiwan-news-crawlers
Taiwan-news-crawlers TaiwanStat Python

Scrapy-based crawlers for Taiwanese news

95
MetaFinder
MetaFinder Josue87 Python

Searches for documents in a domain through search engines (Google, Bing and Baidu) in order to extract their metadata.

94
gopa-abandoned
gopa-abandoned medcl Go

GOPA, a spider written in Go. (NOTE: this project moved to https://github.com/infinitbyte/gopa)

94
bathyscaphe
bathyscaphe creekorful Go

Fast, highly configurable, cloud native dark web crawler.

94
aliexscrape
aliexscrape ducdev JavaScript

Get Aliexpress product details in JSON

93
dcard-spider
dcard-spider leVirve Python

A spider on Dcard. Strong and speedy.

93
python-tools
python-tools lucasayres Python

A collection of Python tools, scripts and utilities to make your life easier.

91
slrp
slrp nfx Go

rotating open proxy multiplexer

91
Novel-crawler
Novel-crawler ling7334 Python

A crawler for online novels, written in Python

91
SpotifyScraper
SpotifyScraper AliAkhtari78 Python

A Spotify scraper that extracts information from Spotify and downloads MP3s with the song's cover art

91
Price-monitor
Price-monitor qqxx6661 Python

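Price monitor for JD.com (某东) products: set custom target prices and get email/WeChat alerts when prices drop. Tech: Python crawler / IP proxy pool / JS API crawling / Selenium page crawling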

89
news-crawler
news-crawler LuChang-CS Python

A news crawler for BBC News, Reuters and New York Times.

89