Topic

crawler

Repositories (1232)

ComicSpider
ComicSpider QuantumLiu Python

Crawler for original-resolution images from the desktop version of the 动漫之家 (Dongman Zhijia) comics site

68
Wedge
Wedge LZ0211 JavaScript

A configurable novel downloader and e-book generation tool

67
hproxy
hproxy howie6879 Python

hproxy - Asynchronous IP proxy pool that aims to make getting proxies as convenient as possible (an asynchronous proxy pool for crawlers).

66
newspaperjs
newspaperjs flickz HTML

News extraction and scraping. Article Parsing

66
Pasta
Pasta Kr0ff Python

A PasteBin scraper that doesn't rely on the PasteBin scrape API

66
JewelCrawler
JewelCrawler DMinerJackie Java

Douban movie crawler: a crawler that collects movie details and short comments and saves them to a MySQL database; also includes sentiment analysis based on...

65
medium-crawler
medium-crawler NISH1001 Python

A crawler for scraping posts from medium.com

65
Google-Patents-Scraper
Google-Patents-Scraper wenyalintw Python

Automatically download all PDF files of search results and their patent families found on Google Patents.

65
carbonbot
carbonbot crypto-crawler Rust

A command line tool based on the crypto-crawler library.

65
dht-crawler
dht-crawler hijkzzz Go

A DHT Crawler based on Goroutine

64
Tor_Spider
Tor_Spider absingh31 Python

Python project to crawl and scrape the lesser-known deep web, also called the dark web. Just provide the onion link and get started.

64
Pinterest-infinite-crawler
Pinterest-infinite-crawler mirusu400 Python

An infinite Pinterest crawler/scraper. Crawl images with infinite scroll!

64
GMaps-Crawler
GMaps-Crawler guilatrova Python

Google Maps crawler using Selenium. All extracted data is forwarded to a SQS queue.

64
HydraRecon
HydraRecon aufzayed Python

All In One, Fast, Easy Recon Tool

63
Auto_Shadowsocks
Auto_Shadowsocks VonSdite Python

Shadowsocks. Circumventing internet censorship, for learning purposes only. The servers are free, so the connection may be unstable.

63
qr-pirate
qr-pirate mzollin Python

crawl QR-codes from search engines and look for bitcoin private keys

63
social-scraper
social-scraper nguyenvanhieuvn Python

Vietnamese text data crawler scripts for various sites (including YouTube, Facebook, 4rum, news, ...)

63
eastmoney
eastmoney minicloudsky JavaScript

Python requests + Django + Node.js Koa + MySQL to crawl eastmoney fund and stock data for data analysis and visualization.

63
slime
slime nekolr Java

🍰 A visual crawler management platform

62
ZhihuVAPI
ZhihuVAPI cheezone Python

Use Zhihu elegantly

62
koshort
koshort koshort Python

(deprecated) :cat: koshort is a Python package for Korean internet spoken language crawling and processing... or maybe Korean domestic cat.

62
tieba-zhuaqu
tieba-zhuaqu ankanch Python

Distributed crawler for Baidu Tieba, used for Tieba data mining. Analyzes data along both the forum dimension and the user dimension

62
sciBASIC
sciBASIC xieguigang Visual Basic .NET

sciBASIC# is a dialect derived from the native VB.NET language, written for data scientists.

62
Java-Carwler-Technology
Java-Carwler-Technology soberqian Java

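Web Data Collection Techniques: Java Web Crawlers (complete code for the book manuscript, covering the various techniques and concepts of web crawling)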

62
js_block
js_block webcoding HTML

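Research and study of various blocking techniques: anti-crawler measures, ad blocking, preventing ad injection, fighting scalpers, and so on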

62
crawdad
crawdad schollz Go

Cross-platform persistent and distributed web crawler :crab:

61
feaplat
feaplat Boris-code

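Crawler management system with cluster support and elastic scaling. Supports running various frameworks and scripts such as feapder, scrapy, selenium, and playwright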

61
metacritic_api
metacritic_api melroy89 PHP

PHP Metacritic API - Mirror from my GitLab

60
Web-Iota
Web-Iota SatinWuker Python

Iota is a web scraper which can find all of the images and links/suburls on a webpage

60
WebCrawler
WebCrawler Misterhex C#

Just a simple web crawler which returns crawled links as IObservable using Reactive Extensions and async/await.

60
zhihu-crawler
zhihu-crawler NightMarcher Python

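A from-scratch implementation of scheduled Zhihu crawling that mines valuable information from the results and visualizes the crawled data on a web page.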

60
custom-crawler
custom-crawler rollrat C#

🌌 High productivity semi-automatic crawler generator 🛠️🧰

60
Chemrtron
Chemrtron cho45 JavaScript

A document viewer; fuzzy match incremental search.

60
damai-tickets
damai-tickets Jxpro Python

Example ticket-grabbing script for the Damai ticketing site

59
webspot
webspot crawlab-team Python

An intelligent web service to automatically detect web content and extract information from it.

59
Daily-code
Daily-code rui7157 Python

Everyday code: crawlers, small GUI tools, and so on

59
WebSpider
WebSpider xdoer JavaScript

An online web crawler project based on Node.js, superagent, and cheerio, with support for generating APIs

59
crawler-project
crawler-project Albert-W Go

A senior Google engineer's in-depth explanation of the Go language: crawler project.

59
proxycrawl-python
proxycrawl-python crawlbase Python

ProxyCrawl Python library for scraping and crawling

59
pomp
pomp estin Python

Screen scraping and web crawling framework

59
rewe-discounts
rewe-discounts foo-git Python

Grabs current REWE discounts and saves them in a Markdown file

59
scrapy-distributed
scrapy-distributed Insutanto Python

A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy...

59
phpcrawl
phpcrawl mmerian PHP

Copy of http://phpcrawl.cuab.de/ for use with Composer

58
local-api-examples
local-api-examples kameleo-io C#

Easy-to-follow examples in Python, Node.js, and C# for web automation & multi-accounting with Kameleo anti-detect browser.

58
lyrics-crawler
lyrics-crawler willamesoares Python

Get the lyrics for the song currently playing on Spotify

58
ipfs-crawler
ipfs-crawler trudi-group Go

A crawler for the IPFS network, code for our paper (https://arxiv.org/abs/2002.07747). Also holds scripts to evaluate the obtained data and make simil...

58
SoFIFA
SoFIFA DiogoDantas Jupyter Notebook

A SoFIFA webcrawler and Machine Learning prediction

57
TumblTwo
TumblTwo johanneszab C#

TumblTwo, an Improved Fork of TumblOne, a Tumblr Downloader.

57
slideshare-downloader
slideshare-downloader yodiaditya Python

Python script to download SlideShare PDFs. The script can download slides and convert them into a PDF automatically.

57
actor-facebook-scraper
actor-facebook-scraper pocesar TypeScript

Scrape public Facebook pages, posts, reviews and comments

56