Most popular crawler repositories and open source projects

newspaperjs

News extraction and scraping; article parsing

19   66   66  

Pasta

A PasteBin scraper that doesn't rely on the PasteBin scrape API

6   66   66  

carbonbot

A command line tool based on the crypto-crawler library.

8   65   65  

JewelCrawler

Douban movie crawler: a crawler which is able to crawl movie details and short...

57   65   65  

medium-crawler

A crawler for scraping posts from medium.com

15   65   65  

Google-Patents-Scraper

Automatically download all PDF files of search results & their pate...

22   65   65  

dht-crawler

A DHT Crawler based on Goroutine

4   64   64  

Tor_Spider

Python project to crawl and scrape the lesser known deep web or one can...

16   64   64  

Pinterest-infinite-crawler

An infinite Pinterest crawler/scraper. Crawls images with infinite-scrol...

11   64   64  

GMaps-Crawler

Google Maps crawler using Selenium. All extracted data is forwarded to...

17   64   64  

Auto_Shadowsocks

Shadowsocks. Getting around internet censorship, for study purposes only. The servers are free, so the connection may be unst...

17   63   63  

social-scraper

Vietnamese text data crawler scripts for various sites (including Yout...

35   63   63  

eastmoney

Python requests + Django + Node.js Koa + MySQL to crawl eastmoney fund an...

23   63   63  

qr-pirate

Crawl QR codes from search engines and look for Bitcoin private keys

29   63   63  

HydraRecon

All In One, Fast, Easy Recon Tool

11   63   63  

ZhihuVAPI

Play with Zhihu elegantly

14   62   62  

koshort

(deprecated) :cat: koshort is a Python package for Korean internet spo...

10   62   62  

sciBASIC

sciBASIC# is a dialect language derived from the nativ...

29   62   62  

js_block

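Research and study of various kinds of blocking: anti-crawler measures, ad blocking, preventing ad injection, fighting ticket scalpers, etc.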

16   62   62  

tieba-zhuaqu

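A distributed crawler for Baidu Tieba, used for Tieba data mining. Analyzes data along both the forum dimension and the user dimension.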

26   62   62  

Java-Carwler-Technology

Web Data Collection Techniques: Java Web Crawlers (complete code for the book manuscript, covering various web crawler techniques and...

20   62   62  

slime

🍰 A visual crawler management platform

28   62   62  

feaplat

A crawler management system with cluster support and elastic scaling. Supports running feapder, scrapy, selenium, p...

13   61   61  

crawdad

Cross-platform persistent and distributed web crawler :crab:

9   61   61  

Chemrtron

A document viewer; fuzzy match incremental search.

14   60   60  

WebCrawler

Just a simple web crawler which returns crawled links as IObservable us...

33   60   60  

zhihu-crawler

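Scheduled crawling of Zhihu implemented from scratch, mining valuable information from it and visualizing the crawled data for web displ...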

9   60   60  

custom-crawler

🌌 High productivity semi-automatic crawler generator 🛠️🧰

4   60   60  

Web-Iota

Iota is a web scraper which can find all of the images and links/subur...

5   60   60  

metacritic_api

PHP Metacritic API - Mirror from my GitLab

13   60   60  

rewe-discounts

Grabs current REWE discounts and saves them in a markdown file || Holt...

5   59   59  

webspot

An intelligent web service to automatically detect web content and ext...

9   59   59  

damai-tickets

Example ticket-grabbing script for Damai.com

8   59   59  

pomp

Screen scraping and web crawling framework

10   59   59  

WebSpider

An online web crawler project based on Node.js, superagent, and cheerio; supports generating APIs

20   59   59  

crawler-project

A senior Google engineer's in-depth explanation of the Go language: crawler project.

29   59   59  

proxycrawl-python

ProxyCrawl Python library for scraping and crawling

19   59   59  

scrapy-distributed

A series of distributed components for Scrapy, including RabbitMQ-base...

11   59   59  

phpcrawl

Copy of http://phpcrawl.cuab.de/ for using with composer

33   58   58  

lyrics-crawler

Get the lyrics for the song currently playing on Spotify

19   58   58  

ipfs-crawler

A crawler for the IPFS network, code for our paper (https://arxiv.org/...

14   58   58  

local-api-examples

Easy-to-follow examples in Python, Node.js, and C# for web automation...

17   58   58  

Daily-code

Everyday code: crawlers, small GUI tools, etc.

5   57   57  

TumblTwo

TumblTwo, an Improved Fork of TumblOne, a Tumblr Downloader.

16   57   57  

slideshare-downloader

Python script to download SlideShare PDFs. This script is able to download...

24   57   57  

SoFIFA

A SoFIFA webcrawler and Machine Learning prediction

13   57   57  

PicCrawler

An image crawler developed using RxJava2 and Java 8 features

14   56   56  

m3u8Downloader

meijuba.net, Python crawler, M3U8 video downloading, desktop application

22   56   56  

devsearch

A web search engine built with Python which uses TF-IDF and PageRank t...

13   56   56  

SearchEngineScrapy

Scrape data from Google.com, Bing.com, Baidu.com, Ask.com, Yahoo.com,...

16   56   56