Most popular crawler repositories and open source projects

Lcrawl

An elegant crawler for the Zhengfang academic affairs system.

46   114   114  

ungoliant

:spider: The pipeline for the OSCAR corpus

12   113   113  

linkcrawler

Cross-platform persistent and distributed web crawler :link:

9   112   112  

ThesaurusSpider

A Python crawler that downloads dictionary files for the Sogou, Baidu, and QQ input methods; can be used to build different industries'...

44   112   112  

APSoft-Web-Scanner-v2

Powerful dork searcher and vulnerability scanner for the Windows platform

35   112   112  

bee-university

Project that collects university admission cutoff scores for 2014 - 2018 and analyzes the data

24   111   111  

scrapy-puppeteer

Scrapy + Puppeteer

29   111   111  

starfish-ql

✴️ An experimental graph database

4   110   110  

proxy-pool

Proxy IP pool service for crawlers; other crawler programs can fetch proxies through a REST API

55   110   110  

gflare-tk

Open-Source Python Based SEO Web Crawler

14   110   110  

WeiboCrawler

Cookie-free Weibo crawler that can continuously crawl the user information, posts, and... of one or more Sina Weibo users

19   110   110  

crawler

Crawler, HTTP proxy, simulated login!

46   108   108  

zyte-smartproxy-headless-proxy

A complementary proxy that helps you use SPM with headless browsers

37   108   108  

tracker-radar-collector

🕸 Modular, multithreaded, puppeteer-based crawler

40   108   108  

bose

✨ BOSE IS SWISS ARMY KNIFE 🔪 FOR BOT DEVELOPMENT. THE ULTIMATE BOT D...

1   107   107  

collector

Collect XSS vulnerable parameters from entire domain.

29   106   106  

images-web-crawler

This package is a complete tool for creating a large dataset of images...

24   105   105  

CrawlerPack

Java web data crawler package

69   104   104  

antispider

56   104   104  

crawler_detect

Ruby gem to detect bots and crawlers via the user agent

10   104   104  

WeiboSpider

Weibo crawler: a lightweight Weibo crawler based on the Scrapy framework (Sina Weibo Spider)

22   104   104  

4scanner

Continuously search imageboards threads for images/webms and download...

18   103   103  

webb

Python: An all-in-one Web Crawler, Web Parser and Web Scraping librar...

41   102   102  

PHPCreeper

A new generation of multi-process asynchronous event-driven spider eng...

14   102   102  

Scrapy_IPProxyPool

Free IP proxy pool. A plugin for the Scrapy crawler framework

40   101   101  

Weibo-Album-Crawler

A multiprocessing crawler for weibo albums.

34   99   99  

goscraper

Golang pkg to quickly return a preview of a webpage (title/description...

40   99   99  

pappet

A command-line tool to crawl websites using puppeteer.

8   98   98  

LinkedIn-Scraper

A LinkedIn Scraper to scrape up to 10k LinkedIn profiles from company...

38   98   98  

google-maps-scraper

👋 HOLA! ENJOY OUR GOOGLE MAPS SCRAPER 🚀 TO EFFORTLESSLY EXTRACT DATA...

14   98   98  

asyncpy

A lightweight asynchronous coroutine web crawler framework built with asyncio and aiohttp

23   97   97  

scaleable-crawler-with-docker-cluster

A scalable and efficient crawler with a Docker cluster, crawl million...

27   97   97  

copymanga-downloader

Download manga from copymanga (拷贝漫画) using a Python-compiled exe, bash, or command-line arguments, ...

9   96   96  

Taiwan-news-crawlers

Scrapy-based Crawlers for news of Taiwan

17   95   95  

google-arts-crawler

Google Arts & Culture high quality image downloader

16   95   95  

gopa-abandoned

GOPA, a spider written in Go.(NOTE: this project moved to https://git...

30   94   94  

bathyscaphe

Fast, highly configurable, cloud native dark web crawler.

21   94   94  

MetaFinder

Search for documents in a domain through Search Engines (Google, Bing...

20   94   94  

dcard-spider

A spider on Dcard. Strong and speedy.

20   93   93  

aliexscrape

Get Aliexpress product details in JSON

30   93   93  

SpotifyScraper

Spotify Scraper to extract all the information from spotify, download...

8   91   91  

slrp

rotating open proxy multiplexer

9   91   91  

crawlie

A simple Elixir library for writing decently-performing crawlers with...

11   89   89  

news-crawler

A news crawler for BBC News, Reuters and New York Times.

38   89   89  

crawler-chrome-extensions

Chrome extensions commonly used by crawler engineers | Chrome extensions used by crawler dev...

15   89   89  

Price-monitor

Product price monitoring for 某东: set a custom target price and get email/WeChat alerts when the price drops. Tech: Python crawler/...

43   89   89  

shopify-spy

Extract structured data from Shopify websites.

47   88   88  

movie-elasticsearch

An open-source movie search engine built with SpringBoot2.0+ElasticSearch

35   87   87  

es6-crawler-detect

:spider: This is an ES6 adaptation of the original PHP library Crawler...

28   86   86  

html-table-extractor

Extract data from an HTML table

22   86   86