Most popular crawler repositories and open source projects

pylinkvalidator

pylinkvalidator is a standalone and pure python link validator and cra...

37   128   128  

acm-statistics

An online tool (crawler) to analyze users' performance in online judges...

13   128   128  

Sina-Weibo-Album-Downloader

Multithreaded download of all HD photos / pictures from someone's Sina W...

46   127   127  

docs

Source code for "Data Collection: From Getting Started to Giving Up". Contents: introduction to crawlers, job market, crawler engineer...

27   127   127  

Ceiba-Downloader

This is a course-downloader to help NTU students download courses data...

10   127   127  

lumberjack

An automated website accessibility scanner and cli

7   126   126  

s3recon

Amazon S3 bucket finder and crawler.

51   125   125  

graphquery

GraphQuery is a query language and execution engine tied to any backen...

19   124   124  

PatentCrawler

Scrapy patent crawler (no longer maintained)

70   123   123  

smarter-encryption

13   123   123  

scraply

Scraply, a simple DOM scraper to fetch information from any HTML based...

11   123   123  

sentinel-crawler

Xenomorph Crawler, a Concise, Declarative and Observable Distributed C...

25   122   122  

instagram-profilecrawl

:computer: Quickly crawl the information (e.g. followers, tags, etc......

29   122   122  

aiotieba

Baidu Tieba moderation manager ✨ post-deletion bot ✨ wraps many core Tieba APIs using aiohttp

45   122   122  

GoogleImagesDownloader

Enlarge training dataset by searching images with specified keywords i...

66   121   121  

tir

Have time.ir in shell!

8   121   121  

memex-explorer

Viewers for statistics and dashboarding of Domain Search Engine data

69   120   120  

prerender-java

Java framework for prerender

48   120   120  

google-news-scraper

Lightweight scraper for Google News

40   120   120  

findpapers

Findpapers: A tool for helping researchers who are looking for related...

21   120   120  

TiebaManager

(Abandoned) Baidu Tieba moderation tool that automatically scans posts and handles rule-violating posts

43   119   119  

poopak

POOPAK - TOR Hidden Service Crawler

30   119   119  

BaiduCrawler

Sample of using proxies to crawl baidu search results.

63   118   118  

auto-lighthouse

A utility package for automating lighthouse reporting

18   117   117  

spidy

Domain names collector - Crawl websites and collect domain names along...

27   117   117  

eyes

Public Opinion Mining System of Taiwanese Forums

19   117   117  

andvaranaut

A dungeon crawler

11   116   116  

AmazonRobot

Python crawler for driving traffic to Amazon products

45   116   116  

npm-search

🗿 npm ↔️ Algolia replication tool :skier: :snail: :artificial_satellit...

24   116   116  

Lcrawl

An elegant crawler for the Zhengfang academic administration system.

46   114   114  

blinkist-m4a-downloader

Grabs all of the audio files from all of the Blinkist books

21   114   114  

ungoliant

:spider: The pipeline for the OSCAR corpus

12   113   113  

linkcrawler

Cross-platform persistent and distributed web crawler :link:

9   112   112  

ThesaurusSpider

Python crawler that downloads dictionary files for the Sogou, Baidu, and QQ input methods; usable for building industry-specific...

44   112   112  

APSoft-Web-Scanner-v2

Powerful dork searcher and vulnerability scanner for the Windows platform

35   112   112  

bee-university

Project that collects university admission cutoff scores for 2014 - 2018 and analyzes the data

24   111   111  

proxy-pool

Proxy IP pool service for crawlers; other crawler programs can fetch proxies through a REST API

55   110   110  

gflare-tk

Open-Source Python Based SEO Web Crawler

14   110   110  

WeiboCrawler

Cookie-free Weibo crawler that can continuously crawl the profile information and posts of one or more Sina Weibo users, and...

19   110   110  

starfish-ql

✴️ An experimental graph database

4   110   110  

scrapy-puppeteer

Scrapy + Puppeteer

28   109   109  

crawler

Crawler, HTTP proxy, simulated login!

46   108   108  

tracker-radar-collector

🕸 Modular, multithreaded, puppeteer-based crawler

40   108   108  

bose

✨ BOSE IS A SWISS ARMY KNIFE 🔪 FOR BOT DEVELOPMENT. THE ULTIMATE BOT DEV...

1   107   107  

collector

Collect XSS vulnerable parameters from entire domain.

29   106   106  

facebook-data-extraction

Experiences in extracting data from Facebook with these 3 methods: Fac...

44   105   105  

CrawlerPack

Java web data crawler package

69   104   104  

antispider

56   104   104  

crawler_detect

Ruby gem to detect bots and crawlers via the user agent

10   104   104  

WeiboSpider

Weibo crawler: a lightweight Weibo crawler based on the Scrapy framework (Sina Weibo Spider)

22   104   104