Most popular crawler repositories and open-source projects

spider

:star2::octocat: Powered by Python 3 (a simple spider-learning project). Baidu Wen...

65   136   136  

not-your-average-web-crawler

A web crawler (for bug hunting) that gathers more than you can imagine...

36   136   136  

CrawlBox

An easy way to brute-force web directories.

42   136   136  

GoodreadsScraper

Scrape data from Goodreads using Scrapy and Selenium :books:

36   136   136  

Zhihu-Spider

A multithreaded Python crawler that collects information from Zhihu users' profile pages.

54   135   135  

site-audit-seo

Web service and CLI tool for SEO site audits: crawl site, Lighthouse al...

18   134   134  

Skill-Share-Crawler---DL

Download Skillshare videos by ID or by class.

35   133   133  

leetcode-ranking-search

Leetcode Contest Ranking Searcher

22   133   133  

dyer

Dyer is designed for reliable, flexible and fast web crawling, providi...

14   131   131  

onegram

This repository is no longer maintained.

5   130   130  

php-crawler

A PHP crawler that finds email addresses on the internet.

65   130   130  

pricetrack

Price tracker that monitors products and alerts you when prices drop. Su...

49   129   129  

picacomic_downloader

A downloader for PicaComic (哔咔漫画) favorites.

17   129   129  

pylinkvalidator

pylinkvalidator is a standalone, pure-Python link validator and cra...

37   128   128  

acm-statistics

An online tool (crawler) to analyze users' performance in online judges...

13   128   128  

Sina-Weibo-Album-Downloader

Multithreaded download of all HD photos/pictures from someone's Sina W...

46   127   127  

docs

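Source code for the book 《数据采集从入门到放弃》 ("Data Collection: From Getting Started to Giving Up"). Contents: an introduction to crawlers, the job market, crawler engineers...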

27   127   127  

Ceiba-Downloader

A course downloader to help NTU students download course data...

10   127   127  

lumberjack

An automated website accessibility scanner and CLI.

7   126   126  

proxifier

A fast, modern and intelligent proxy rotator perfect for crawling and...

16   126   126  

s3recon

Amazon S3 bucket finder and crawler.

51   125   125  

sentinel-crawler

Xenomorph Crawler, a Concise, Declarative and Observable Distributed C...

26   124   124  

graphquery

GraphQuery is a query language and execution engine tied to any backen...

19   124   124  

PatentCrawler

A Scrapy patent crawler (no longer maintained).

70   123   123  

smarter-encryption

13   123   123  

scraply

Scraply: a simple DOM scraper to fetch information from any HTML-based...

11   123   123  

aiotieba

Baidu Tieba forum-moderation manager ✨ post-deletion bot ✨ wraps a large number of core Tieba APIs with aiohttp.

45   122   122  

instagram-profilecrawl

:computer: Quickly crawl the information (e.g. followers, tags, etc......

29   122   122  

GoogleImagesDownloader

Enlarge a training dataset by searching for images with specified keywords i...

66   121   121  

memex-explorer

Viewers for statistics and dashboarding of Domain Search Engine data

69   120   120  

prerender-java

Java framework for Prerender.

48   120   120  

google-news-scraper

Lightweight scraper for Google News

40   120   120  

findpapers

Findpapers: A tool for helping researchers who are looking for related...

21   120   120  

TiebaManager

(Abandoned) A Baidu Tieba forum-moderation tool that automatically scans posts and handles rule-breaking ones.

43   119   119  

poopak

POOPAK - TOR Hidden Service Crawler

30   119   119  

wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC...

16   119   119  

eyes

Public Opinion Mining System of Taiwanese Forums

18   119   119  

WebReaper

Web scraper, crawler and parser in C#. Designed as simple, declarative...

28   119   119  

BaiduCrawler

Sample of using proxies to crawl Baidu search results.

63   118   118  

auto-lighthouse

A utility package for automating Lighthouse reporting.

18   117   117  

spidy

Domain names collector - Crawl websites and collect domain names along...

27   117   117  

andvaranaut

A dungeon crawler

11   116   116  

AmazonRobot

A Python crawler for driving traffic to Amazon product listings.

45   116   116  

npm-search

🗿 npm ↔️ Algolia replication tool :skier: :snail: :artificial_satelli...

24   116   116  

Lcrawl

An elegant crawler for the ZFsoft (正方) academic affairs system.

46   114   114  

blinkist-m4a-downloader

Grabs all of the audio files from all of the Blinkist books

21   114   114  

bots-zoo

26   113   113  

ungoliant

:spider: The pipeline for the OSCAR corpus

12   113   113  

linkcrawler

Cross-platform persistent and distributed web crawler :link:

9   112   112  

ThesaurusSpider

A Python crawler that downloads dictionary files for the Sogou, Baidu and QQ input methods; can be used to build different industries'...

44   112   112