Most popular crawling repositories and open source projects

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python...

10735   54836   54836  

colly

Elegant Scraper and Crawler Framework for Golang

1786   24013   24013  

crawlee

Crawlee—A web scraping and browser automation library for Node.js to b...

790   17386   17386  

newspaper

News, full-text, and article metadata extraction in Python 3. Advanced...

2044   12913   12913  

awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing...

807   6966   6966  

ferret

Declarative web scraping

303   5798   5798  

rod

A Chrome DevTools Protocol driver for web automation and scraping.

368   5754   5754  

headless-chrome-crawler

Distributed crawler powered by Headless Chrome

408   5562   5562  

hakrawler

Simple, fast web crawler designed for easy, quick discovery of endpoin...

427   3584   3584  

nutch

Apache Nutch is an extensible and scalable web crawler

1251   2979   2979  

awesome-puppeteer

A curated list of awesome puppeteer resources.

159   2460   2460  

grab

Web Scraping Framework

274   2403   2403  

skycaiji

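Skycaiji (蓝天采集器) is a free, open-source crawler system; data can be collected just by pointing and clicking to edit rules, and it can run...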

556   1694   1694  

cariddi

Take a list of domains, crawl urls and scan for endpoints, secrets, ap...

167   1621   1621  

mlscraper

🤖 Scrape data from HTML websites automatically by just providing exam...

90   1349   1349  

core

The complete web scraping toolkit for PHP.

53   1137   1137  

crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

117   1019   1019  

bhban_rpa

<Work Automation That Finishes Six Months of Work in a Day (Saengneung Publishing, 2020)>: ex...

652   830   830  

Lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

141   812   812  

easy-scraping-tutorial

Simple but useful Python web scraping tutorial code.

548   800   800  

scrapy-selenium

Scrapy middleware to handle javascript pages using selenium

266   792   792  

scrapyrt

HTTP API for Scrapy spiders

157   775   775  

holiday-cn

📅🇨🇳 Chinese statutory holiday data, scraped automatically every day from State Council announcements

92   686   686  

dataflowkit

Extract structured data from web sites. Web sites scraping.

80   681   681  

linkedin-profile-scraper-api

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in J...

162   621   621  

isp-data-pollution

ISP Data Pollution to Protect Private Browsing History with Obfuscatio...

52   599   599  

spidermon

Scrapy Extension for monitoring spiders execution.

100   541   541  

crawljax

Crawljax

227   493   493  

webster

A reliable high-level web crawling & scraping framework for Node.js.

57   457   457  

AdminHack

Today we will hack the admin panel of the site.

72   405   405  

WarcDB

WarcDB: Web crawl data as SQLite databases.

10   369   369  

crawler

Library for Rapid (Web) Crawler and Scraper Development

13   360   360  

second-order

Second-order subdomain takeover scanner

65   328   328  

spidy

The simple, easy to use command line web crawler.

66   311   311  

memorious

Lightweight web scraping toolkit for documents and structured data.

62   311   311  

crawler

🕷️ An easy-to-use spider written in Golang. (Previously named GOPA.)

82   308   308  

stopstalk-deployment

Stop stalking and start StopStalking 😉

95   303   303  

Sasila

A flexible, friendly crawler framework

70   297   297  

Instagram-Bot

An Instagram bot developed using the Selenium Framework

84   280   280  

antch

Antch, a fast, powerful and extensible web crawling & scraping framewo...

41   262   262  

scrapper

Web scraper with a simple REST API living in Docker and using a Headle...

37   243   243  

laravel

Laravel adapter for Roach, the complete web scraping toolkit for PHP.

12   218   218  

Grawler

Grawler is a tool written in PHP which comes with a web interface that...

55   214   214  

N2H4

A tool for collecting Naver news

74   205   205  

estela

estela, an elastic web scraping cluster 🕸

15   180   180  

DotnetCrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping...

66   176   176  

corpuscrawler

Crawler for linguistic corpora

56   164   164  

Squidwarc

Squidwarc is a high fidelity, user scriptable, archival crawler that u...

28   156   156  

massivedl

Download a large list of files concurrently

11   154   154  

crawler

Go process used to crawl websites

20   149   149