Most popular crawling repositories and open-source projects

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python...

10050   47723   47723  

colly

Elegant Scraper and Crawler Framework for Golang

1617   19881   19881  

newspaper

News, full-text, and article metadata extraction in Python 3. Advanced...

2044   12913   12913  

crawlee

Crawlee—A web scraping and browser automation library for Node.js that...

374   8610   8610  

awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing...

761   5870   5870  

ferret

Declarative web scraping

304   5408   5408  

headless-chrome-crawler

Distributed crawler powered by Headless Chrome

433   5384   5384  

rod

A Devtools driver for web automation and scraping

275   3936   3936  

hakrawler

Simple, fast web crawler designed for easy, quick discovery of endpoin...

427   3584   3584  

nutch

Apache Nutch is an extensible and scalable web crawler

1208   2546   2546  

grab

Web Scraping Framework

278   2292   2292  

awesome-puppeteer

A curated list of awesome puppeteer resources.

147   2141   2141  

skycaiji

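SkyCaiji (蓝天采集器) is a free, open-source crawler system; data can be collected just by pointing and clicking to edit rules, and it can run...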

556   1694   1694  

core

The complete web scraping toolkit for PHP.

53   1137   1137  

mlscraper

🤖 Scrape data from HTML websites automatically by just providing examp...

67   1001   1001  

cariddi

Take a list of domains, crawl urls and scan for endpoints, secrets, ap...

111   949   949  

bhban_rpa

"Work Automation That Finishes Six Months of Work in One Day" (Saengneung Publishing, 2020): ex...

652   830   830  

Lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

140   813   813  

scrapy-selenium

Scrapy middleware to handle javascript pages using selenium

266   792   792  

scrapyrt

HTTP API for Scrapy spiders

157   775   775  

crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

91   736   736  

easy-scraping-tutorial

Simple but useful Python web scraping tutorial code.

551   723   723  

holiday-cn

📅🇨🇳 Chinese statutory holiday data, automatically scraped daily from State Council announcements

92   686   686  

dataflowkit

Extract structured data from web sites. Web sites scraping.

74   577   577  

isp-data-pollution

ISP Data Pollution to Protect Private Browsing History with Obfuscatio...

52   531   531  

crawljax

Crawljax

227   493   493  

webster

a reliable high-level web crawling & scraping framework for Node.js.

57   457   457  

spidermon

Scrapy Extension for monitoring spiders execution.

86   455   455  

AdminHack

today we will hack the admin panel of the site.

72   405   405  

WarcDB

WarcDB: Web crawl data as SQLite databases.

10   369   369  

linkedin-profile-scraper

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in J...

114   346   346  

second-order

Second-order subdomain takeover scanner

65   328   328  

spidy

The simple, easy to use command line web crawler.

66   311   311  

stopstalk-deployment

Stop stalking and start StopStalking 😉

95   303   303  

gopa

GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://inde...

82   296   296  

Sasila

A flexible, friendly crawler framework

76   293   293  

memorious

Lightweight web scraping toolkit for documents and structured data.

53   287   287  

Instagram-Bot

An Instagram bot developed using the Selenium Framework

88   270   270  

antch

Antch, a fast, powerful and extensible web crawling & scraping framewo...

42   250   250  

laravel

Laravel adapter for Roach, the complete web scraping toolkit for PHP.

12   218   218  

crawler

Library for Rapid (Web) Crawler and Scraper Development

6   207   207  

N2H4

A tool for collecting Naver news

74   205   205  

Grawler

Grawler is a tool written in PHP which comes with a web interface that...

57   188   188  

corpuscrawler

Crawler for linguistic corpora

56   164   164  

Squidwarc

Squidwarc is a high fidelity, user scriptable, archival crawler that u...

28   156   156  

massivedl

Download a large list of files concurrently

11   154   154  

DotnetCrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping...

54   153   153  

crawler

Go process used to crawl websites

20   149   149  

cdp4j

cdp4j - Chrome DevTools Protocol for Java

43   144   144  

telegram-crawler

🕷 Automatically detect changes made to the official Telegram sites, cl...

16   143   143