Most popular crawling repositories and open source projects

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python...

10050 forks, 47723 stars
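Crawling frameworks such as Scrapy, colly and Crawly take care of request scheduling and of filtering out duplicate URLs for you. To illustrate only that core loop, here is a minimal sketch in plain Python. It is not Scrapy's API. `get_links` is a hypothetical callback that stands in for fetching a page and extracting its links. The breadth-first order is an assumption made for the sketch; real frameworks let you configure the crawl order.

```python
from collections import deque
from typing import Callable, Iterable

# Illustrative sketch only: not the API of Scrapy or any other listed framework.
def crawl(start: str, get_links: Callable[[str], Iterable[str]], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl that visits each URL at most once, in discovery order."""
    seen = {start}             # every URL ever scheduled (duplicate filter)
    frontier = deque([start])  # FIFO queue of URLs waiting to be fetched
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):  # hypothetical fetch + link extraction
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

# Hypothetical in-memory "site" standing in for real HTTP fetches.
site = {"/": ["/a", "/b"], "/a": ["/", "/c"], "/b": ["/c"], "/c": []}
print(crawl("/", lambda u: site.get(u, [])))  # ['/', '/a', '/b', '/c']
```

The `seen` set is updated when a URL is scheduled rather than when it is visited. This keeps the same URL from entering the queue twice.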

colly

Elegant Scraper and Crawler Framework for Golang

1780 forks, 23786 stars

newspaper

News, full-text, and article metadata extraction in Python 3. Advanced...

2044 forks, 12913 stars
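Article metadata extraction, as described for newspaper, usually begins by reading the page's `<title>` and its `<meta>` tags, such as `author`, `description` or Open Graph `og:*` properties. The sketch below covers only that first step and uses the standard-library `html.parser`. It is not newspaper's API and does none of its full-text extraction. The function name `extract_metadata` is hypothetical.

```python
from html.parser import HTMLParser

# Illustrative sketch only: not newspaper's API or extraction heuristics.
class MetadataParser(HTMLParser):
    """Collect <title> text and <meta name|property=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("name") or attrs.get("property")  # og:* uses "property"
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract_metadata(html: str) -> dict:
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), **parser.meta}

html = ('<html><head><title> Hello </title>'
        '<meta name="author" content="Jane Doe">'
        '<meta property="og:title" content="Hello, world"/>'
        '</head><body><p>Body text</p></body></html>')
print(extract_metadata(html))
```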

crawlee

Crawlee—A web scraping and browser automation library for Node.js that...

374 forks, 8610 stars

awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing...

761 forks, 5870 stars

ferret

Declarative web scraping

303 forks, 5783 stars

rod

A Chrome DevTools Protocol driver for web automation and scraping.

368 forks, 5754 stars
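Drivers such as rod (and cdp4j further down) control Chrome through the Chrome DevTools Protocol. CDP sends JSON messages over a WebSocket. A command carries an `id`, a `method` and `params`. The browser's reply echoes that `id` together with a `result` or an `error`, and events arrive with a `method` but no `id`. The following sketch covers only that message framing and id matching in plain Python. It is not rod's or cdp4j's API, and the WebSocket transport is left out. `CdpSession` is a hypothetical name.

```python
import itertools
import json

# Illustrative sketch only: not rod's or cdp4j's API; no WebSocket transport.
class CdpSession:
    """Frame CDP commands as JSON and match browser replies to commands by id."""

    def __init__(self):
        self._next_id = itertools.count(1)  # increasing ids so replies can be matched
        self.pending = {}                   # id -> method name still awaiting a reply

    def command(self, method: str, **params) -> str:
        msg_id = next(self._next_id)
        self.pending[msg_id] = method
        return json.dumps({"id": msg_id, "method": method, "params": params})

    def handle(self, raw: str):
        """Classify a frame as ('reply', method, result) or ('event', method, params)."""
        msg = json.loads(raw)
        if "id" in msg:
            method = self.pending.pop(msg["id"])
            if "error" in msg:
                raise RuntimeError(f"{method} failed: {msg['error']}")
            return "reply", method, msg.get("result", {})
        return "event", msg["method"], msg.get("params", {})

cdp = CdpSession()
frame = cdp.command("Page.navigate", url="https://example.com")
# `frame` would be sent over the browser's DevTools WebSocket (not shown).
print(cdp.handle('{"id": 1, "result": {"frameId": "F1"}}'))
print(cdp.handle('{"method": "Page.loadEventFired", "params": {"timestamp": 1.5}}'))
```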

headless-chrome-crawler

Distributed crawler powered by Headless Chrome

433 forks, 5384 stars

hakrawler

Simple, fast web crawler designed for easy, quick discovery of endpoin...

427 forks, 3584 stars
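Endpoint discovery, the job hakrawler is described as doing, comes down to three steps:

- collect URL-bearing attributes from a page;
- resolve them against the page's own URL;
- keep the results that stay in scope.

Here is a minimal standard-library sketch of those steps. It is not hakrawler's implementation, and the choice of attributes (`href`, `src`, `action`) and the same-host filter are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Illustrative sketch only: not hakrawler's implementation.
class LinkCollector(HTMLParser):
    """Gather raw href/src/action attribute values from every tag."""

    def __init__(self):
        super().__init__()
        self.raw = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                self.raw.append(value)

def discover_endpoints(base_url: str, html: str, same_host: bool = True) -> list[str]:
    collector = LinkCollector()
    collector.feed(html)
    host = urlsplit(base_url).netloc
    found = []
    for raw in collector.raw:
        parts = urlsplit(urljoin(base_url, raw))  # resolve relative paths
        if parts.scheme not in ("http", "https"):
            continue                              # skip mailto:, javascript:, ...
        if same_host and parts.netloc != host:
            continue                              # stay on the target host
        url = parts._replace(fragment="").geturl()  # drop #fragments
        if url not in found:
            found.append(url)
    return found

page = ('<a href="/api/v1/users">u</a><script src="app.js"></script>'
        '<form action="login"></form><a href="mailto:x@example.com">m</a>'
        '<a href="https://other.org/x">o</a><a href="#top">t</a>')
print(discover_endpoints("https://example.com/docs/index.html", page))
```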

nutch

Apache Nutch is an extensible and scalable web crawler

1251 forks, 2979 stars

awesome-puppeteer

A curated list of awesome puppeteer resources.

159 forks, 2443 stars

grab

Web Scraping Framework

278 forks, 2292 stars

skycaiji

SkyCaiji (蓝天采集器) is a free, open-source crawler system: you only need to point and click to edit rules to collect data, and it can...

556 forks, 1694 stars

cariddi

Take a list of domains, crawl urls and scan for endpoints, secrets, ap...

167 forks, 1621 stars

core

The complete web scraping toolkit for PHP.

53 forks, 1137 stars

crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

115 forks, 1009 stars

mlscraper

🤖 Scrape data from HTML websites automatically by just providing exam...

67 forks, 1001 stars
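Scraping "by just providing examples", as in mlscraper's description, means learning a rule from a page where you already know the answer and then applying that rule to other pages. The sketch below shows a deliberately simple version of the idea. The learned "rule" is only the tag name plus `class` attribute of the element that holds the example text. This is an assumption made for illustration and is not mlscraper's algorithm, which searches a much richer space of selectors. `learn_rule` and `apply_rule` are hypothetical names.

```python
from html.parser import HTMLParser

# Illustrative sketch only: not mlscraper's API or selector-search algorithm.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class _TextElements(HTMLParser):
    """Record (tag, class, text) for each text run and its innermost element."""

    def __init__(self):
        super().__init__()
        self.stack = []  # open elements as (tag, class) pairs
        self.items = []  # collected (tag, class, text) triples

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:  # void elements never get an end tag
            self.stack.append((tag, dict(attrs).get("class") or ""))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerates unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.items.append((*self.stack[-1], text))

def _text_elements(html):
    parser = _TextElements()
    parser.feed(html)
    parser.close()
    return parser.items

def learn_rule(example_html, example_value):
    """Return the (tag, class) signature of the element holding the example value."""
    for tag, cls, text in _text_elements(example_html):
        if text == example_value:
            return tag, cls
    raise ValueError(f"{example_value!r} not found in example page")

def apply_rule(rule, html):
    """Extract every text whose innermost element matches the learned signature."""
    return [text for tag, cls, text in _text_elements(html) if (tag, cls) == rule]

train = '<div><h1 class="name">Ada Lovelace</h1><p class="bio">Mathematician</p></div>'
rule = learn_rule(train, "Ada Lovelace")  # ("h1", "name")
page = '<div><h1 class="name">Alan Turing</h1><h1>Unrelated</h1></div>'
print(apply_rule(rule, page))  # ['Alan Turing']
```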

bhban_rpa

From <Work Automation That Finishes Six Months of Work in a Single Day (Saengneung Publishing, 2020)>: ex...

652 forks, 830 stars

Lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

140 forks, 813 stars

scrapy-selenium

Scrapy middleware to handle javascript pages using selenium

266 forks, 792 stars

scrapyrt

HTTP API for Scrapy spiders

157 forks, 775 stars

easy-scraping-tutorial

Simple but useful Python web scraping tutorial code.

551 forks, 723 stars

holiday-cn

📅🇨🇳 Chinese statutory public holiday data, automatically scraped daily from State Council announcements

92 forks, 686 stars

isp-data-pollution

ISP Data Pollution to Protect Private Browsing History with Obfuscatio...

52 forks, 599 stars

dataflowkit

Extract structured data from web sites. Web sites scraping.

74 forks, 577 stars

crawljax

Crawljax

227 forks, 493 stars

webster

A reliable high-level web crawling & scraping framework for Node.js.

57 forks, 457 stars

spidermon

Scrapy Extension for monitoring spiders execution.

86 forks, 455 stars

AdminHack

Today we will hack the admin panel of the site.

72 forks, 405 stars

WarcDB

WarcDB: Web crawl data as SQLite databases.

10 forks, 369 stars
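Storing crawl output in SQLite, the idea behind WarcDB, makes crawled pages queryable with plain SQL. The sketch below uses Python's built-in `sqlite3` module and a hypothetical one-table schema, `pages`, in which re-crawling a URL replaces its earlier snapshot. This is not WarcDB's schema or import logic. WarcDB works from WARC files, whereas this sketch stores pages directly. The helper names are also hypothetical.

```python
import sqlite3

# Illustrative sketch only: hypothetical schema, not WarcDB's.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    url        TEXT PRIMARY KEY,
    fetched_at TEXT NOT NULL,
    status     INTEGER NOT NULL,
    body       TEXT
)
"""

def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def save_page(conn, url, fetched_at, status, body):
    # "INSERT OR REPLACE" keeps only the latest snapshot per URL;
    # the `with` block commits the transaction on success.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, fetched_at, status, body) "
            "VALUES (?, ?, ?, ?)",
            (url, fetched_at, status, body),
        )

conn = open_store()
save_page(conn, "https://example.com/", "2024-01-01T00:00:00Z", 200, "<html>v1</html>")
save_page(conn, "https://example.com/", "2024-01-02T00:00:00Z", 200, "<html>v2</html>")
save_page(conn, "https://example.com/missing", "2024-01-02T00:00:00Z", 404, None)
rows = conn.execute("SELECT url, status FROM pages ORDER BY url").fetchall()
print(rows)
```

If you need a history of every fetch instead of the latest snapshot, drop the primary key on `url` and insert a new row per fetch.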

linkedin-profile-scraper

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in J...

114 forks, 346 stars

second-order

Second-order subdomain takeover scanner

65 forks, 328 stars

spidy

The simple, easy to use command line web crawler.

66 forks, 311 stars

stopstalk-deployment

Stop stalking and start StopStalking 😉

95 forks, 303 stars

gopa

GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://inde...

82 forks, 296 stars

Sasila

A flexible, friendly crawler framework

76 forks, 293 stars

memorious

Lightweight web scraping toolkit for documents and structured data.

53 forks, 287 stars

Instagram-Bot

An Instagram bot developed using the Selenium Framework

84 forks, 280 stars

antch

Antch, a fast, powerful and extensible web crawling & scraping framewo...

42 forks, 250 stars

laravel

Laravel adapter for Roach, the complete web scraping toolkit for PHP.

12 forks, 218 stars

crawler

Library for Rapid (Web) Crawler and Scraper Development

6 forks, 207 stars

N2H4

A tool for collecting Naver News

74 forks, 205 stars

Grawler

Grawler is a tool written in PHP which comes with a web interface that...

57 forks, 188 stars

corpuscrawler

Crawler for linguistic corpora

56 forks, 164 stars

Squidwarc

Squidwarc is a high-fidelity, user-scriptable, archival crawler that u...

28 forks, 156 stars

massivedl

Download a large list of files concurrently

11 forks, 154 stars
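Downloading a large list of files concurrently, massivedl's stated purpose, is an I/O-bound task, so a bounded thread pool is a natural fit. The sketch below is written in standard-library Python. It is not massivedl's implementation and does not save files to disk. It collects successes and failures separately so that one bad URL does not abort the batch. The `fetcher` parameter is a hypothetical hook that lets the example run offline with a fake downloader.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen

# Illustrative sketch only: not massivedl's implementation; nothing is written to disk.
def fetch(url: str, timeout: float = 30.0) -> bytes:
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()

def download_all(urls, fetcher=fetch, workers=8):
    """Fetch every URL on a bounded thread pool; return (results, errors) dicts."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetcher, url): url for url in urls}
        for fut in as_completed(futures):  # handle downloads as they finish
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception as exc:       # record the failure, keep going
                errors[url] = exc
    return results, errors

def fake_fetch(url):
    # Stand-in for a network call so the example runs offline.
    if url.endswith("/broken"):
        raise OSError("simulated failure")
    return url.encode()

ok, failed = download_all(
    ["https://example.com/1", "https://example.com/2", "https://example.com/broken"],
    fetcher=fake_fetch, workers=2)
print(sorted(ok), list(failed))
```

`max_workers` caps how many requests are in flight at once, which also limits the load placed on any single server.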

DotnetCrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping...

54 forks, 153 stars

crawler

Go process used to crawl websites

20 forks, 149 stars

cdp4j

cdp4j - Chrome DevTools Protocol for Java

43 forks, 144 stars

telegram-crawler

🕷 Automatically detect changes made to the official Telegram sites, cl...

16 forks, 143 stars
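Automatically detecting changes to a set of pages, as telegram-crawler's description says it does, can be approached by fingerprinting each page's content and comparing the result with the previous run. The sketch below does only that comparison, using SHA-256 from `hashlib`. It is not telegram-crawler's implementation, and fetching and persisting fingerprints are left out. Real pages usually need normalization before hashing, such as stripping timestamps, nonces and similar dynamic fragments; otherwise every run looks like a change. `detect_changes` is a hypothetical name.

```python
import hashlib

# Illustrative sketch only: not telegram-crawler's implementation.
def fingerprint(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def detect_changes(previous: dict, current_pages: dict) -> tuple[dict, dict]:
    """Compare freshly fetched pages against stored fingerprints.

    previous:      url -> fingerprint saved by the last run
    current_pages: url -> page content fetched in this run
    Returns (changes, new_fingerprints); store the latter for the next run.
    """
    current = {url: fingerprint(body) for url, body in current_pages.items()}
    changes = {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(u for u in current.keys() & previous.keys()
                          if current[u] != previous[u]),
    }
    return changes, current

old = {"/a": fingerprint("v1"), "/b": fingerprint("same"), "/gone": fingerprint("x")}
changes, new = detect_changes(old, {"/a": "v2", "/b": "same", "/new": "hello"})
print(changes)
```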