Most popular crawling repositories and open source projects

Sasila

A flexible, friendly crawler framework

69   296   296  

Instagram-Bot

An Instagram bot developed using the Selenium Framework

84   281   281  

scrapper

Web scraper with a simple REST API living in Docker and using a Headle...

41   266   266  

antch

Antch, a fast, powerful and extensible web crawling & scraping framewo...

41   263   263  

laravel

Laravel adapter for Roach, the complete web scraping toolkit for PHP.

25   259   259  

Infect

Create your virus in Termux!

24   231   231  

N2H4

A tool for collecting Naver news

75   217   217  

Grawler

Grawler is a tool written in PHP which comes with a web interface that...

55   213   213  

corpuscrawler

Crawler for linguistic corpora

55   205   205  

facebook-data-extraction

Experience for effectively fetching Facebook data by Querying Graph AP...

61   205   205  

estela

estela, an elastic web scraping cluster 🕸

15   185   185  

SpideyX

SpideyX a multipurpose Web Penetration Testing tool with asynchronous...

32   178   178  

DotnetCrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping...

66   176   176  

Squidwarc

Squidwarc is a high fidelity, user scriptable, archival crawler that u...

25   170   170  

massivedl

Download a large list of files concurrently

11   161   161  

crawler

Go process used to crawl websites

21   150   150  

cdp4j

cdp4j - Chrome DevTools Protocol for Java

43   144   144  

sasori

Sasori is a dynamic web crawler powered by Puppeteer, designed for lig...

16   144   144  

courlan

Clean, filter and sample URLs to optimize data collection – Python & c...

9   142   142  

double-agent

A test suite of common scraper detection techniques. See how detectabl...

10   140   140  

scraply

Scraply a simple dom scraper to fetch information from any html based...

13   129   129  

proxifier

A fast, modern and intelligent proxy rotator perfect for crawling and...

16   128   128  

wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC...

17   125   125  

aioscpy

An asyncio + aiolibs crawler that imitates the Scrapy framework

10   124   124  

pdf-crawler

SimFin's open source PDF crawler

44   124   124  

sitemapper

Parse through any sitemap in Node.js

77   122   122  

LinkedIn-Skills-Crawler

A simple Python script to crawl the complete list of LinkedIn skills

111   121   121  

bots-zoo

26   115   115  

jkcrawler

A JK crawler written with Scrapy; images are sourced from Bilibili, Tumblr, Instagram, and...

28   114   114  

warc-parquet

🗄️ A simple CLI for converting WARC to Parquet.

0   112   112  

burp-dom-scanner

Burp Suite's extension to scan and crawl Single Page Applications

16   105   105  

dig-etl-engine

Download DIG to run on your laptop or server.

37   103   103  

devdocs-to-llm

Turn any developer documentation into a GPT

14   98   98  

bathyscaphe

Fast, highly configurable, cloud native dark web crawler.

21   93   93  

ARGUS

ARGUS is an easy-to-use web scraping tool. The program is based on the...

25   88   88  

robots.txt

Simple robots.txt template. Keep unwanted robots out (disallow). White...

38   86   86  

spidercreator

Automated web scraping spider generation using Browser Use and LLMs. S...

11   79   79  

abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (lik...

4   78   78  

arachnid

Powerful web scraping framework for Crystal

12   78   78  

feedsearch-crawler

Crawl sites for RSS, Atom, and JSON feeds.

10   77   77  

goClone

🌱 goClone - clone websites in seconds

4   76   76  

tech-seo-crawler

Build a small, 3-domain internet using GitHub Pages and Wikipedia and...

11   74   74  

Harvester

Web crawling and document processing through a usable interface.

15   72   72  

Python-Crawling-Tutorial

Python crawling tutorial

25   62   62  

crawling-projects

Web scraping and automation using Python

16   62   62  

datacrawl

A simple and easy to use web crawler for Python

11   62   62  

custom-crawler

🌌 High productivity semi-automatic crawler generator 🛠️🧰

4   60   60  

proxycrawl-python

ProxyCrawl Python library for scraping and crawling

19   59   59  

scrapy-distributed

A series of distributed components for Scrapy. Including RabbitMQ-base...

11   59   59  

pomp

Screen scraping and web crawling framework

10   59   59