Most popular scraping repositories and open source projects

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python...

10735   54836   54836  

firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Sc...

3034   34681   34681  

Jobs_Applier_AI_Agent_AIHawk

AIHawk aims to ease the job hunt process by automating the job application...

4166   27886   27886  

colly

Elegant Scraper and Crawler Framework for Golang

1786   24013   24013  

Scrapegraph-ai

Python scraper based on AI

1608   19020   19020  

crawlee

Crawlee—A web scraping and browser automation library for Node.js to b...

790   17386   17386  

maigret

🕵️‍♂️ Collect a dossier on a person by username from thousands of site...

1031   15083   15083  

requests-html

Pythonic HTML Parsing for Humans™

987   13806   13806  

webmagic

A scalable web crawler framework for Java.

4173   11533   11533  

undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation...

1215   10974   10974  

tabula

Tabula is a tool for liberating data tables trapped inside PDF files

659   6977   6977  

awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing...

807   6966   6966  

autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

702   6716   6716  

curlconverter

Convert cURL commands to Python, JavaScript, Java, C#, PHP, Go, Dart,...

649   5964   5964  

ferret

Declarative web scraping

303   5798   5798  

headless-chrome-crawler

Distributed crawler powered by Headless Chrome

408   5562   5562  

crawlee-python

Crawlee—A web scraping and browser automation library for Python to bu...

370   5496   5496  

mechanize

Mechanize is a Ruby library that makes automated web interaction easy.

473   4415   4415  

trafilatura

Python & Command-line tool to gather text and metadata on the Web: Cra...

288   4109   4109  

Data-science

Collection of useful data science topics along with articles, videos,...

1030   4086   4086  

fake-useragent

Up-to-date simple useragent faker with real world database

520   3842   3842  

pipet

Swiss-army tool for scraping and extracting data from online assets, m...

116   3434   3434  

snoop

Snoop: a reconnaissance tool based on open-source data (OSINT world)

371   3302   3302  

Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE

Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power o...

560   3199   3199  

panther

A browser testing and web crawling library for PHP and Symfony

229   2988   2988  

Scrapling

🕷️ An undetectable, powerful, flexible, high-performance Python librar...

189   2862   2862  

geziyor

Geziyor, blazing fast web crawling & scraping framework for Go. Suppor...

152   2687   2687  

GoogleScraper

A Python module to scrape several search engines (like Google, Yandex,...

745   2683   2683  

facebook-scraper

Scrape Facebook public pages without an API key

648   2614   2614  

awesome-puppeteer

A curated list of awesome puppeteer resources.

159   2460   2460  

twikit

Twitter API Scraper | Without an API key | Twitter Internal API | Free...

291   2437   2437  

grab

Web Scraping Framework

274   2403   2403  

thal

Getting started with Puppeteer and Chrome Headless for Web Scraping

206   2358   2358  

Embed

Get info from any web service or page

312   2108   2108  

shot-scraper

A command-line utility for taking automated screenshots of websites

92   1906   1906  

camoufox

🦊 Anti-detect browser

138   1630   1630  

spider

A web crawler and scraper for Rust

135   1611   1611  

cloudproxy

Hide your scraper's IP behind the cloud. Provision proxy servers across...

83   1444   1444  

mlscraper

🤖 Scrape data from HTML websites automatically by just providing exam...

90   1349   1349  

fingerprint-suite

Browser fingerprinting tools for anonymizing your scrapers. Developed...

128   1301   1301  

iiab

Internet-in-a-Box - Build your own LIBRARY OF ALEXANDRIA with a Raspbe...

90   1262   1262  

DataEngineeringProject

Example end to end data engineering project.

250   1255   1255  

parsel

Parsel lets you extract data from XML/HTML documents using XPath or CS...

151   1214   1214  

scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on de...

323   1199   1199  

querido-diario

📰 Brazilian official gazettes accessible to everyone | 📰 Brazil...

417   1168   1168  

django-dynamic-scraper

Creating Scrapy scrapers via the Django admin interface

311   1158   1158  

Smartproxy

HTTP(S)/SOCKS5 rotating residential proxies - code examples & general...

42   1120   1120  

Scweet

A simple and unlimited twitter scraper : scrape tweets, likes, retweet...

233   1119   1119  

artoo

artoo.js - the client-side scraping companion.

93   1110   1110  

oj

Tools for various online judges. Downloading sample cases, generating...

95   1074   1074