Most popular scraping repositories and open-source projects

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python...

10050   47723   47723  

colly

Elegant Scraper and Crawler Framework for Golang

1617   19881   19881  

requests-html

Pythonic HTML Parsing for Humans™

963   13215   13215  

webmagic

A scalable web crawler framework for Java.

4138   10876   10876  

crawlee

Crawlee—A web scraping and browser automation library for Node.js that...

374   8610   8610  

tabula

Tabula is a tool for liberating data tables trapped inside PDF files

598   6047   6047  

curlconverter

Convert cURL commands to Python, JavaScript, Java, C#, PHP, Go, Dart,...

649   5964   5964  

awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing...

761   5870   5870  

undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation...

789   5600   5600  

ferret

Declarative web scraping

304   5408   5408  

headless-chrome-crawler

Distributed crawler powered by Headless Chrome

433   5384   5384  

autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

574   5278   5278  

mechanize

Mechanize is a Ruby library that makes automated web interaction easy.

482   4310   4310  

Data-science

Collection of useful data science topics along with articles, videos,...

970   3748   3748  

fake-useragent

Up-to-date simple useragent faker with real world database

506   3082   3082  

Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE

Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power o...

509   2922   2922  

panther

A browser testing and web crawling library for PHP and Symfony

214   2712   2712  

GoogleScraper

A Python module to scrape several search engines (like Google, Yandex,...

761   2504   2504  

thal

Getting started with Puppeteer and Chrome Headless for Data Mining

230   2362   2362  

grab

Web Scraping Framework

278   2292   2292  

awesome-puppeteer

A curated list of awesome puppeteer resources.

147   2141   2141  

geziyor

Geziyor, blazing fast web crawling & scraping framework for Go. Suppor...

129   2130   2130  

Embed

Get info from any web service or page

302   1975   1975  

snoop

Snoop: a reconnaissance tool based on open data (OSINT world)

285   1862   1862  

facebook-scraper

Scrape Facebook public pages without an API key

500   1786   1786  

trafilatura

Python & command-line tool to gather text on the Web: web crawling/scr...

130   1206   1206  

cloudproxy

Hide your scraper's IP behind the cloud. Provision proxy servers across...

53   1142   1142  

scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on de...

322   1117   1117  

django-dynamic-scraper

Creating Scrapy scrapers via the Django admin interface

313   1116   1116  

artoo

artoo.js - the client-side scraping companion.

93   1091   1091  

mlscraper

🤖 Scrape data from HTML websites automatically by just providing examp...

67   1001   1001  

parsel

Parsel lets you extract data from XML/HTML documents using XPath or CS...

132   937   937  

instagram-scraper

Scrape the Instagram frontend. Inspired by twitter-scraper by @kenne...

79   923   923  

shot-scraper

A command-line utility for taking automated screenshots of websites

47   918   918  

oj

Tools for various online judges. Downloading sample cases, generating...

83   869   869  

querido-diario

📰 Brazilian government gazettes, accessible to everyone.

323   849   849  

clean-text

🧹 Python package for text cleaning

71   836   836  

DataEngineeringProject

Example end to end data engineering project.

169   824   824  

Lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

140   813   813  

till

DataHen Till is a companion tool to your existing web scraper that ins...

24   803   803  

iiab

Internet-in-a-Box - Build your own LIBRARY OF ALEXANDRIA with a Raspbe...

75   788   788  

Edu-Mail-Generator

Generate Free Edu Mail(s) within minutes

506   743   743  

crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

91   736   736  

easy-scraping-tutorial

Simple but useful Python web scraping tutorial code.

551   723   723  

ImageScraper

✂️ High performance, multi-threaded image scraper

96   722   722  

kuwala

Kuwala is the no-code data platform for BI analysts and engineers enab...

48   718   718  

Scweet

A simple and unlimited Twitter scraper: scrape tweets, likes, retweet...

185   715   715  

gazpacho

🥫 The simple, fast, and modern web scraping library

58   701   701  

Katana

Python script for Google Dorking

155   683   683  

loconotion

📄 Python tool to turn Notion.so pages into lightweight, customizable s...

114   669   669