Most popular scraping repositories and open-source projects

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python...

10050   47723   47723  

colly

Elegant Scraper and Crawler Framework for Golang

1780   23786   23786  

requests-html

Pythonic HTML Parsing for Humans™

963   13215   13215  

webmagic

A scalable web crawler framework for Java.

4175   11507   11507  

crawlee

Crawlee—A web scraping and browser automation library for Node.js that...

374   8610   8610  

autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

696   6656   6656  

tabula

Tabula is a tool for liberating data tables trapped inside PDF files

598   6047   6047  

curlconverter

Convert cURL commands to Python, JavaScript, Java, C#, PHP, Go, Dart,...

649   5964   5964  

awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing...

761   5870   5870  

ferret

Declarative web scraping

303   5783   5783  

undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation...

789   5600   5600  

headless-chrome-crawler

Distributed crawler powered by Headless Chrome

433   5384   5384  

mechanize

Mechanize is a Ruby library that makes automated web interaction easy.

472   4413   4413  

Data-science

Collection of useful data science topics along with articles, videos,...

1029   4076   4076  

fake-useragent

Up-to-date simple useragent faker with real world database

506   3082   3082  

Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE

Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power o...

509   2922   2922  

panther

A browser testing and web crawling library for PHP and Symfony

214   2712   2712  

geziyor

Geziyor, blazing fast web crawling & scraping framework for Go. Suppor...

150   2670   2670  

GoogleScraper

A Python module to scrape several search engines (like Google, Yandex,...

761   2504   2504  

awesome-puppeteer

A curated list of awesome puppeteer resources.

159   2443   2443  

thal

Getting started with Puppeteer and Chrome Headless for Data Mining

230   2362   2362  

grab

Web Scraping Framework

278   2292   2292  

Embed

Get info from any web service or page

302   1975   1975  

snoop

Snoop: an intelligence-gathering tool based on open data (OSINT world)

285   1862   1862  

facebook-scraper

Scrape Facebook public pages without an API key

500   1786   1786  

iiab

Internet-in-a-Box - Build your own LIBRARY OF ALEXANDRIA with a Raspbe...

82   1240   1240  

trafilatura

Python & command-line tool to gather text on the Web: web crawling/scr...

130   1206   1206  

cloudproxy

Hide your scraper's IP behind the cloud. Provision proxy servers across...

53   1142   1142  

scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on de...

322   1117   1117  

django-dynamic-scraper

Creating Scrapy scrapers via the Django admin interface

313   1116   1116  

artoo

artoo.js - the client-side scraping companion.

93   1091   1091  

crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

115   1009   1009  

mlscraper

🤖 Scrape data from HTML websites automatically by just providing exam...

67   1001   1001  

parsel

Parsel lets you extract data from XML/HTML documents using XPath or CS...

132   937   937  

instagram-scraper

Scrape the Instagram frontend. Inspired by twitter-scraper by @kenne...

79   923   923  

shot-scraper

A command-line utility for taking automated screenshots of websites

47   918   918  

oj

Tools for various online judges. Downloading sample cases, generating...

83   869   869  

querido-diario

📰 Brazilian government gazettes, accessible to everyone.

323   849   849  

linkedin

LinkedIn Scraper using Selenium Web Driver, Chromium headless, Docker...

131   839   839  

clean-text

🧹 Python package for text cleaning

71   836   836  

DataEngineeringProject

Example end to end data engineering project.

169   824   824  

Lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

140   813   813  

till

DataHen Till is a companion tool to your existing web scraper that ins...

24   803   803  

Edu-Mail-Generator

Generate Free Edu Mail(s) within minutes

506   743   743  

easy-scraping-tutorial

Simple but useful Python web scraping tutorial code.

551   723   723  

ImageScraper

✂️ High performance, multi-threaded image scraper

96   722   722  

kuwala

Kuwala is the no-code data platform for BI analysts and engineers enab...

48   718   718  

Scweet

A simple and unlimited Twitter scraper: scrape tweets, likes, retweet...

185   715   715  

gazpacho

🥫 The simple, fast, and modern web scraping library

58   701   701  

Katana

Python script for Google Dorking

155   683   683