Most popular scraping repositories and open source projects

iiab iiab Jinja

Internet-in-a-Box: Build your own LIBRARY OF ALEXANDRIA with a Raspberry Pi!

1.9k 138 26

untidetect-tools TheGP

List of anti-detect and humanizing tools and browsers, including CAPTCHA solvers and SMS-activation services.

1.9k 157 35

invisible_playwright feder-cr Python

Anti-detect browser that claims to pass every bot-detection test. Drop-in Playwright replacement.

1.8k 196 16

cloudproxy claffin Python

Hide your scraper's IP behind the cloud. Provision proxy servers across different cloud providers to improve your scraping success.

1.7k 108 16

Scweet Altimis Python

Scrape tweets, profiles, followers, and following lists from Twitter/X with no API key needed. Python library with smart multi-account pooling, proxy support and...

1.6k 276 18

thepipe emcf Python

Get clean data from tricky documents, powered by vision-language models ⚡

1.5k 99 14

scrape-google-python oxylabs

In this tutorial, we show how to scrape public Google data with Python and the Oxylabs API.

1.5k 3 1

api.consumet.org consumet TypeScript

A modern search engine API for anime, movies/TV shows, books, light novels, manga, etc.

1.5k 722 1.5k

agentql tinyfish-io Python

AgentQL is a suite of tools for connecting your AI to the web, featuring a query language and Playwright integrations for interacting with elements an...

1.4k 163 19

DataEngineeringProject damklis Python

An example end-to-end data engineering project.

1.4k 278 17

wreq-python 0x676e67 Rust

An ergonomic, privacy-aware Python HTTP client

1.4k 113 23

rebrowser-patches rebrowser JavaScript

Collection of patches for Puppeteer and Playwright to avoid automation detection and leaks. Helps avoid Cloudflare and DataDome CAPTCHA pages. Easy...

1.4k 78 28

mlscraper lorey Python

🤖 Scrape data from HTML websites automatically by just providing examples

1.4k 93 15

querido-diario okfn-brasil Python

📰 Brazilian government gazettes, accessible to everyone.

1.4k 457 67

parsel scrapy Python

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

1.3k 165 34

Auto-Gmail-Creator ai-to-ai Python

Open-source bulk Gmail account creator bot built with Selenium & Seleniumwire (Python). Feel free to contact me with Django/Flask, ML, AI, GPT, Automation, S...

1.3k 740 27

parsera raznem Python

Lightweight library for scraping websites with LLMs

1.3k 73 19

mov-cli mov-cli Python

Watch everything from your terminal.

1.3k 63 11

scrapy-cluster istresearch Python

This Scrapy project uses Redis and Kafka to create a distributed, on-demand scraping cluster.

1.2k 320 103

Decodo Decodo Java

HTTP(S)/SOCKS5 rotating residential proxies: code examples and general information.

1.2k 56 21

browserforge daijro Python

🎭 Intelligent browser header & fingerprint generator

1.2k 84 19

oj online-judge-tools Python

Tools for various online judges: downloading sample cases, generating additional test cases, testing your code, and submitting it.

1.2k 111 17

websurfx neon-mmd Rust

🚀 An open-source alternative to searx that provides a modern-looking ✨, lightning-fast ⚡, privacy-respecting 🥸, se...

1.2k 127 7

django-dynamic-scraper holgerd77 Python

Creating Scrapy scrapers via the Django admin interface

1.2k 302 73

newspaper4k AndyTheFactory Python

📰 Newspaper4k, a fork of the beloved Newspaper3k: extraction of articles, titles, and metadata from news websites.

1.1k 113 11

OF-Scraper datawhores Python

A completely revamped and redesigned fork, reimagined from scratch based on the original onlyfans-scraper

1.1k 105 25

artoo medialab JavaScript

artoo.js: the client-side scraping companion.

1.1k 93 45

crawly elixir-crawly Elixir

Crawly, a high-level web crawling & scraping framework for Elixir.

1.1k 123 17

rama plabayo Rust

Modular service framework to move and transform network packets

1.1k 99 9

auto-archiver bellingcat Python

Automatically archive links to videos, images, and social media content from Google Sheets (and more).

1.1k 107 23

viu viu-media Python

Your browser anime experience from the terminal

1.1k 84 6

socid-extractor soxoj Python

⛏️ The extraction engine behind Maigret: turn any profile URL into a structured OSINT record across 150+ sites

1k 115 25

scrapfly-scrapers scrapfly Python

Scalable Python web scraping scripts for 40+ popular domains

1k 202 19

clean-text jfilter Python

🧹 Python package for text cleaning

1k 83 12

hrequests daijro Python

🚀 Web scraping for humans

1k 67 13

instagram-scraper meetmangukiya Python

Scrape the Instagram frontend. Inspired by twitter-scraper by @kennethreitz.

945 79 1

reverse-api-engineer kalil0321 Python

The agent that turns websites into APIs!

905 83 10

Edu-Mail-Generator 0xjas0 Python

Generate free .edu email address(es) within minutes

868 398 40

loconotion leoncvlt Python

📄 Python tool to turn Notion.so pages into lightweight, customizable static websites

854 136 16

pdf.tocgen Krasjet Python

A CLI toolset to generate tables of contents for PDF files automatically.

841 30 6

easy-scraping-tutorial MorvanZhou Jupyter Notebook

Simple but useful Python web scraping tutorial code.

818 542 40

till DataHenHQ Go

DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code...

814 23 4

kuwala kuwala-io JavaScript

Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We have set out to bring state-of...

807 54 12