Most popular scraping repositories and open source projects

Pasta

A PasteBin scraper that doesn't rely on the PasteBin scrape API

6   66   66  

foundation

🧱 A uniform template to use as a foundation for Puppeteer bot constru...

7   66   66  

medium-crawler

A crawler for scraping posts from medium.com

15   65   65  

selectorlib

A library to read a YML file with Xpath or CSS Selectors and extract d...

11   65   65  

Google-Patents-Scraper

Automatically download all PDF files of search results & their pate...

22   65   65  

maps-to-lead

This project aims to obtain leads in JSON format and send...

13   65   65  

dom_query

A Flexible Rust Crate for DOM Querying and Manipulation

6   64   64  

DexScreener-Scraping

When a specific token pair from DEX Screener is given, this script wil...

32   64   64  

Pinterest-infinite-crawler

An infinite Pinterest crawler/scraper. Crawl images with infinite-scrol...

11   64   64  

rubium

Rubium is a lightweight alternative to Selenium/Capybara/Watir if you...

0   64   64  

PyLex

Perform lexical analysis on words, one word at a time.

2   64   64  

Tor_Spider

Python project to crawl and scrape the lesser-known deep web or one can...

16   64   64  

worldometer

Get live, population, geography, projected, and historical data from a...

10   64   64  

daenerys

Scraping and Web Crawling Framework For Zhihu Live

30   63   63  

pythonista-chromeless

Serverless Selenium that dynamically executes any given code.

10   63   63  

angel.co-companies-list-scraping

35   62   62  

rebrowser-playwright-python

A drop-in replacement for playwright-python patched with rebrowser-pat...

6   62   62  

undetected_geckodriver

A custom Firefox Selenium-based Webdriver. Passes all bot mitigation s...

7   62   62  

datasette-scraper

Add website scraping abilities to Datasette

1   62   62  

Porn-Novel-Scraper

A script that can be used to capture various porn novels for machine l...

12   61   61  

datacrawl

A simple and easy to use web crawler for Python

12   61   61  

ksoup

Kotlin Wrapper for Jsoup

6   61   61  

pycaching

A Python 3 interface for working with Geocaching.com website.

46   61   61  

conformist

Bend CSVs to your will with declarative schemas.

6   60   60  

justetf-scraping

Scraping the justETF

18   60   60  

apify-client-python

Apify API client for Python

12   60   60  

webforai

The best HTML to Markdown library, A esm-native & Useful Utilities wit...

5   59   59  

pomp

Screen scraping and web crawling framework

10   59   59  

playlist2links

This Bash script extracts video links from a YouTube playlist

10   59   59  

proxycrawl-python

ProxyCrawl Python library for scraping and crawling

19   59   59  

PythonScrapyBasicSetup

Basic setup with random user agents and IP addresses for Python Scrapy...

14   58   58  

whatsapp-tracking

Scraping the status of WhatsApp contacts

11   58   58  

coches-net-dashboard

Sample project that uses Dagster, dbt, DuckDB and Dash to visualize car...

4   58   58  

Pahe.ph-Scraper

Pahe.ph [Pahe.in] Movies Website Scraper

15   58   58  

local-api-examples

Easy-to-follow examples in Python, Node.js, and C# for web automation...

17   58   58  

web_scraping_freecodecamp

Web scraping course with Python created by Gustavo Juantorena for fr...

19   57   57  

sample-web-scraping-with-electron

Sample project for web scraping with Electron

17   57   57  

scrapy-distributed

A series of distributed components for Scrapy. Including RabbitMQ-base...

11   57   57  

SearchEngineScrapy

Scrape data from Google.com, Bing.com, Baidu.com, Ask.com, Yahoo.com,...

16   56   56  

actor-facebook-scraper

Scrape public Facebook pages, posts, reviews and comments

32   56   56  

ogpParser

Open Graph Protocol Parser for Node.js

12   56   56  

actor-whitepaper

This whitepaper describes a new concept for building serverless microa...

1   56   56  

serpapi-javascript

Scrape and parse search engine results using SerpApi.

6   56   56  

Junior_Zone

Junior job openings updated daily. Telegram and online spreadsheet

2   55   55  

learn.scrapinghub.com

Scrapinghub Learning Center. Report issues in Jira: Report issues in J...

24   55   55  

mtnt

Code for the collection and analysis of the MTNT dataset

4   55   55  

scraper-fourone-jobs

This is an anti-scraping cracker for extracting application information of on...

12   55   55  

pge-outages-pre-2024

Tracking PG&E outages

7   55   55  

Euro2016_TerminalApp

:soccer: Instantly find :trophy:EURO 2016 live-streams & highlights, n...

10   54   54  

torrent-tracker-scraper

A UDP torrent tracker scraper library written in Python 3

15   54   54