Most popular scraping repositories and open source projects

gopher-parse-sitemap oxffaa Go

A highly effective Go library for parsing large sitemaps while avoiding high memory usage. The sitemap parser was written in Go without extern...

39 19 39

infotennis glad94 Jupyter Notebook

Python code for scraping and processing tennis match data from the ATP Tour website.

39 5 39

CobWeb-lnx GoncaloMark Python

CobWeb is a Python library for web scraping. The library consists of two classes: Spider and Scraper.

39 2 39

papercut armand1m TypeScript

Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Cachin...

39 2 39

pyplexity citiususc Python

Cleaning tool for web scraped text

39 3 39

linkedin-scraper akramaznakour JavaScript

Enhanced LinkedIn Job Search Chrome Extension

39 7 39

tvseries athityakumar HTML

TV Series is a tool that scrapes episode synopses of popular TV series from websites like Wikipedia / IMDb and shows them in one place with a user-friendl...

38 25 38

fulldom-server strugee JavaScript

Proxy-like server that will show you the DOM of a page after JS runs

38 6 38

extract-social-media fluquid Python

Extract social media links and account names from websites.

38 16 38

crawlee-one JuroOravec TypeScript

Production-ready web scraping in a single function call. Built on Crawlee.

38 6 38

Whatsapp-Scraper In-vincible Python

Scrapes all the open chats and their last n messages, and saves them in a CSV file

38 10 38

etf4u leoncvlt Python

📊 Python tool to scrape real-time information about ETFs from the web and mix them together by proportionally distributing their asset allocations

38 5 38

EzSolver ismoiloffS Python

Cloudflare Turnstile solver & bypass — Python, real Chrome browser, no paid APIs. Local HTTP API service included. Auto-solves invisible and managed (...

38 7 38

rebrowser-puppeteer rebrowser

A drop-in replacement for puppeteer patched with rebrowser-patches. It allows passing modern automation detection tests.

38 7 38

super-scraper apify TypeScript

Generic REST API for scraping websites. Drop-in replacement for ScrapingBee, ScrapingAnt, and ScraperAPI services. And it is open-source!

38 17 38

lInkedIn-reverese-lookup harsha-iiiv JavaScript

🔎 Search for a LinkedIn profile by email address 📧

38 7 38

taktik-bot masterFuf Python

Instagram & TikTok automation via real Android devices. Likes, follows, DMs, scraping. No API abuse. Built with Python, uiautomator2 & ADB.

38 5 38

scrapeops-scrapy-sdk ScrapeOps Python

Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the box.

38 13 38

freesoccer andrelmlins TypeScript

⚽ Free API with results from national soccer competitions

37 10 37

google-scraper samaybhavsar PHP

This class can retrieve search results from Google.

37 22 37

mangareader-api stabldev Python

A Python-based web scraping API built with FastAPI to get manga content.

37 11 37

policy-data-analyzer wri-dssg-omdena Jupyter Notebook

Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the...

37 8 37

sneakpeek flulemon Python

Sneakpeek is a framework that helps you develop scrapers quickly and conveniently. It’s the best choice for scrapers that have some specific complex sc...

37 0 37

chirps schedutron Python

Twitter bot powering @arichduvet

36 9 36

webradio-metadata adblockradio JavaScript

Collection of scraping recipes to get metadata about what is being streamed on webradios

36 14 36

InstaBot drbuche Python

Simple and friendly Bot for Instagram, using Selenium and Scrapy with Python.

36 11 36

poketo poketo JavaScript

Node library for scraping manga sites

36 4 36

markupever awolverp Rust

The fast, most optimal, and correct HTML & XML parsing library for Python written in Rust.

36 1 36

tripadvisor-scraper andorfermichael Python

Scrape the hotel reviews of a whole city on TripAdvisor

36 14 36

PastebinScrapy apurvsinghgautam Python

Threat hunting tool for scraping the latest pastes from Pastebin

36 13 36

Python-scraper-tutorial Decodo Python

A short, step-by-step introduction to scraping with Python, with an example scraper script.

36 8 36

zoominfo_scraper ScrapingAnt Python

ZoomInfo scraper using rotating proxies and headless Chrome from ScrapingAnt

36 11 36

facebook-discussion-tk internaut Python

A collection of tools to (semi-)automatically collect and analyze data from online discussions on Facebook groups and pages.

35 5 35

api-flight.com fgparamio HTML

Main API Flight Git Repository

35 9 35

jmd_imagescraper joedockrill Jupyter Notebook

Image scraping library for creating deep learning datasets

35 16 35

SneakerBot mridulghanshala Python

Buy limited edition sneakers

35 9 35

chromedl rusq Go

Go library for scraping or downloading files bypassing Cloudflare protection and browser checks

35 2 35

contact-use browser-use HTML

✉️ Use the power of browser-use to contact any person or organization... by any means necessary

35 4 35

geetest-captcha-solver ScraperBox-Github JavaScript

Solve the Geetest slider captcha with Puppeteer

35 10 35

node-red-contrib-nbrowser Steveorevo HTML

Provides a virtual web browser (a.k.a. "headless browser") appearing as a node.

34 16 34

n8n-ai-instagram-scraper Peter-SB Python

Self-hosted AI workflow for scraping Instagram Reels (audio and description). Extracting, summarising and categorising, then storing all relevant info...

34 13 34