Most popular scraping repositories and open source projects

python joaopauloaramuni HTML

Repo Python

57 2 57

PythonScrapyBasicSetup matejbasic Python

Basic setup with random user agents and IP addresses for the Python Scrapy framework.

56 14 56

mtnt pmichel31415 Python

Code for the collection and analysis of the MTNT dataset

56 4 56

scraper-fourone-jobs kokokuo Python

An anti-scraping cracker for extracting job application information from a Taiwanese job-recruiting website.

56 12 2

sample-web-scraping-with-electron Tazeg JavaScript

Sample project for web scraping with Electron

56 15 56

actor-facebook-scraper pocesar TypeScript

Scrape public Facebook pages, posts, reviews and comments

56 32 56

silkworm BitingSnakes Python

Async web scraping framework on top of Rust. Works with Free-threaded Python (`PYTHON_GIL=0`).

56 2 56

Euro2016_TerminalApp jctissier HTML

:soccer: Instantly find :trophy:EURO 2016 live-streams & highlights, now a Web App!

55 8 55

ogpParser ukyoda TypeScript

Open Graph Protocol Parser for Node.js

55 12 55

hext html-extract C++

Domain-specific language for extracting structured data from HTML documents

55 3 55

socials lorey Python

👨‍👩‍👦 Python library and CLI to turn URLs into structured social media profiles.

55 9 1

Junior_Zone Moscarde Python

Junior job openings updated daily. Telegram and online spreadsheet.

55 2 55

aniyoi-api miukyo TypeScript

REST API for Indonesian-subtitled anime | Streaming anime with Indonesian subtitles

55 17 55

CraigslistScraper ryanirl Python

Simple webscraper for Craigslist.

55 22 1

UltraStar-CLI martiinii TypeScript

Download any song from the biggest database of UltraStar songs for your karaoke party!

54 9 54

onlyfans-scraper kr4ude Python

A tool that allows you to scrape media from any Onlyfans account and more

54 12 54

firecrawl-quickstarts alexfazio Jupyter Notebook

A collection of cookbooks to help developers get started quickly with the Firecrawl API.

54 4 54

diffbot-php-client Swader PHP

[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library

53 20 7

learn.scrapinghub.com scrapinghub CSS

Scrapinghub Learning Center. Report issues in Jira: https://scrapinghub.atlassian.net/projects/WEB

53 23 101

dart-scraper josw123 Vue

A program for extracting corporate financial statements using the DART system operated by Korea's Financial Supervisory Service.

53 20 53

AI-Cursor-Scraping-Assistant TheWebScrapingClub Python

A powerful tool that leverages Cursor AI and MCP (Model Context Protocol) to easily generate web scrapers for various types of websites.

53 16 53

trex tracking-exposed HTML

YouTube & TikTok analysis + youchoose recommendation customizer. Backend, extensions, and tooling.

53 14 1

getter kastaid Python

A powerful and customizable Telegram userbot built with Telethon

53 29 2

python-scrapfly scrapfly Python

Scrapfly Python SDK for headless browsers and proxy rotation

53 15 53

scrapers montoyamoraga Python

scrapers for building your own image databases

52 7 52

torrent-tracker-scraper project-mk-ultra Python

A UDP torrent tracker scraper library written in Python 3

52 15 3

garlic velocitatem JavaScript

🧄🧛 protect your website from being scraped by bots.

52 0 52

react-node-web-scraper codegratia JavaScript

Final-year project that scrapes data from e-commerce stores and displays it in a ReactJS app.

52 24 52

Filmweb2Letterboxd JSerwatka JavaScript

Export ratings from Filmweb to a CSV file in the format accepted by the Letterboxd importer.

52 3 52

ScarperApi Anshu78780 TypeScript

A scraper API that delivers movie data directly to your local machine without needing to watch any ads. It now supports netmirror.

52 33 52

hyper-sdk-js Hyper-Solutions TypeScript

JavaScript / TypeScript SDK for Bot Protection Bypass - Automate Akamai, Incapsula, Kasada, and DataDome. No browsers required. Solve challenges and g...

51 5 51

beautifulsoup-tutorial hackersandslackers Python

:sparkles: :ramen: Scrape webpage metadata using BeautifulSoup.

51 17 51

CaseHarvester dismantl Python

AWS-based application for scraping the Maryland Judiciary Case Search

51 12 51

rebrowser-playwright rebrowser

A drop-in replacement for Playwright patched with rebrowser-patches. It allows passing modern automation detection tests.

51 2 51

tiktok-trending-data-api ogohogo JavaScript

Scrapes the TikTok Discovery Data API every hour using GitHub Actions to track changes

51 7 51

Scraping-Dynamic-JavaScript-Ajax-Websites-With-BeautifulSoup oxylabs Python

A guide on how to scrape JavaScript rendered websites with Python and BeautifulSoup.

51 8 51

configs Diggernaut

Public, free-to-use repository of digger configs for scraping / extracting data from various e-commerce websites and online stores

50 16 50

dilbert-viewer rharish101 Rust

A simple comic viewer for Dilbert by Scott Adams

50 5 2

News_Summary sunnysai12345 Jupyter Notebook

Dataset and scripts for scraping news articles from popular sources along with a summary of each article.

50 28 50

ai_papers_scrapper george-gca Python

Download paper PDFs and other info from the main AI conferences

50 9 50

hyper-sdk-go Hyper-Solutions Go

Go SDK for Bot Protection Bypass - Automate Akamai, Incapsula, Kasada, and DataDome. No browsers required. Solve challenges and generate valid sensors...

50 4 50

AI_Manga_Reader AI-Manga-Readers JavaScript

AI Manga Reader is a next-gen manga app powered by the MangaDex API, offering vast multi-language content and flexible reading modes. It uses AI-power...

48 29 48

local-api-client-python kameleo-io Python

Official Python library for interacting with Kameleo Client

48 6 48

instagram-without-api orsifrancesco PHP

Simple PHP code to get unlimited public Instagram pictures from any user, without an API and without credentials.

48 13 48

freenom-auto-renew-domains Sorok-Dva TypeScript

A scraper built with Puppeteer that auto-renews free domains on Freenom and sends a Discord message using a bot

48 17 48

youtube-comment-scraper ahmedshahriar Jupyter Notebook

This script dumps YouTube video comments to a CSV from YouTube video links. Video links can be placed inside a variable, a list, or a CSV.

48 16 48

flutter_notification_listener jiusanzhou Kotlin

Listen for and interact with Android notifications from Flutter.

48 64 2

AngleParse kamome283 C#

HTML parsing and processing tool for PowerShell.

47 6 47

DeepSearchJobs wakil69 Python

DeepSearchJobs is a job-discovery engine that uncovers hidden, niche, and low-competition opportunities not found on major platforms. It uses smart sc...

47 2 47

consentcrawl dumkydewilde Python

Automatically check for GDPR/CCPA and cookie consent by running a Playwright headless browser to check for marketing and analytics scripts firing befo...

47 7 47

scraping

Repositories (1766)