Most popular scraping repositories and open source projects

rayobrowse rayobyte-data Python

Stealth Chromium browser for web scraping and AI agents.

189 13 6

SpotiFile Michael-K-Stein Python

Spotify scraper

188 19 0

Python-Selenium-Action MarketingPipeline Python

Run Selenium with Python via GitHub Actions using headless or non-headless browsers!

188 43 1

Threat-Actor-Usernames-Scrape spmedia

A collection of intel and usernames scraped from various cybercrime sources & forums. DarkForums, HackForums, Patched, Cracked, BreachForums, OGUser,...

187 24 187

UdemyCourseGrabber keethesh Python

The will to enroll in a Udemy course is there, but the money isn't? Search no more! This Python program searches for your desired course in more than [i...

186 26 186

DotnetCrawler mehmetozkaya C#

DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output based on dotnet core. This library des...

182 62 12

Web-Data-Scraper umbrellaDocumentation JavaScript

Web Data Scraper - no-code internet scraping. Extract and export to CSV, Excel, JSON, Google Sheets, and Webhook.

180 167 180

local-deepsearch-academic iblameandrew Python

An implementation of Google Deep Search 🕵️ with support for 1000+ references, local inference, chatting with your scraping session using RAPTOR, and...

179 28 179

JsonGenius semanser Go

Get structured JSON data from any page.

176 10 176

instagram-media-scraper ahmedrangel JavaScript

Simple Node.js code to get public information and media from any Instagram post or reel URL without the API. Working as of 2025

176 24 176

shadow-useragent lobstrio Python

Pick the most common user-agents on the Internet 👻

173 11 173

fantasy-basketball KengoA Jupyter Notebook

Scraping statistics, predicting NBA player performance with neural networks and boosting algorithms, and optimising lineups for Draft Kings with gene...

172 49 172

languagepod101-scraper nedlir Python

Python scraper for Language Pods such as Japanesepod101.com 👹 🗾 🍣 Compatible with Japanese, Chinese, French, German, Italian...

171 27 14

scrapers Police-Data-Accessibility-Project Python

Code relating to scraping public police data.

170 38 170

search-engine-google serp-spider PHP

🕷️ Google client for SERPS

168 59 168

tiktok-trending-data antiops

Scraping the TikTok discovery web API every 15 minutes using GitHub Actions to view changes

167 26 167

apify-sdk-python apify Python

The Apify SDK for Python is the official library for creating Apify Actors in Python. It provides useful features like actor lifecycle management, loc...

166 23 166

FredsRoadtripStoryteller realityexpander Kotlin

Hear local historical markers as you travel on your road-trip. 100% Shared Compose UI, Kotlin native cross-platform codebase. Includes Cocoapods, Goog...

166 16 166

Leetcode-Questions-Scraper Bishalsarang Python

Scrape algorithm questions from LeetCode and generate HTML and EPUB files

163 47 163

agentql-mcp tinyfish-io Shell

Model Context Protocol server that integrates AgentQL's data extraction capabilities.

161 36 161

xquery antchfx Go

Extract data or evaluate value from HTML/XML documents using XPath

156 27 156

tweetdrop Anish-Agnihotri TypeScript

Generate dispersable airdrops from Twitter threads.

156 24 5

jimutmap Jimut123 Python

API to get an enormous number of high-resolution satellite images from satellites.pro quickly through multi-threading! Create your own map dataset. B...

152 20 1

go-crawler lizongying Go

A web crawling framework implemented in Golang; it is simple to write and delivers powerful performance. It comes with a wide range of practical middl...

150 21 150

jazz jazzdotdev Rust

The Scripting Engine that Combines Speed, Safety, and Simplicity

148 10 148

decipher-research-agent mtwn105 TypeScript

Turn topics, links, and files into AI-generated research notebooks — summarize, explore, and ask anything.

148 34 148

rebrowser-bot-detector rebrowser JavaScript

Modern tests to detect automated browser behavior. Covers the most important leaks from Puppeteer and Playwright.

148 10 148

proxifier rookmoot Go

A fast, modern and intelligent proxy rotator perfect for crawling and scraping public data.

147 17 3

GoodreadsScraper havanagrawal Python

Scrape data from Goodreads using Scrapy and Selenium 📚

147 40 3

sasori karthikuj JavaScript

Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.

145 17 4

cinemaempoa cumbucadev Python

Site that aggregates films currently showing in some of the many movie theaters of Porto Alegre.

144 29 5

sqrape cathalgarvey Go

Simple Query Scraping with CSS and Go Reflection (MOVED to Gitlab)

143 7 143

Movies-and-Series-Scraper yousefkotp Python

A console application to scrape valid watch links for any movie or series with an exact season and episode number; you can also download a whole sea...

143 38 143

od-database simon987 Python

Distributed crawler, database, and web frontend for indexing public directories

142 23 13

arxiv-miner valayDave Python

arxiv_miner is a toolkit for mining research papers on CS ArXiv.

141 8 141

html2rss html2rss Ruby

📰 Build RSS 2.0 feeds from websites (and JSON APIs) automatically or with a few CSS selectors.

141 11 141

WebReaper alex-on-ai C#

AI-native web scraper. Single binary with a bundled Claude Code skill. MIT-licensed alternative to Firecrawl.

141 33 3

double-agent unblocked-web TypeScript

A test suite of common scraper detection techniques. See how detectable your scraper stack is.

139 10 2

nimquery GULPF Nim

Nim library for querying HTML using CSS selectors (like JavaScript's document.querySelector)

138 9 138

lambda-scraper teticio JavaScript

Use AWS Lambda functions as a proxy pool to scrape web pages.

138 16 138

wget-lua ArchiveTeam C

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

137 18 19

Upwork-AI-jobs-applier kaymen99 Python

AI tool for automating Upwork job applications using AI agents to find and qualify jobs, write personalized cover letters, and prepare for interviews...

136 39 136

curl-post-requests oxylabs

Learn how to send POST requests with cURL.

136 0 136

instagram-users-scraper floriandiud TypeScript

Instagram Scraper. Scrape Instagram followers, following list, and post authors. Download CSV files with Instagram users from followers, following, ta...

136 24 136

spiderbuf hhuayuan Python

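Spiderbuf is a website focused on Python web scraping practice. It offers a rich set of scraping tutorials, case study walkthroughs, and practice exercises. Intensive practice for Python scraper development: keep raising your skill level through the spear-and-shield contest of attack and defense,...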

135 13 1

ctenopharyngodon-idella touero Java

Use MapReduce's Java interface to crawl data on Chinese universities in a distributed fashion and learn the basics of HDFS.

134 1 134

wreq-js sqdshguy TypeScript

HTTP client for Node.js with browser TLS fingerprint impersonation

132 12 2

scrapy-scrapingbee ScrapingBee Python

JavaScript support and proxy rotation for Scrapy with ScrapingBee.

131 6 131

ScrapeMate hermit-crab JavaScript

Scraping assistant tool. Editing and maintaining CSS/XPath selectors across webpages.

130 16 8

abx-dl ArchiveBox Python

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless...

129 7 3

scraping

Repositories (1766)