Most popular scraping repositories and open source projects

shopify-spy

Extract structured data from Shopify websites.

47   88   88  

ARGUS

ARGUS is an easy-to-use web scraping tool. The program is based on the...

25   88   88  

feedbridge

Plugin based RSS feed generator for sites that don't offer any. Serves...

6   88   88  

billy

legacy backend for Open States

48   87   87  

html-table-extractor

Extract data from HTML tables

22   86   86  

spiderbuf

Spiderbuf is a website focused on Python web scraping practice. It offers a rich set of scraping tutorials, scraping...

8   86   86  

newser

Newser is a simple utility to generate a PDF with your favorite news ar...

3   86   86  

top-github-scraper

Scrape top GitHub repositories and users based on keywords

25   85   85  

introWebScraping

Code examples for my blog posts

47   83   83  

crawler-chrome-extensions

Chrome extensions used by crawler dev...

14   83   83  

open-australian-legal-corpus-creator

The code used to create and update the Open Australian Legal Corpus, t...

13   82   82  

linkedin-bot

Automate your LinkedIn Outreach with Selenium and GeckoDriver.

20   81   81  

Whatsapp-Net

Generate a network graph of connections from your WhatsApp groups data

7   81   81  

amazon_scraper

Amazon product scraper using rotating proxies and headless Ch...

19   81   81  

google-covid19-mobility-reports

Data extraction of Google's COVID-19 Mobility Reports

11   80   80  

pypatent

Search for and retrieve US Patent and Trademark Office Patent Data

20   79   79  

requests-random-user-agent

Configures the requests library to randomly select a desktop User-Agen...

22   79   79  

UltimateTab

Enhanced, ads-free and fast responsive interface to browse guitar tabs...

15   79   79  

pydork

Scraping and listing text and image searches on Google, Bing, DuckDuck...

3   79   79  

Solana_Twitter_Token_NFT_Sniper_Bot

🔥Solana Token/NFT sniping bot w/ Twitter - Raydium, Pumpfun Sniping...

75   78   78  

linkedin-scraper

Tool to scrape LinkedIn

13   78   78  

WebScraper

Python-based web crawling script with randomized intervals, user-agent...

19   78   78  

rota

A high-performance proxy rotation engine with automated IP management...

7   77   77  

instagram-users-scraper

Instagram Scraper. Scrape Instagram followers, following list, and pos...

12   77   77  

map-email-scraper

An open source tool for collating publicly available contact informat...

16   76   76  

Miyou

An anime discovery, streaming site made with React.js. It uses AniList...

34   76   76  

agentql-mcp

Model Context Protocol server that integrates AgentQL's data extractio...

17   76   76  

linkpreview

Open Graph, Twitter Card, Oembed preview. Shows visual cards that mimi...

10   75   75  

outscraper-python

The library provides convenient access to the Outscraper API from appl...

21   75   75  

gsocanalyzer

A blazingly fast tool to analyze all the selected organizations in Goo...

40   75   75  

webdext

Intelligent Web Data Extractor

16   74   74  

venom

Your preferred open source focused crawler for the deep web.

5   74   74  

feedsearch-crawler

Crawl sites for RSS, Atom, and JSON feeds.

9   74   74  

chegg-scraper

Download Chegg homework-help questions to self-sufficient HTML files

24   74   74  

copycat

A PHP Scraping Class

13   73   73  

Captcha-Tools

All-in-one Python (And now Go!) module to help solve captchas with Cap...

7   73   73  

docudigger

Website scraper for getting invoices automagically as pdf (useful for...

7   72   72  

abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (lik...

4   71   71  

linkedin-scrapper

LinkedIn scrapper is an advanced search result scraper script built with...

25   71   71  

api-client

API client to develop tools for competitive programming

23   71   71  

Instagram-downloader

Instagram user's photos and videos downloader. Download all media file...

18   71   71  

mangalivre-api

Unofficial API for Mangá Livre, built with Node.js and Express.js.

9   70   70  

SourceScraper

Simple library which helps you to retrieve the source of various video...

19   69   69  

goClone

🌱 goClone - clone websites in seconds

4   69   69  

rebrowser-bot-detector

Modern tests to detect automated browser behavior. Cover most importan...

5   68   68  

TikScraperPHP

Wrapper for TikTok API

21   68   68  

instagram-without-api-node

Simple Node.js code to get unlimited Instagram public pictures by ev...

10   68   68  

proxy-scraper

Scraping from x75 websites asynchronously

12   67   67  

public-roadmap

Public Roadmap for SerpApi, LLC (https://serpapi.com)

13   67   67  

moneyman

Automatically save transactions from all major Israeli banks and credi...

47   67   67