Most popular scraping repositories and open source projects

GoodreadsScraper

Scrape data from Goodreads using Scrapy and Selenium :books:

36   136   136  

double-agent

A test suite of common scraper detection techniques. See how detectabl...

10   136   136  

arxiv-miner

arxiv_miner is a toolkit for mining research papers on CS ArXiv.

8   134   134  

nimquery

Nim library for querying HTML using CSS-selectors (like JavaScripts do...

8   134   134  

vlrggapi

An Unofficial REST API for vlr.gg, a site for Valorant Pro Esports mat...

36   133   133  

scraperai

ScraperAI is an open-source, AI-powered tool designed to simplify web...

13   133   133  

ninjemail

Python library for automated email account creation. Create multiple a...

38   133   133  

tiktok-trending-data

Scraping the TikTok discovery web API every 15 minutes using GitHub Ac...

22   132   132  

lambda-scraper

Use AWS Lambda functions as a proxy pool to scrape web pages.

16   130   130  

Movies-and-Series-Scraper

A console application to scrape valid watching links for any movie o...

22   130   130  

apify-sdk-python

The Apify SDK for Python is the official library for creating Apify Ac...

11   129   129  

nintendeals

Library with a set of tools for scraping information about Nintendo ga...

18   129   129  

seleniumcrawler

An example using Selenium webdrivers for Python and Scrapy framework t...

45   127   127  

go-crawler

A web crawling framework implemented in Golang, it is simple to write...

18   127   127  

robox

Simple library for exploring/scraping the web or testing a website you...

2   127   127  

Instagram-to-discord

Monitor Instagram user account and automatically post new images to di...

59   126   126  

proxifier

A fast, modern and intelligent proxy rotator perfect for crawling and...

16   126   126  

pastepwn

Python framework to scrape Pastebin pastes and analyze them

67   125   125  

html2rss

📰 Build RSS 2.0 feeds from websites (and JSON APIs) automatically or...

10   125   125  

htmlSQL

htmlSQL is an experimental PHP library which allows you to access HTML...

39   124   124  

wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC...

16   119   119  

WebReaper

Web scraper, crawler and parser in C#. Designed as simple, declarative...

28   119   119  

automated-web-scraper-autoscraper

This tutorial shows how to automate your web scraping processes using...

0   116   116  

MachineLearning

Machine learning for beginners (Data Science enthusiasts)

131   115   115  

bots-zoo

26   113   113  

scout-lang

A web crawling programming language

6   113   113  

rs-bed-covid-indo-api

API for the availability of hospitals and hospital beds for patients...

25   112   112  

scraper

Nodejs web scraper. Contains a command line, docker container, terrafo...

17   111   111  

media-search-engine

Search geolocations for (social) media posts in databases like Belling...

11   111   111  

scrapy-puppeteer

Scrapy + Puppeteer

29   111   111  

TelegramAdderTool

A Telegram Mass Members Adding/Scraping Tool Written In Python Using...

52   110   110  

CC_Scrapper

Telegram CC Scrapper - Debit/Credit Card [channel public or private /...

37   109   109  

zyte-smartproxy-headless-proxy

A complimentary proxy to help to use SPM with headless browsers

37   108   108  

viewstate

ASP.NET View State Decoder

15   105   105  

ScrapeMate

Scraping assistant tool. Editing and maintaining CSS/XPath selectors a...

14   102   102  

humanparser

Parse a human name string into salutation, first name, middle name, la...

33   102   102  

job_search

An app to search startup jobs scraped from websites written in Elixir,...

16   101   101  

torrengo

Torrengo is a CLI (command line) program written in Go which concurren...

15   100   100  

python-adv-web-apps

Updated python-beginners docs and examples

94   97   97  

clone-anonymous-github

Easily download anonymous GitHub repositories from https://anonymous.4...

8   96   96  

browser-pool

A Node.js library to easily manage and rotate a pool of web browsers,...

15   94   94  

puppeteer-botcheck

🕵‍♂ Bot detection tests for Puppeteer. Hide and seek!

8   93   93  

core

:spider: The PHP SERP Spider - A search engine scraper

44   91   91  

mechaml

OCaml functional web scraping library

6   91   91  

devdocs-to-llm

Turn any developer documentation into a GPT

12   91   91  

Detect-CMS

PHP Library for detecting CMS

51   90   90  

linkedin-easyapply-using-AI

Automate your LinkedIn job applications with AI! This bot utilizes GPT...

15   89   89  

instagram-media-scraper

A simple Node.js code to get public information and media from every I...

10   89   89  

Deals-Scraper

Deals Scraper is a Canadian tool to find good deals on websites like F...

17   89   89  

KC-Scraper

A powerful open-source proxy scraper

21   89   89