Most popular scraping repositories and open source projects

parsera

Lightweight library for scraping web-sites with LLMs

63   1065   1065  

crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

117   1019   1019  

clean-text

🧹 Python package for text cleaning

79   975   975  

instagram-scraper

Scrape the Instagram frontend. Inspired from twitter-scraper by @kenne...

81   943   943  

mov-cli

Watch everything from your terminal.

49   940   940  

OSINT-Cheat-sheet

OSINT cheat sheet, list OSINT tools, wiki, dataset, article, book , re...

138   934   934  

hrequests

🚀 Web scraping for humans

52   888   888  

linkedin

Linkedin Scraper using Selenium Web Driver, Chromium headless, Docker...

134   864   864  

loconotion

📄 Python tool to turn Notion.so pages into lightweight, customizable...

137   852   852  

agentql

AgentQL is a suite of tools for connecting your AI to the web. Featuri...

110   843   843  

websurfx

:rocket: An open source alternative to searx which provides a modern-l...

102   838   838  

till

DataHen Till is a companion tool to your existing web scraper that ins...

22   814   814  

Lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

141   812   812  

Edu-Mail-Generator

Generate Free Edu Mail(s) within minutes

396   808   808  

easy-scraping-tutorial

Simple but useful Python web scraping tutorial code.

548   800   800  

kuwala

Kuwala is the no-code data platform for BI analysts and engineers enab...

53   793   793  

ImageScraper

:scissors: High performance, multi-threaded image scraper

103   772   772  

OF-Scraper

A completely revamped and redesigned fork, reimagined from scratch bas...

67   769   769  

gazpacho

🥫 The simple, fast, and modern web scraping library

56   764   764  

pdf.tocgen

A CLI toolset to generate table of contents for PDF files automaticall...

25   747   747  

secret-agent

The web scraper that's nearly impossible to block - now called @ulixee...

48   707   707  

newspaper4k

📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articl...

74   706   706  

lookyloo

Lookyloo is a web interface that allows users to capture a website pag...

86   705   705  

FastAnime

Your browser anime experience from the terminal

36   693   693  

rebrowser-patches

Collection of patches for puppeteer and playwright to avoid automation...

41   690   690  

auto-archiver

Automatically archive links to videos, images, and social media conten...

74   687   687  

Katana

python script for Google Dorking

155   683   683  

dataflowkit

Extract structured data from web sites. Web sites scraping.

80   681   681  

dark-knowledge

😈📚 A curated library of research papers and presentations for counte...

39   677   677  

google-search-results-python

Google Search Results via SERP API pip Python Package

105   657   657  

Auto-Gmail-Creator

Open Source Bulk Auto Gmail Creator Bot with Selenium & Seleniumwire (...

374   643   643  

linkedin-profile-scraper-api

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in J...

162   621   621  

social-media-profiles-regexs

:card_index: Extract social media profiles and more with regular expre...

71   618   618  

comic-dl

Comic-dl is a command line tool to download manga and comics from vari...

68   592   592  

docker-selenium-lambda

The simplest demo of chrome automation by python and selenium in AWS L...

137   592   592  

rama

modular service framework to move and transform network packets

62   590   590  

socialreaper

Social media scraping / data collection library for Facebook, Twitter,...

93   584   584  

newcrawler

Free Web Scraping Tool with Java

115   582   582  

PHPScraper

A universal web-util for PHP.

76   558   558  

pricewise

Dive into web scraping and build a Next.js 13 eCommerce price tracker...

167   545   545  

facebook_data_analyzer

Analyze facebook copy of your data with ruby language. Download zip fi...

50   541   541  

spidermon

Scrapy Extension for monitoring spiders execution.

101   541   541  

jekyll

Jekyll-based static site for The Programming Historian

227   527   527  

untidetect-tools

List of anti-detect and humanizing tools and browsers, including captc...

54   526   526  

quick-start-guide

Python quick start guides to get the most out of Oxylabs' Web Scraper...

3   523   523  

nickjs

Web scraping library made by the Phantombuster team. Modern, simple &...

47   502   502  

scrapple

A framework for creating semi-automatic web content extractors

41   501   501  

gogoanime-api

Anime Streaming, Discovery API made with Cheerio and Express. Uses dat...

138   498   498  

browserforge

🎭 Intelligent browser header & fingerprint generator

24   492   492  

scrape-linkedin-selenium

`scrape_linkedin` is a python package that allows you to scrape person...

164   483   483