Topic

scraping

Repositories (1626)

oj
oj online-judge-tools Python

Tools for various online judges. Downloading sample cases, generating additional test cases, testing your code, and submitting it.

1.1k
parsera
parsera raznem Python

Lightweight library for scraping web-sites with LLMs

1.1k
crawly
crawly elixir-crawly Elixir

Crawly, a high-level web crawling & scraping framework for Elixir.

1k
clean-text
clean-text jfilter Python

🧹 Python package for text cleaning

982
instagram-scraper
instagram-scraper meetmangukiya Python

Scrape the Instagram frontend. Inspired from twitter-scraper by @kennethreitz.

943
mov-cli
mov-cli mov-cli Python

Watch everything from your terminal.

940
OSINT-Cheat-sheet
OSINT-Cheat-sheet Jieyab89 HTML

OSINT cheat sheet, list OSINT tools, wiki, dataset, article, book, red team OSINT for hackers and OSINT tips and OSINT branch. This repository will g...

934
rebrowser-patches
rebrowser-patches rebrowser JavaScript

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy...

901
hrequests
hrequests daijro Python

🚀 Web scraping for humans

888
viu
viu viu-media Python

Your browser anime experience from the terminal

875
linkedin
linkedin eracle Python

LinkedIn Scraper using Selenium Web Driver, Chromium headless, Docker and Scrapy

864
loconotion
loconotion leoncvlt Python

📄 Python tool to turn Notion.so pages into lightweight, customizable static websites

854
agentql
agentql tinyfish-io Python

AgentQL is a suite of tools for connecting your AI to the web. Featuring a query language and Playwright integrations for interacting with elements an...

843
websurfx
websurfx neon-mmd Rust

🚀 An open source alternative to searx which provides a modern-looking ✨, lightning-fast ⚡, privacy respecting 🥸, se...

838
till
till DataHenHQ Go

DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code...

814
Lulu
Lulu iawia002 Python

[Unmaintained] A simple and clean video/music/image downloader 👾

810
Edu-Mail-Generator
Edu-Mail-Generator AmmeySaini Python

Generate Free Edu Mail(s) within minutes

808
easy-scraping-tutorial
easy-scraping-tutorial MorvanZhou Jupyter Notebook

Simple but useful Python web scraping tutorial code.

802
kuwala
kuwala kuwala-io JavaScript

Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We are set out to bring state-of...

793
rama
rama plabayo Rust

modular service framework to move and transform network packets

774
ImageScraper
ImageScraper sananth12 Python

✂️ High performance, multi-threaded image scraper

772
OF-Scraper
OF-Scraper datawhores Python

A completely revamped and redesigned fork, reimagined from scratch based on the original onlyfans-scraper

769
gazpacho
gazpacho maxhumber Python

🥫 The simple, fast, and modern web scraping library

764
pdf.tocgen
pdf.tocgen Krasjet Python

A CLI toolset to generate table of contents for PDF files automatically.

747
browserforge
browserforge daijro Python

🎭 Intelligent browser header & fingerprint generator

741
secret-agent
secret-agent ulixee TypeScript

The web scraper that's nearly impossible to block - now called @ulixee/hero

717
newspaper4k
newspaper4k AndyTheFactory HTML

📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.

706
lookyloo
lookyloo Lookyloo Python

Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.

705
dataflowkit
dataflowkit slotix Go

Extract structured data from web sites. Web sites scraping.

688
auto-archiver
auto-archiver bellingcat Python

Automatically archive links to videos, images, and social media content from Google Sheets (and more).

687
Katana
Katana TebbaaX Python

Python script for Google Dorking

683
linkedin-profile-scraper-api
linkedin-profile-scraper-api josephlimtech TypeScript

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON.

680
dark-knowledge
dark-knowledge prescience-data JavaScript

😈📚 A curated library of research papers and presentations for counter-detection and web privacy enthusiasts.

677
google-search-results-python
google-search-results-python serpapi Python

Google Search Results via SERP API pip Python Package

657
Auto-Gmail-Creator
Auto-Gmail-Creator ai-to-ai Python

Open Source Bulk Auto Gmail Creator Bot with Selenium & Seleniumwire ( Python ). Feel free to contact me with Django/Flask, ML, AI, GPT, Automation, S...

643
social-media-profiles-regexs
social-media-profiles-regexs lorey Python

📇 Extract social media profiles and more with regular expressions

636
socialreaper
socialreaper ScriptSmith Python

Social media scraping / data collection library for Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs

614
docker-selenium-lambda
docker-selenium-lambda umihico Dockerfile

The simplest demo of chrome automation by python and selenium in AWS Lambda

613
comic-dl
comic-dl Xonshiz Python

Comic-dl is a command line tool to download manga and comics from various comic and manga sites. Supported sites : readcomiconline.to, mangafox.me, co...

592
newcrawler
newcrawler speed JavaScript

Free Web Scraping Tool with Java

582
scrapfly-scrapers
scrapfly-scrapers scrapfly Python

Scalable Python web scraping scripts for +40 popular domains

569
PHPScraper
PHPScraper spekulatius PHP

A universal web-util for PHP.

558
spidermon
spidermon scrapinghub Python

Scrapy Extension for monitoring spiders execution.

546
pricewise
pricewise adrianhajdin TypeScript

Dive into web scraping and build a Next.js 13 eCommerce price tracker within a single video that teaches you data scraping, cron jobs, sending emails,...

545
facebook_data_analyzer
facebook_data_analyzer Lackoftactics Ruby

Analyze your copy of your Facebook data with Ruby. Download the zip file from Facebook and get info about friends ranking by message, vocabulary, con...

541
jekyll
jekyll programminghistorian HTML

Jekyll-based static site for The Programming Historian

532
untidetect-tools
untidetect-tools TheGP

List of anti-detect and humanizing tools and browsers, including captcha solvers and sms-activation.

526
quick-start-guide
quick-start-guide oxylabs

Python quick start guides to get the most out of Oxylabs' Web Scraper API free trial.

523
scrapple
scrapple AlexMathew Python

A framework for creating semi-automatic web content extractors

502
nickjs
nickjs phantombuster JavaScript

Web scraping library made by the Phantombuster team. Modern, simple & works on all websites. (Deprecated)

500