Topic: scraping

Repositories (1766)

emcf/thepipe (Python)

Get clean data from tricky documents, powered by vision-language models ⚡

1.5k stars

eracle/OpenOutreach (Python)

LinkedIn automation tool: describe your product, define your target market, and the AI finds the leads for you.

1.5k stars

consumet/api.consumet.org (TypeScript)

A modern search engine API for anime, movies/TV shows, books, light novels, manga, etc.

1.5k stars

Altimis/Scweet (Python)

Scrape tweets, profiles, followers and following from Twitter/X, no API key needed. Python library with smart multi-account pooling, proxy support and...

1.5k stars

damklis/DataEngineeringProject (Python)

An example end-to-end data engineering project.

1.4k stars

lorey/mlscraper (Python)

🤖 Scrape data from HTML websites automatically by just providing examples

1.4k stars

okfn-brasil/querido-diario (Python)

📰 Diários oficiais brasileiros acessíveis a todos | 📰 Brazilian government gazettes, accessible to everyone.

1.3k stars

rebrowser/rebrowser-patches (JavaScript)

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy...

1.3k stars

tinyfish-io/agentql (Shell)

AgentQL is a suite of tools for connecting your AI to the web. Featuring a query language and Playwright integrations for interacting with elements an...

1.3k stars

0x676e67/wreq-python (Rust)

An ergonomic Python HTTP client with TLS fingerprinting

1.3k stars

scrapy/parsel (Python)

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

1.3k stars

raznem/parsera (Python)

Lightweight library for scraping websites with LLMs

1.3k stars

istresearch/scrapy-cluster (Python)

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.

1.2k stars

ai-to-ai/Auto-Gmail-Creator (Python)

Open-source bulk Gmail account creator bot built with Selenium & Selenium Wire (Python). Feel free to contact me with Django/Flask, ML, AI, GPT, Automation, S...

1.2k stars

oxylabs/scrape-google-python

In this tutorial, we showcase how to scrape public Google data with Python and Oxylabs API.

1.2k stars

Decodo/Decodo (Java)

HTTP(S)/SOCKS5 rotating residential proxies - code examples & general information.

1.2k stars

holgerd77/django-dynamic-scraper (Python)

Creating Scrapy scrapers via the Django admin interface

1.2k stars

mov-cli/mov-cli (Python)

Watch everything from your terminal.

1.1k stars

online-judge-tools/oj (Python)

Tools for various online judges. Downloading sample cases, generating additional test cases, testing your code, and submitting it.

1.1k stars

medialab/artoo (JavaScript)

artoo.js - the client-side scraping companion.

1.1k stars

AndyTheFactory/newspaper4k (Python)

📰 Newspaper4k, a fork of the beloved Newspaper3k. Extracts articles, titles, and metadata from news websites.

1.1k stars

neon-mmd/websurfx (Rust)

🚀 An open source alternative to searx which provides a modern-looking ✨, lightning-fast ⚡, privacy respecting 🥸, se...

1.1k stars

elixir-crawly/crawly (Elixir)

Crawly, a high-level web crawling & scraping framework for Elixir.

1.1k stars

bellingcat/auto-archiver (Python)

Automatically archive links to videos, images, and social media content from Google Sheets (and more).

1.1k stars

daijro/browserforge (Python)

🎭 Intelligent browser header & fingerprint generator

1.1k stars

viu-media/viu (Python)

Your browser anime experience from the terminal

1.1k stars

datawhores/OF-Scraper (Python)

A completely revamped and redesigned fork, reimagined from scratch based on the original onlyfans-scraper

1k stars

jfilter/clean-text (Python)

🧹 Python package for text cleaning

1k stars

daijro/hrequests (Python)

🚀 Web scraping for humans

1k stars

plabayo/rama (Rust)

Modular service framework to move and transform network packets

996 stars

scrapfly/scrapfly-scrapers (Python)

Scalable Python web scraping scripts for 40+ popular domains

947 stars

meetmangukiya/instagram-scraper (Python)

Scrape the Instagram frontend. Inspired by twitter-scraper by @kennethreitz.

946 stars

0xjas0/Edu-Mail-Generator (Python)

Generate Free Edu Mail(s) within minutes

865 stars

leoncvlt/loconotion (Python)

📄 Python tool to turn Notion.so pages into lightweight, customizable static websites

851 stars

Krasjet/pdf.tocgen (Python)

A CLI toolset to automatically generate tables of contents for PDF files.

829 stars

MorvanZhou/easy-scraping-tutorial (Jupyter Notebook)

Simple but useful Python web scraping tutorial code.

817 stars

DataHenHQ/till (Go)

DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code...

815 stars

kuwala-io/kuwala (JavaScript)

Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We have set out to bring state-of...

809 stars

iawia002/Lulu (Python)

[Unmaintained] A simple and clean video/music/image downloader 👾

806 stars

sananth12/ImageScraper (Python)

✂️ High-performance, multi-threaded image scraper

777 stars

maxhumber/gazpacho (Python)

🥫 The simple, fast, and modern web scraping library

769 stars

Lookyloo/lookyloo (Python)

Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.

767 stars

josephlimtech/linkedin-profile-scraper-api (TypeScript)

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON.

753 stars

prescience-data/dark-knowledge (JavaScript)

😈📚 A curated library of research papers and presentations for counter-detection and web privacy enthusiasts.

742 stars

serpapi/google-search-results-python (Python)

Google Search results via SerpApi, as a pip-installable Python package

734 stars

ulixee/secret-agent (TypeScript)

The web scraper that's nearly impossible to block - now called @ulixee/hero

732 stars

slotix/dataflowkit (Go)

Extract structured data from websites (web scraping).

712 stars

TebbaaX/Katana (Python)

Python script for Google dorking

683 stars

ZacharyHampton/HomeHarvest (Python)

Python package for scraping real estate property data

671 stars

Xonshiz/comic-dl (Python)

Comic-dl is a command-line tool to download manga and comics from various comic and manga sites. Supported sites: readcomiconline.to, mangafox.me, co...

649 stars
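Several of the libraries listed above, such as parsel and gazpacho, exist to pull structured data out of HTML. The sketch below shows the underlying idea using only Python's standard-library `html.parser`. It is not taken from any listed project, and the `LinkExtractor` class and `extract_links` function are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect (href, text) pairs for every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None  # href of the <a> we are currently inside
        self._text: list[str] = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a> element
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return every link in the document as an (href, text) tuple."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = (
        '<ul><li><a href="/scrapy/parsel">parsel</a></li>'
        '<li><a href="/lorey/mlscraper">mlscraper</a></li></ul>'
    )
    print(extract_links(page))
```

Hand-written parser callbacks like these get unwieldy quickly; that is the gap that selector-based libraries such as parsel (XPath or CSS selectors) fill. A production scraper would also need to resolve relative URLs and cope with malformed markup.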