Topic

scraping

Repositories (1766)

gopher-parse-sitemap oxffaa Go

A highly effective Go library for parsing large sitemaps while avoiding high memory usage. The sitemap parser was written in Go without extern...

39
infotennis glad94 Jupyter Notebook

Python for scraping and processing tennis match data from the ATP Tour website.

39
CobWeb-lnx GoncaloMark Python

CobWeb is a Python library for web scraping. The library consists of two classes: Spider and Scraper.

39
papercut armand1m TypeScript

Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Cachin...

39
pyplexity citiususc Python

Cleaning tool for web scraped text

39
linkedin-scraper akramaznakour JavaScript

Enhanced LinkedIn Job Search Chrome Extension

39
tvseries athityakumar HTML

TV Series is a tool that scrapes episode synopses of popular TV series from websites like Wikipedia / IMDb and shows them in one place with a user-friendl...

38
fulldom-server strugee JavaScript

Proxy-like server that will show you the DOM of a page after JS runs

38
extract-social-media fluquid Python

Extract social media links and account names from websites.

38
crawlee-one JuroOravec TypeScript

Production-ready web scraping in a single function call. Built on Crawlee.

38
Whatsapp-Scraper In-vincible Python

Scrapes all open chats and their last n messages, and saves them in a CSV file

38
etf4u leoncvlt Python

📊 Python tool to scrape real-time information about ETFs from the web and mix them together by proportionally distributing their asset allocations

38
EzSolver ismoiloffS Python

Cloudflare Turnstile solver & bypass — Python, real Chrome browser, no paid APIs. Local HTTP API service included. Auto-solves invisible and managed (...

38
rebrowser-puppeteer rebrowser

A drop-in replacement for puppeteer patched with rebrowser-patches. It allows you to pass modern automation detection tests.

38
super-scraper apify TypeScript

Generic REST API for scraping websites. Drop-in replacement for ScrapingBee, ScrapingAnt, and ScraperAPI services. And it is open-source!

38
lInkedIn-reverese-lookup harsha-iiiv JavaScript

🔎Search LinkedIn profile by email address📧

38
taktik-bot masterFuf Python

Instagram & TikTok automation via real Android devices. Likes, follows, DMs, scraping. No API abuse. Built with Python, uiautomator2 & ADB.

38
scrapeops-scrapy-sdk ScrapeOps Python

Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the box.

38
freesoccer andrelmlins TypeScript

⚽ Free API with results from national soccer competitions

37
google-scraper samaybhavsar PHP

This class can retrieve search results from Google.

37
mangareader-api stabldev Python

A Python-based web scraping API built with FastAPI to get manga content.

37
policy-data-analyzer wri-dssg-omdena Jupyter Notebook

Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the...

37
sneakpeek flulemon Python

Sneakpeek is a framework that helps you develop scrapers quickly and conveniently. It’s the best choice for scrapers that have some specific complex sc...

37
chirps schedutron Python

Twitter bot powering @arichduvet

36
webradio-metadata adblockradio JavaScript

Collection of scraping recipes to get metadata about what is being streamed on webradios

36
InstaBot drbuche Python

Simple and friendly Bot for Instagram, using Selenium and Scrapy with Python.

36
poketo poketo JavaScript

Node library for scraping manga sites

36
markupever awolverp Rust

The fast, most optimal, and correct HTML & XML parsing library for Python written in Rust.

36
tripadvisor-scraper andorfermichael Python

Scrape the hotel reviews of a whole city on TripAdvisor

36
PastebinScrapy apurvsinghgautam Python

Threat hunting tool for scraping latest scrapes from Pastebin

36
Python-scraper-tutorial Decodo Python

A short introduction to scraping with Python with given steps and an example scraper script.

36
zoominfo_scraper ScrapingAnt Python

ZoomInfo scraper using rotating proxies and headless Chrome from ScrapingAnt

36
facebook-discussion-tk internaut Python

A collection of tools to (semi-)automatically collect and analyze data from online discussions on Facebook groups and pages.

35
api-flight.com fgparamio HTML

Main API Flight Git Repository

35
jmd_imagescraper joedockrill Jupyter Notebook

Image scraping library for creating deep learning datasets

35
SneakerBot mridulghanshala Python

Buy limited edition sneakers

35
chromedl rusq Go

Go library for scraping or downloading files bypassing Cloudflare protection and browser checks

35
contact-use browser-use HTML

✉️ Use the power of browser-use to contact any person or organization... by any means necessary

35
geetest-captcha-solver ScraperBox-Github JavaScript

Solve the Geetest slider captcha with Puppeteer

35
node-red-contrib-nbrowser Steveorevo HTML

Provides a virtual web browser (a.k.a. "headless browser") appearing as a node.

34
n8n-ai-instagram-scraper Peter-SB Python

Self-hosted AI workflow for scraping Instagram Reels (audio and description). Extracting, summarising and categorising, then storing all relevant info...

34
scrapingai Agenty TypeScript

Build web scraping agents that use AI to auto-extract data from websites, capture screenshots, generate PDFs from URLs, and crawl the web with Agenty

34
rebrowser-puppeteer-core rebrowser TypeScript

A drop-in replacement for puppeteer-core patched with rebrowser-patches. It allows you to pass modern automation detection tests.

34
botasaurus-starter omkarcloud TypeScript

🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖

34
fastProxy searchformyusername Python

Multithreaded application to scrape working web proxies

34
headless-task-server luka-dev TypeScript

A headless browser task/job queue & runner based on Hero (Chrome)

34
FirstCyclingAPI baronet2 Python

An unofficial Python API wrapper for firstcycling.com

34
ted-scraper corralm Python

🎙️ TED Talks web scraper

34
israeli-supermarket-scarpers OpenIsraeliSupermarkets Jupyter Notebook

A Python package with a client to scrape Israeli supermarket data

34
ioweb lorien Python

Web Scraping Framework

33