Topic

scraping

Repositories (1626)

myanimelist-data-set-creator
myanimelist-data-set-creator debakarr Python

Collection of simple Python scripts to create anime and user data sets from https://myanimelist.net/.

42
youtube-comment-scraper
youtube-comment-scraper ahmedshahriar Jupyter Notebook

This script dumps YouTube video comments to a CSV file from YouTube video links. The links can be supplied in a variable, a list, or a CSV file.

42
flutter_notification_listener
flutter_notification_listener jiusanzhou Kotlin

Flutter plugin to listen for and interact with all incoming notifications for Android. A plugin for listening to phone notifications.

42
goGetJS
goGetJS davemolk Go

a tool for extracting, searching, and saving JavaScript files (with optional headless browser)

42
Upwork-AI-jobs-applier
Upwork-AI-jobs-applier kaymen99 Python

AI tool for automating Upwork job applications using AI agents to find and qualify jobs, write personalized cover letters, and prepare for interviews...

42
movie-posters-convnet
movie-posters-convnet adrz Python

Unsupervised clustering of movie posters with features extracted from Convolutional Neural Network

41
Architeuthis
Architeuthis simon987 Go

MITM HTTP(S) proxy with integrated load-balancing, rate-limiting and error handling. Built for automated web scraping.

41
html-table-to-json
html-table-to-json brandon93s JavaScript

Generate JSON representations of HTML tables

41
noscrape
noscrape schoenbergerb TypeScript

This repository is deprecated

41
webtranspose
webtranspose mike-gee Python

Web scraping API for building AI applications.

41
python-scrapfly
python-scrapfly scrapfly Python

Scrapfly Python SDK for headless browsers and proxy rotation

41
TikDown
TikDown xtekky Python

Fast TikTok NO Watermark Video Downloader (username or url)

41
Extracty
Extracty Mamdouh66 Python

Extract structured data from any unstructured web page

40
lc-webscraping
lc-webscraping carpentries-incubator Python

Introduction to web scraping

40
shup
shup pystardust Shell

A POSIX shell script to parse HTML

40
TorScrapper
TorScrapper little-endian-0x01 Python

A Scraper made 100% in Python using BeautifulSoup and Tor. It can be used to scrape both normal and onion links. Happy Scraping :)

40
linkeBot
linkeBot fabiodeandrade HTML

🔎 A web scraping bot for displaying LinkedIn job openings

40
scrapingant-client-python
scrapingant-client-python ScrapingAnt Python

ScrapingAnt API client for Python.

40
pyplexity
pyplexity citiususc Python

Cleaning tool for web scraped text

39
linkedin-scraper
linkedin-scraper akramaznakour JavaScript

Enhanced LinkedIn Job Search Chrome Extension

39
fulldom-server
fulldom-server strugee JavaScript

Proxy-like server that will show you the DOM of a page after JS runs

38
extract-social-media
extract-social-media fluquid Python

Extract social media links and account names from websites.

38
Whatsapp-Scraper
Whatsapp-Scraper In-vincible Python

Scrapes all open chats and their last n messages, and saves them in a CSV file

38
async-pubmed-scraper
async-pubmed-scraper IliaZenkov Python

PubMed scraper for async search on a list of keywords and concurrent extraction of all found URLs, returning a DataFrame/CSV containing all article da...

38
scrapy-scrapingbee
scrapy-scrapingbee ScrapingBee Python

JavaScript support and proxy rotation for Scrapy with ScrapingBee.

38
etf4u
etf4u leoncvlt Python

📊 Python tool to scrape real-time information about ETFs from the web and mix them together by proportionally distributing their asset allocations

38
scrapy-zyte-api
scrapy-zyte-api scrapy-plugins Python

Zyte API integration for Scrapy

38
CobWeb-lnx
CobWeb-lnx GoncaloMark Python

CobWeb is a Python library for web scraping. The library consists of two classes: Spider and Scraper.

38
fake-http-header
fake-http-header MichaelTatarski Python

A Python package to generate random request fields for an HTTP header.

38
Rotating-Proxies-With-Python
Rotating-Proxies-With-Python oxylabs Python

Learn how to rotate proxies using Python.

38
papercut
papercut armand1m TypeScript

Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Cachin...

38
tvseries
tvseries athityakumar HTML

TV Series is a tool that scrapes episode synopses of popular TV series from websites like Wikipedia / IMDb and shows them in one place with a user-friendl...

37
sneakpeek
sneakpeek flulemon Python

Sneakpeek is a framework that helps to quickly and conveniently develop scrapers. It’s the best choice for scrapers that have some specific complex sc...

37
gopher-parse-sitemap
gopher-parse-sitemap oxffaa Go

A highly effective Go library for parsing large sitemaps while avoiding high memory usage. The sitemap parser was written in Go without extern...

37
gwaripper
gwaripper nilfoer Python

Tool for conveniently downloading audio from r/gonewildaudio and similar subreddits

37
chirps
chirps schedutron Python

Twitter bot powering @arichduvet

36
freesoccer
freesoccer andrelmlins TypeScript

⚽ Free API with results from national soccer competitions

36
google-scraper
google-scraper samaybhavsar PHP

This class can retrieve search results from Google.

36
InstaBot
InstaBot drbuche Python

Simple and friendly Bot for Instagram, using Selenium and Scrapy with Python.

36
mangahook-api
mangahook-api kiraaziz JavaScript

Free, open-source manga API that supports fetching all manga, fetching a single manga, and search, alongside a Next.js demo.

36
raiplay-dl
raiplay-dl wetcork Python

The most advanced raiplay.it downloader

36
geetest-captcha-solver
geetest-captcha-solver ScraperBox-Github JavaScript

Solve the Geetest slider captcha with Puppeteer

36
puppeteer-humanize
puppeteer-humanize force-adverse TypeScript

🕺 Humanizer functions for Puppeteer

36
scrapeops-scrapy-sdk
scrapeops-scrapy-sdk ScrapeOps Python

Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the box.

36
webradio-metadata
webradio-metadata adblockradio JavaScript

Collection of scraping recipes to get metadata about what is being streamed on webradios

35
poketo
poketo poketo JavaScript

Node library for scraping manga sites

35
node-red-contrib-nbrowser
node-red-contrib-nbrowser Steveorevo HTML

Provides a virtual web browser (a.k.a. "headless browser") appearing as a node.

35
dilbert-viewer
dilbert-viewer rharish101 Rust

A simple comic viewer for Dilbert by Scott Adams

35
policy-data-analyzer
policy-data-analyzer wri-dssg-omdena Jupyter Notebook

Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the...

35
SneakerBot
SneakerBot mridulghanshala Python

Buy limited edition sneakers

35