Topic

scraping

Repositories (1626)

ctenopharyngodon-idella
ctenopharyngodon-idella touero Java

Use MapReduce's Java interface to crawl data on Chinese universities in a distributed fashion and learn the basics of HDFS.

140
GoodreadsScraper
GoodreadsScraper havanagrawal Python

Scrape data from Goodreads using Scrapy and Selenium :books:

140
arxiv-miner
arxiv-miner valayDave Python

arxiv_miner is a toolkit for mining research papers on CS ArXiv.

134
nimquery
nimquery GULPF Nim

Nim library for querying HTML using CSS selectors (like JavaScript's document.querySelector)

134
ninjemail
ninjemail david96182 Python

Python library for automated email account creation. Create multiple accounts easily with support for major email providers.

133
scraperai
scraperai scraperai HTML

ScraperAI is an open-source, AI-powered tool designed to simplify web scraping for users of all skill levels.

133
vlrggapi
vlrggapi axsddlr Python

An Unofficial REST API for vlr.gg, a site for Valorant Pro Esports match results and news.

133
tiktok-trending-data
tiktok-trending-data antiops

Scraping the TikTok discovery web API every 15 minutes using GitHub Actions to view changes

132
Movies-and-Series-Scraper
Movies-and-Series-Scraper yousefkotp Python

A console application to scrape valid watch links for any movie or series with exact season and episode number, you can also download a whole sea...

130
lambda-scraper
lambda-scraper teticio JavaScript

Use AWS Lambda functions as a proxy pool to scrape web pages.

130
nintendeals
nintendeals fedecalendino Python

Library with a set of tools for scraping information about Nintendo games and their prices across all regions (NA, EU and JP).

129
apify-sdk-python
apify-sdk-python apify Python

The Apify SDK for Python is the official library for creating Apify Actors in Python. It provides useful features like actor lifecycle management, loc...

129
proxifier
proxifier rookmoot Go

A fast, modern and intelligent proxy rotator perfect for crawling and scraping public data.

128
seleniumcrawler
seleniumcrawler voliveirajr Python

An example using Selenium WebDriver for Python and the Scrapy framework to create a web scraper that crawls an ASP site

127
go-crawler
go-crawler lizongying Go

A web crawling framework implemented in Golang. It is simple to write and delivers powerful performance. It comes with a wide range of practical middl...

127
robox
robox danclaudiupop Python

Simple library for exploring/scraping the web or testing a website you’re developing

127
Instagram-to-discord
Instagram-to-discord fernandod1 Python

Monitor an Instagram user account and automatically post new images to a Discord channel via a webhook. Working 2022!

126
pastepwn
pastepwn d-Rickyy-b Python

Python framework to scrape Pastebin pastes and analyze them

125
html2rss
html2rss html2rss Ruby

📰 Build RSS 2.0 feeds from websites (and JSON APIs) automatically or with a few CSS selectors.

125
wget-lua
wget-lua ArchiveTeam C

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

125
htmlSQL
htmlSQL hxseven PHP

htmlSQL is an experimental PHP library which allows you to access HTML values using an SQL-like syntax.

124
WebReaper
WebReaper pavlovtech C#

Web scraper, crawler and parser in C#. Designed as a simple, declarative and scalable web scraping solution.

119
automated-web-scraper-autoscraper
automated-web-scraper-autoscraper oxylabs

This tutorial shows how to automate your web scraping processes using AutoScraper, one of the available Python web scraping libraries.

116
MachineLearning
MachineLearning yug95 Jupyter Notebook

Machine learning for beginners (data science enthusiasts)

115
bots-zoo
bots-zoo antoinevastel JavaScript
115
scout-lang
scout-lang maxmindlin Rust

A web crawling programming language

113
rs-bed-covid-indo-api
rs-bed-covid-indo-api satyawikananda TypeScript

API for the availability of hospitals and hospital beds for COVID-19 and non-COVID patients in Indonesia

112
media-search-engine
media-search-engine conflict-investigations Python

Search geolocations for (social) media posts in databases like Bellingcat, Cen4InfoRes etc.

111
scraper
scraper get-set-fetch TypeScript

Node.js web scraper. Contains a command line, Docker container, Terraform module and Ansible roles for distributed cloud scraping. Supported databases:...

111
scrapy-puppeteer
scrapy-puppeteer clemfromspace Python

Scrapy + Puppeteer

111
TelegramAdderTool
TelegramAdderTool saifalisew1508 Python

A Telegram mass member adding/scraping tool written in Python using the Pyrogram library.

110
CC_Scrapper
CC_Scrapper AngelSecurityTeam Python

Telegram CC Scraper - Debit/Credit Card [public or private channel / group]

109
zyte-smartproxy-headless-proxy
zyte-smartproxy-headless-proxy zytedata Go

A complementary proxy to help use SPM with headless browsers

108
ScrapeMate
ScrapeMate hermit-crab JavaScript

Scraping assistant tool. Editing and maintaining CSS/XPath selectors across webpages.

105
viewstate
viewstate yuvadm Python

ASP.NET View State Decoder

105
job_search
job_search tsurupin Elixir

An app to search startup jobs scraped from websites written in Elixir, Phoenix, React and styled-components.

103
humanparser
humanparser ralyodio JavaScript

Parse a human name string into salutation, first name, middle name, last name, suffix.

102
torrengo
torrengo juliensalinas Go

Torrengo is a CLI (command line) program written in Go which concurrently searches torrents from various sources.

100
devdocs-to-llm
devdocs-to-llm alexfazio Jupyter Notebook

Turn any developer documentation into a GPT

98
python-adv-web-apps
python-adv-web-apps macloo Python

Updated python-beginners docs and examples

97
clone-anonymous-github
clone-anonymous-github fedebotu Python

Easily download anonymous GitHub repositories from https://anonymous.4open.science/ with a GUI

96
browser-pool
browser-pool apify TypeScript

A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright...

94
puppeteer-botcheck
puppeteer-botcheck prescience-data TypeScript

🕵‍♂ Bot detection tests for Puppeteer. Hide and seek!

93
core
core serp-spider PHP

:spider: The PHP SERP Spider - A search engine scraper

91
mechaml
mechaml yannham OCaml

OCaml functional web scraping library

91
Detect-CMS
Detect-CMS Krisseck PHP

PHP Library for detecting CMS

90
linkedin-easyapply-using-AI
linkedin-easyapply-using-AI srikar-kodakandla Python

Automate your LinkedIn job applications with AI! This bot utilizes GPT models such as GPT-4, GPT-3.5, and Google's Gemini Pro for Easy Apply form fill...

89
crawler-chrome-extensions
crawler-chrome-extensions zkqiang

Chrome extensions commonly used by crawler developers

89
Deals-Scraper
Deals-Scraper JustSxm Python

Deals Scraper is a Canadian tool to find good deals on websites like Facebook Marketplace, Kijiji, eBay, Amazon and Lespacs

89
instagram-media-scraper
instagram-media-scraper ahmedrangel JavaScript

Simple Node.js code to get public information and media from any Instagram post or reel URL without an API. Working 2025

89