Uses MapReduce's Java interface to crawl data on Chinese universities in a distributed way, while learning the basics of HDFS.
Scrape data from Goodreads using Scrapy and Selenium :books:
arxiv_miner is a toolkit for mining research papers on CS ArXiv.
Nim library for querying HTML using CSS selectors (like JavaScript's document.querySelector)
Python library for automated email account creation. Create multiple accounts easily with support for major email providers.
ScraperAI is an open-source, AI-powered tool designed to simplify web scraping for users of all skill levels.
An unofficial REST API for vlr.gg, a site for Valorant pro esports match results and news.
Scrapes the TikTok discovery web API every 15 minutes using GitHub Actions to track changes
A console application that scrapes valid streaming links for any movie or series by exact season and episode number; you can also download a whole sea...
Use AWS Lambda functions as a proxy pool to scrape web pages.
Library with a set of tools for scraping information about Nintendo games and their prices across all regions (NA, EU and JP).
The Apify SDK for Python is the official library for creating Apify Actors in Python. It provides useful features like actor lifecycle management, loc...
A fast, modern and intelligent proxy rotator perfect for crawling and scraping public data.
An example using Selenium WebDriver for Python and the Scrapy framework to create a web scraper that crawls an ASP site
A web crawling framework implemented in Golang; it is simple to write and delivers strong performance. It comes with a wide range of practical middl...
Simple library for exploring/scraping the web or testing a website you’re developing
Monitors an Instagram user account and automatically posts new images to a Discord channel via a webhook. Working 2022!
Python framework to scrape Pastebin pastes and analyze them
📰 Build RSS 2.0 feeds from websites (and JSON APIs) automatically or with a few CSS selectors.
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
htmlSQL is an experimental PHP library that lets you access HTML values with an SQL-like syntax.
Web scraper, crawler and parser in C#. Designed as a simple, declarative and scalable web scraping solution.
This tutorial shows how to automate your web scraping processes using AutoScraper, one of the Python web scraping libraries available.
Machine learning for beginners (data science enthusiasts)
A web crawling programming language
API for hospital and hospital-bed availability for COVID-19 and non-COVID patients in Indonesia
Search geolocations for (social) media posts in databases like Bellingcat, Cen4InfoRes etc.
Node.js web scraper. Contains a command line, docker container, terraform module and ansible roles for distributed cloud scraping. Supported databases:...
Scrapy + Puppeteer
A Telegram tool for mass-adding and scraping members, written in Python using the Pyrogram library.
Telegram CC Scraper: scrapes debit/credit card data from public or private channels and groups
A complementary proxy that helps you use SPM with headless browsers
A scraping assistant tool for editing and maintaining CSS/XPath selectors across web pages.
ASP.NET View State Decoder
An app, written in Elixir, Phoenix, React and styled-components, for searching startup jobs scraped from websites.
Parse a human name string into salutation, first name, middle name, last name, suffix.
Torrengo is a CLI (command line) program written in Go which concurrently searches torrents from various sources.
Turn any developer documentation into a GPT
Updated python-beginners docs and examples
Easily download anonymous GitHub repositories from https://anonymous.4open.science/ with a GUI
A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright...
🕵️‍♂️ Bot detection tests for Puppeteer. Hide and seek!
:spider: The PHP SERP Spider - A search engine scraper
OCaml functional web scraping library
PHP Library for detecting CMS
Automate your LinkedIn job applications with AI! This bot utilizes GPT models such as GPT-4, GPT-3.5, and Google's Gemini Pro for Easy Apply form fill...
Chrome extensions commonly used by crawler developers
Deals Scraper is a Canadian tool to find good deals on websites like Facebook Marketplace, Kijiji, eBay, Amazon and Lespacs
Simple Node.js code to get public information and media from any Instagram post or reel URL without the API. Working 2025