Topic

scraping

Repositories (1766)

ScrapeMate
ScrapeMate hermit-crab JavaScript

Scraping assistant tool. Editing and maintaining CSS/XPath selectors across webpages.

127
TelegramAdderTool
TelegramAdderTool saifalisew1508 Python

A Telegram mass member adding/scraping tool written in Python using the Pyrogram library.

126
FCaptcha
FCaptcha WebDecoy JavaScript

Detect bots, vision AI agents, and headless browsers through 40+ behavioral signals and SHA-256 proof of work. Self-hosted, privacy-first, and fully o...

126
htmlSQL
htmlSQL hxseven PHP

htmlSQL is an experimental PHP library that lets you access HTML values using an SQL-like syntax.

124
spiderbuf
spiderbuf hhuayuan Python

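Spiderbuf is a website focused on Python web-scraping practice. It offers a wide range of scraping tutorials, worked case studies, and practice exercises. Intensive Python scraper development practice: keep raising your skill level in the back-and-forth between attack and defense,...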

124
viewstate
viewstate yuvadm Python

ASP.NET View State Decoder

122
scout-lang
scout-lang maxmindlin Rust

A web crawling programming language

122
linkedin-easyapply-using-AI
linkedin-easyapply-using-AI srikar-kodakandla Python

Automate your LinkedIn job applications with AI! This bot utilizes GPT models such as GPT-4, GPT-3.5, and Google's Gemini Pro for Easy Apply form fill...

122
automated-web-scraper-autoscraper
automated-web-scraper-autoscraper oxylabs

This tutorial shows how to automate your web scraping processes using AutoScraper, one of the Python web scraping libraries available.

121
bots-zoo
bots-zoo antoinevastel JavaScript
116
open-australian-legal-corpus-creator
open-australian-legal-corpus-creator isaacus-dev Python

The code used to create and update the Open Australian Legal Corpus, the first and only multijurisdictional open corpus of Australian legislative and...

115
goClone
goClone shurco Go

🌱 goClone - clone websites in seconds

113
rs-bed-covid-indo-api
rs-bed-covid-indo-api satyawikananda TypeScript

API for the availability of hospitals and hospital beds for COVID-19 and non-COVID patients in Indonesia

113
fansly-scraper
fansly-scraper agnosto Go

An all-in-one scraper/downloader for Fansly, written in Go with the aid of AI. Download content, record live streams, and interact with posts from your favori...

113
scraper
scraper get-set-fetch TypeScript

Node.js web scraper. Contains a command-line tool, Docker container, Terraform module, and Ansible roles for distributed cloud scraping. Supported databases:...

113
media-search-engine
media-search-engine conflict-investigations Python

Search geolocations for (social) media posts in databases like Bellingcat, Cen4InfoRes etc.

112
public-roadmap
public-roadmap serpapi

Public Roadmap for SerpApi, LLC (https://serpapi.com)

112
abx-dl
abx-dl ArchiveBox Python

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless...

111
rust-scraping
rust-scraping itehax Rust

Web scraping using Rust!

111
scrapy-puppeteer
scrapy-puppeteer clemfromspace Python

Scrapy + Puppeteer

110
qcrawl
qcrawl crawlcore Python

qcrawl - fast async web crawling & scraping framework for Python.

109
CC_Scrapper
CC_Scrapper AngelSecurityTeam Python

Telegram CC Scrapper - Debit/Credit Card [channel public or private / group ]

109
zyte-smartproxy-headless-proxy
zyte-smartproxy-headless-proxy zytedata Go

A complimentary proxy that helps you use SPM with headless browsers

108
justetf-scraping
justetf-scraping druzsan Python

Scraping justETF

107
torrengo
torrengo juliensalinas Go

Torrengo is a CLI (command line) program written in Go which concurrently searches torrents from various sources.

106
humanparser
humanparser ralyodio JavaScript

Parse a human name string into salutation, first name, middle name, last name, suffix.

105
rental-monitor
rental-monitor madebyarthouse TypeScript

Application for scraping, analyzing and visualizing rental listings in Austria

105
linkedin-bot
linkedin-bot FujiwaraChoki Python

Automate your LinkedIn Outreach with Selenium and GeckoDriver.

105
docudigger
docudigger Disane87 TypeScript

Website scraper for getting invoices automagically as PDFs (useful for taxes or a DMS)

103
puppeteer-botcheck
puppeteer-botcheck prescience-data TypeScript

🕵‍♂ Bot detection tests for Puppeteer. Hide and seek!

103
devdocs-to-llm
devdocs-to-llm alexfazio Jupyter Notebook

Turn any developer documentation into a GPT

102
KC-Scraper
KC-Scraper Kuucheen Python

A powerful open-source proxy scraper

102
kameleo
kameleo kameleo-io C#

Anti-detect browser for web scraping and automation. Engine-level fingerprint masking for Chromium and Firefox. Self-hosted, Docker-ready. Integrates...

102
job_search
job_search tsurupin Elixir

An app to search startup jobs scraped from websites written in Elixir, Phoenix, React and styled-components.

101
wreq-js
wreq-js sqdshguy TypeScript

HTTP client for Node.js with browser TLS fingerprint impersonation

101
Deals-Scraper
Deals-Scraper JustSxm Python

Deals Scraper is a Canadian tool for finding good deals on websites like Facebook Marketplace, Kijiji, eBay, Amazon, and LesPAC

100
chatgpt-scraper-api
chatgpt-scraper-api ScrapingBee

Collect structured responses from a ChatGPT scraper by sending a prompt with valid ChatGPT scraping API credentials. Enable live search, inject HTML c...

99
shotstars
shotstars snooppr Python

An advanced tool for checking GitHub repositories, with star statistics, including fake star analysis and data visualization.

99
python-adv-web-apps
python-adv-web-apps macloo Python

Updated python-beginners docs and examples

99
rebrowser-playwright-python
rebrowser-playwright-python rebrowser Python

A drop-in replacement for playwright-python patched with rebrowser-patches. It lets you pass modern automation detection tests.

99
AyugeSpiderTools
AyugeSpiderTools shengchenyang Python

Lets Scrapy developers skip writing the modules needed in common scenarios, such as items, pipelines, and middleware, freeing them from boilerplate.

98
browser-pool
browser-pool apify TypeScript

A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright...

98
cinemaempoa
cinemaempoa cumbucadev CSS

Site that aggregates films currently showing at several of the many movie theaters in Porto Alegre.

97
crawler-chrome-extensions
crawler-chrome-extensions zkqiang

Chrome extensions commonly used by crawler developers

97
ebpf-web-fingerprint
ebpf-web-fingerprint robalb C

A Go library and web server for fast TCP and TLS fingerprinting, powered by eBPF

97
clone-anonymous-github
clone-anonymous-github fedebotu Python

Easily download anonymous GitHub repositories from https://anonymous.4open.science/ with a GUI

97
core
core serp-spider PHP

The PHP SERP Spider - A search engine scraper

94
oxylabs-mcp
oxylabs-mcp oxylabs Python

Official Oxylabs MCP integration

94
moneyman
moneyman daniel-hauser TypeScript

Automatically save transactions from all major Israeli banks and credit card companies, using GitHub Actions (or a self-hosted Docker image)

93
feedsearch-crawler
feedsearch-crawler DBeath Python

Crawl sites for RSS, Atom, and JSON feeds.

92