Stop stalking and start StopStalking :wink:
Lightweight web scraping toolkit for documents and structured data.
🕷️ An easy-to-use spider written in Golang (previously named GOPA).
Web scraper with a simple REST API living in Docker and using a headless browser and Readability.js for parsing.
A flexible, friendly crawler framework
An Instagram bot developed using the Selenium Framework
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Laravel adapter for Roach, the complete web scraping toolkit for PHP.
Create your virus in Termux!
Experience for effectively fetching Facebook data by Querying Graph API with Account-based Token and Operating undetectable scraping Bots to extract C...
A tool for collecting Naver News
Automated web scraping spider generation using Browser Use and LLMs. Streamline the creation of Playwright-based spiders with minimal manual coding. I...
Crawler for linguistic corpora
Grawler is a tool written in PHP which comes with a web interface that automates the task of using google dorks, scrapes the results, and stores them...
estela, an elastic web scraping cluster 🕸
SpideyX, a multipurpose web penetration testing tool with asynchronous concurrent performance and multiple modes and configurations.
DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output based on dotnet core. This library des...
Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters
Download a large list of files concurrently
Go process used to crawl websites
Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.
cdp4j - Chrome DevTools Protocol for Java
A fast, modern and intelligent proxy rotator perfect for crawling and scraping public data.
A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Parse through any sitemap in Node.js
SimFin's open source PDF crawler
Scraply, a simple DOM scraper to fetch information from any HTML-based website
A simple Python script to crawl the complete list of LinkedIn skills
An asyncio + aiolibs crawler that imitates the Scrapy framework
A JK crawler written with Scrapy; images are sourced from Bilibili, Tumblr, Instagram, Weibo, and Twitter
🗄️ A simple CLI for converting WARC to Parquet.
🌱 goClone - clone websites in seconds
⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless...
qcrawl - fast async web crawling & scraping framework for Python.
Burp Suite's extension to scan and crawl Single Page Applications
Download DIG to run on your laptop or server.
Turn any developer documentation into a GPT
Lets Scrapy developers skip writing modules for common scenarios such as item, pipeline, and middleware, freeing up the developer's hands.
Fast, highly configurable, cloud native dark web crawler.
Crawl sites for RSS, Atom, and JSON feeds.
Simple robots.txt template. Keeps unwanted robots out (disallow). Whitelists (allow) legitimate user-agents. Useful for all websites.
ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different website...
Powerful web scraping framework for Crystal
Build a small, 3-domain internet using GitHub Pages and Wikipedia and construct a crawler to crawl, render, and index.
Web crawling and document processing through a usable interface.
RAG Web Browser is an Apify Actor to feed your LLM applications and RAG pipelines with up-to-date text content scraped from the web.
Crowd-sourced lists of URLs to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ for the code