A flexible, friendly crawler framework
An Instagram bot developed using the Selenium Framework
Web scraper with a simple REST API, living in Docker and using a headless browser and Readability.js for parsing.
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Laravel adapter for Roach, the complete web scraping toolkit for PHP.
Create your virus in Termux!
A tool for collecting Naver News
Grawler is a tool written in PHP which comes with a web interface that automates the task of using Google dorks, scrapes the results, and stores them...
Crawler for linguistic corpora
Experience for effectively fetching Facebook data by Querying Graph API with Account-based Token and Operating undetectable scraping Bots to extract C...
estela, an elastic web scraping cluster 🕸
SpideyX, a multipurpose web penetration testing tool with asynchronous concurrent performance and multiple modes and configurations.
DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output based on dotnet core. This library des...
Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
Download a large list of files concurrently
Go process used to crawl websites
cdp4j - Chrome DevTools Protocol for Java
Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.
Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters
A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Scraply, a simple DOM scraper to fetch information from any HTML-based website
A fast, modern and intelligent proxy rotator perfect for crawling and scraping public data.
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
SimFin's open source PDF crawler
An asyncio + aiolibs crawler that imitates the Scrapy framework
Parse through any sitemap in Node.js
A simple Python script to crawl the complete list of LinkedIn skills
A JK crawler written with Scrapy; images are sourced from Bilibili, Tumblr, Instagram, Weibo, and Twitter
🗄️ A simple CLI for converting WARC to Parquet.
Burp Suite's extension to scan and crawl Single Page Applications
Download DIG to run on your laptop or server.
Lets Scrapy developers skip writing modules such as item, pipeline, and middleware for common scenarios, freeing up the developer's hands.
Turn any developer documentation into a GPT
Fast, highly configurable, cloud native dark web crawler.
ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different website...
Simple robots.txt template. Keeps unwanted robots out (disallow). Whitelists (allow) legitimate user-agents. Useful for all websites.
Automated web scraping spider generation using Browser Use and LLMs. Streamline the creation of Playwright-based spiders with minimal manual coding. I...
⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless...
Powerful web scraping framework for Crystal
Crawl sites for RSS, Atom, and JSON feeds.
🌱 goClone - clone websites in seconds
Build a small, 3-domain internet using GitHub Pages and Wikipedia and construct a crawler to crawl, render, and index.
Web crawling and document processing through a usable interface.
Python crawling tutorial
A simple and easy to use web crawler for Python
Web scraping and automation using Python
🌌 High productivity semi-automatic crawler generator 🛠️🧰
A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy...
Screen scraping and web crawling framework