A utility package for automating lighthouse reporting
🗿 npm ↔️ Algolia replication tool :skier: :snail: :artificial_satellite:
SpiderBox - 虫盒 - a resource navigation site for web crawling and reverse engineering
A new generation of multi-process async event-driven spider engine based on workerman. Supports headless browsers. 🌿 Built on workerman: a multi-process asynchronous event...
A course downloader that helps NTU students download course data from NTU Ceiba.
Weibo crawler: a lightweight Weibo crawler based on the Scrapy framework (Sina Weibo Spider)
🤖Free Agent Line Bot with Web Search, Google Image Search, Image Generator, Video Generator...
POOPAK - TOR Hidden Service Crawler
A simple library to scrape Pinterest images.
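Source code for the book 《数据采集从入门到放弃》 ("Data Collection: From Getting Started to Giving Up"). Contents: introduction to crawlers, the job market, crawler engineer interview questions; introduction to the HTTP protocol; using Requests; introduction to the XPath parser; MongoDB and MySQL; multithreading...
✨ A cute AI assistant for web data extraction ✨
Price tracker that monitors product prices and alerts you when they drop. Supports tiki.vn, shopee, lotte.vn, ... Built with Firebase https://pricetrack.we...
A PHP crawler that finds emails on the internet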
Grabs all of the audio files from all of the Blinkist books
Downloader for 哔咔漫画 (PicACG) favorites
Scrapy patent crawler (no longer maintained)
Web scraper, crawler and parser in C#. Designed as a simple, declarative and scalable web scraping solution.
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Leetcode Contest Ranking Searcher
Updated lists of IP addresses/whitelists of good bots and crawlers. Includes GoogleBot, BingBot, DuckDuckBot, etc.
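Convertible bond data and strategy analysis for 宁稳网 (formerly 富投网) and 集思录 (Jisilu)
Python ProxyPool for web spiders. ProxyPool is a lightweight tool for collecting, validating and managing proxy IPs. It aims to help users automatically maintain a high-quality proxy pool, for flexible use in crawlers and network requests...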
Parse through any sitemap in Node.js
Crawler management system with cluster support and elastic scaling. Runs feapder, scrapy, selenium, playwright and other frameworks and scripts
Parser and database to index the terpene profile of different strains of Cannabis from online databases
SimFin's open source PDF crawler
A news crawler for BBC News, Reuters and New York Times.
Multithreaded downloader for all HD photos in a user's Sina Weibo album.
Scraply: a simple DOM scraper to fetch information from any HTML-based website
Viewers for statistics and dashboarding of Domain Search Engine data
Example ticket-grabbing script for 大麦 (Damai)
This repository is no longer maintained.
A powerful MCP server extension providing web search and content extraction capabilities. Integrates DuckDuckGo search functionality and URL content e...
Attempts to crawl the Ethereum network of valid Ethereum execution nodes and visualizes them in a nice web dashboard.
An automated website accessibility scanner and CLI
Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.
Enlarge a training dataset by searching Google for images matching specified keywords and downloading the results
Download Skillshare videos by ID or by class
:computer: Quickly crawl the information (e.g. followers, tags, etc...) of an instagram profile. No login required!
Java framework for prerendering
Saves Baidu Tieba posts locally, with support for images, video, voice messages and other content. Companion reader for this project: TiebaReader (https://github.com/Sorceresssis/TiebaReader)
GraphQuery is a query language and execution engine tied to any backend service.
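Spiderbuf is a website focused on Python crawler practice. It offers a rich set of crawler tutorials, case analyses and practice exercises. Intensive practice in Python crawler development: keep improving your skills through the attack and defense of spear and shield,...
Xenomorph Crawler, a Concise, Declarative and Observable Distributed Crawler (Node / Go / Java / Rust) for Web, RDB, OS; can also act as a Monitor (with...
Product price monitoring for 某东: set custom target prices and get email/WeChat alerts when prices drop. Tech: Python crawler / IP proxy pool / JS API crawling / Selenium page crawling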
A dungeon crawler
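Weibo data collection; content from other mainstream platforms such as Zhihu, Tieba, Xiaohongshu, Douyin and Kuaishou will be added later
Baidu Tieba moderation manager ✨ post-deletion bot ✨ wraps many core Tieba APIs using aiohttp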
A self-hosted, extensible manga reader and download tool with plug-in support.
(Abandoned) Baidu Tieba moderation tool that automatically scans posts and handles rule-violating ones