Topic

crawler

Repositories (1431)

KamiYomu
KamiYomu KamiYomu C#

A self-hosted, extensible manga reader and download tool with plug-in support.

121
learncpp-download
learncpp-download amalrajan Python

Multi-threaded web scraper to download all the tutorials from www.learncpp.com and convert them to PDF files concurrently.

121
findpapers
findpapers jonatasgrosman Python

Findpapers: A tool for helping researchers who are looking for related works

120
eyes
eyes r05323028 Python

Public Opinion Mining System of Taiwanese Forums

119
BaiduCrawler
BaiduCrawler mazzzystar Python

Sample of using proxies to crawl Baidu search results.

118
Lcrawl
Lcrawl lndj PHP

An elegant crawler for the Zhengfang educational administration system.

117
proxy-pool
proxy-pool denghuichao Java

A proxy IP pool service for crawlers; other crawler programs can fetch proxies from it through a REST API.

116
ThesaurusSpider
ThesaurusSpider WuLC Python

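A Python crawler that downloads the dictionary files of the Sogou, Baidu, and QQ input methods; useful for building vocabularies for different industries.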

116
bots-zoo
bots-zoo antoinevastel JavaScript
116
starfish-ql
starfish-ql SeaQL Rust

✴️ An experimental graph database

115
gocrawler
gocrawler superjcd Go

gocrawler, a distributed crawler framework for Go

115
fuckBookWalker
fuckBookWalker VermiIIi0n Python

Download books from bookwalker.jp/bookwalker.com.tw

114
goClone
goClone shurco Go

🌱 goClone - clone websites in seconds

113
sitemap-extract
sitemap-extract phase3dev Python

Processes XML sitemaps and extracts URLs. Includes features such as support for both plain XML and compressed XML files, multiple input sources, prote...

113
linkcrawler
linkcrawler schollz Go

Cross-platform persistent and distributed web crawler

113
APSoft-Web-Scanner-v2
APSoft-Web-Scanner-v2 ph09nix C#

Powerful dork searcher and vulnerability scanner for the Windows platform

112
bee-university
bee-university beecost Python

A project that collects university admission cutoff scores for 2014-2018 and analyzes the data

111
pagser
pagser foolin Go

Pagser is a simple, extensible, configurable library that parses and deserializes HTML pages into structs, based on goquery and struct tags, for Go crawlers

111
LFITester
LFITester kostas-pa Python

LFITester is a Python3 program that automates the detection and exploitation of Local File Inclusion (LFI) vulnerabilities on a server.

111
scrapy-puppeteer
scrapy-puppeteer clemfromspace Python

Scrapy + Puppeteer

110
wxpath
wxpath rodricios Python

wxpath - declarative web crawling with XPath; a Web Query Language (WQL)

110
Proxy-List-Scrapper
Proxy-List-Scrapper narkhedesam Python

Proxy List Scrapper

110
goscraper
goscraper badoux Go

Golang pkg to quickly return a preview of a webpage (title/description/images)

109
crawlist
crawlist WwwwwyDev Python

A universal solution for crawling lists on web pages.

109
antispider
antispider dytttf JavaScript
109
qcrawl
qcrawl crawlcore Python

qcrawl - fast async web crawling & scraping framework for Python.

109
zyte-smartproxy-headless-proxy
zyte-smartproxy-headless-proxy zytedata Go

A complementary proxy to help use SPM with headless browsers

108
bose
bose omkarcloud Python

✨ BOSE IS SWISS ARMY KNIFE 🔪 FOR BOT DEVELOPMENT. THE ULTIMATE BOT DEVELOPMENT FRAMEWORK. 🤖

107
crawler
crawler brantou Python

Crawler, HTTP proxy, simulated login!

107
scrapai-cli
scrapai-cli discourselab Python

AI-powered web scraping CLI. Describe what you want, get a production-ready Scrapy spider. Write once, reuse forever.

107
asyncpy
asyncpy lixi5338619 Python

A lightweight asynchronous, coroutine-based web crawler framework built with asyncio and aiohttp

107
xcrawl3r
xcrawl3r hueristiq Go

A command-line utility designed to recursively spider webpages for URLs. It works by actively traversing websites - following links embedded in webpag...

107
weibo-scraper
weibo-scraper Xarrow Python

Simple Weibo Scraper

107
spider-py
spider-py spider-rs Rust

Spider ported to Python

106
images-web-crawler
images-web-crawler amineHorseman Python

This package is a complete tool for creating a large dataset of images (specially designed -but not only- for machine learning enthusiasts). It can cr...

106
aliexscrape
aliexscrape ducdev JavaScript

Get Aliexpress product details in JSON

106
Crawler
Crawler phantom-sea-limited Python

A crawler for a great many novel websites

105
Scrapy_IPProxyPool
Scrapy_IPProxyPool monkey-soft Python

Free IP proxy pool. A plugin for the Scrapy crawler framework.

105
pappet
pappet patrickschur JavaScript

A command-line tool to crawl websites using puppeteer.

105
CrawlerPack
CrawlerPack abola Java

Java web data crawler package

104
jadwalsholatorg
jadwalsholatorg lakuapik Python

Parsed data from website https://jadwalsholat.org

103
COI
COI AlvinAi96 Jupyter Notebook

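Practice project: Comment of Interest, data mining of e-commerce text reviews (crawler + opinion extraction + sentence-level and opinion-level sentiment analysis)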

103
random_user_agent
random_user_agent Luqman-Ud-Din Python

A package to get a list of user agents based on filters such as operating system, software name, etc.

103
4scanner
4scanner pboardman Python

Continuously search imageboard threads for images/webms and download them

103
google-arts-crawler
google-arts-crawler piotrantosz Python

Google Arts & Culture high quality image downloader

102
devdocs-to-llm
devdocs-to-llm alexfazio Jupyter Notebook

Turn any developer documentation into a GPT

102
kameleo
kameleo kameleo-io C#

Anti-detect browser for web scraping and automation. Engine-level fingerprint masking for Chromium and Firefox. Self-hosted, Docker-ready. Integrates...

102
webb
webb hardikvasa Python

Python: An all-in-one Web Crawler, Web Parser and Web Scraping library!

102
rewe-discounts
rewe-discounts foo-git Python

Grabs current REWE discounts and saves them in a Markdown file

102
dcard-spider
dcard-spider leVirve Python

A spider on Dcard. Strong and speedy.

100