Most popular crawler repositories and open source projects

KamiYomu KamiYomu C#

A self-hosted, extensible manga reader and download tool with plug-in support.

121 7 121

learncpp-download amalrajan Python

Multi-threaded web scraper to download all the tutorials from www.learncpp.com and convert them to PDF files concurrently.

121 21 121

findpapers jonatasgrosman Python

Findpapers: A tool for helping researchers who are looking for related works

120 21 120

eyes r05323028 Python

Public Opinion Mining System of Taiwanese Forums

119 18 119

BaiduCrawler mazzzystar Python

Sample of using proxies to crawl baidu search results.

118 60 118

Lcrawl lndj PHP

An elegant crawler for the Zhengfang academic administration system.

117 44 117

proxy-pool denghuichao Java

A proxy IP pool service for crawlers; other crawler programs can fetch proxies from it through a REST API

116 57 116

ThesaurusSpider WuLC Python

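A Python crawler that downloads dictionary files for the Sogou, Baidu, and QQ input methods; useful for building vocabularies for different industries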

116 44 116

bots-zoo antoinevastel JavaScript

116 28 116

starfish-ql SeaQL Rust

✴️ An experimental graph database

115 3 115

gocrawler superjcd Go

gocrawler, a distributed crawler framework for Go

115 2 115

fuckBookWalker VermiIIi0n Python

Download books from bookwalker.jp/bookwalker.com.tw

114 10 114

goClone shurco Go

🌱 goClone - clone websites in seconds

113 9 113

sitemap-extract phase3dev Python

Processes XML sitemaps and extracts URLs. Includes features such as support for both plain XML and compressed XML files, multiple input sources, prote...

113 7 113

linkcrawler schollz Go

Cross-platform persistent and distributed web crawler 🔗

113 9 113

APSoft-Web-Scanner-v2 ph09nix C#

Powerful dork searcher and vulnerability scanner for windows platform

112 35 112

bee-university beecost Python

A project that collects university admission cutoff scores for 2014-2018 and analyzes the data

111 24 111

pagser foolin Go

Pagser is a simple, extensible, configurable parse and deserialize html page to struct based on goquery and struct tags for golang crawler

111 7 111

LFITester kostas-pa Python

LFITester is a Python3 program that automates the detection and exploitation of Local File Inclusion (LFI) vulnerabilities on a server.

111 26 111

scrapy-puppeteer clemfromspace Python

Scrapy + Puppeteer

110 29 110

wxpath rodricios Python

wxpath - declarative web crawling with XPath; a Web Query Language (WQL)

110 5 110

Proxy-List-Scrapper narkhedesam Python

Proxy List Scrapper

110 19 110

goscraper badoux Go

Golang pkg to quickly return a preview of a webpage (title/description/images)

109 41 109

crawlist WwwwwyDev Python

A universal solution for crawling lists on web pages.

109 1 109

antispider dytttf JavaScript

109 54 109

qcrawl crawlcore Python

qcrawl - fast async web crawling & scraping framework for Python.

109 5 109

zyte-smartproxy-headless-proxy zytedata Go

A complimentary proxy to help to use SPM with headless browsers

108 38 108

bose omkarcloud Python

✨ BOSE IS SWISS ARMY KNIFE 🔪 FOR BOT DEVELOPMENT. THE ULTIMATE BOT DEVELOPMENT FRAMEWORK. 🤖

107 1 107

crawler brantou Python

Crawlers, HTTP proxies, and simulated login!

107 38 107

scrapai-cli discourselab Python

AI-powered web scraping CLI. Describe what you want, get a production-ready Scrapy spider. Write once, reuse forever.

107 11 107

asyncpy lixi5338619 Python

A lightweight asynchronous, coroutine-based web crawler framework built with asyncio and aiohttp

107 29 107

xcrawl3r hueristiq Go

A command-line utility designed to recursively spider webpages for URLs. It works by actively traversing websites - following links embedded in webpag...

107 8 107

weibo-scraper Xarrow Python

Simple Weibo Scraper

107 19 107

spider-py spider-rs Rust

Spider ported to Python

106 17 106

images-web-crawler amineHorseman Python

This package is a complete tool for creating a large dataset of images (specially designed -but not only- for machine learning enthusiasts). It can cr...

106 24 106

aliexscrape ducdev JavaScript

Get Aliexpress product details in JSON

106 30 106

Crawler phantom-sea-limited Python

A crawler for a (very large) number of novel websites

105 12 105

Scrapy_IPProxyPool monkey-soft Python

A free IP proxy pool. A plug-in for the Scrapy crawler framework

105 40 105

pappet patrickschur JavaScript

A command-line tool to crawl websites using puppeteer.

105 7 105

CrawlerPack abola Java

A Java web data crawler package

104 69 104

jadwalsholatorg lakuapik Python

Parsed data from website https://jadwalsholat.org

103 33 103

COI AlvinAi96 Jupyter Notebook

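Practice project: Comment of Interest, data mining of e-commerce text reviews (crawler + opinion extraction + sentence-level and opinion-level sentiment analysis)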

103 16 103

random_user_agent Luqman-Ud-Din Python

A package to get a list of user agents based on filters such as operating system, software name, etc.

103 12 103

4scanner pboardman Python

Continuously search imageboards threads for images/webms and download them

103 18 103

google-arts-crawler piotrantosz Python

Google Arts & Culture high quality image downloader

102 19 102

devdocs-to-llm alexfazio Jupyter Notebook

Turn any developer documentation into a GPT

102 16 102

kameleo kameleo-io C#

Anti-detect browser for web scraping and automation. Engine-level fingerprint masking for Chromium and Firefox. Self-hosted, Docker-ready. Integrates...

102 22 102

webb hardikvasa Python

Python: An all-in-one Web Crawler, Web Parser and Web Scraping library!

102 41 102

rewe-discounts foo-git Python

Grabs current REWE discounts and saves them in a Markdown file

102 10 102

dcard-spider leVirve Python

A spider on Dcard. Strong and speedy.

100 17 100

crawler

Repositories (1431)