Most popular crawler repositories and open source projects

awesome-chinese-law XiaomingX

A knowledge base of cybersecurity laws and regulations, security policies, national standards, and industry standards...

99 6 99

Weibo-Album-Crawler Lodour Python

A multiprocessing crawler for weibo albums.

99 33 99

deepweb-scappering kurogai Python

Discover hidden deepweb pages

98 20 98

AyugeSpiderTools shengchenyang Python

Frees developers from writing the common-case modules in scrapy development, such as item, pipeline, and middleware.

98 16 98

google-maps-scraper omkarcloud Python

👋 HOLA! ENJOY OUR GOOGLE MAPS SCRAPER 🚀 TO EFFORTLESSLY EXTRACT DATA SUCH AS NAMES, ADDRESSES, PHONE NUMBERS, WEBSITES, AND RATINGS FROM GOOGLE MAPS...

98 14 98

crawler-chrome-extensions zkqiang

Chrome extensions commonly used by crawler developers

97 16 97

scaleable-crawler-with-docker-cluster tonywangcn Python

A scalable and efficient crawler with a Docker cluster; crawls a million pages in 2 hours on a single machine

97 27 97

es6-crawler-detect JefferyHus TypeScript

🕷️ This is an ES6 adaptation of the original PHP library CrawlerDetect; this library will help you detect bots/crawlers/spiders via the useragen...

96 30 96

ManyACG krau Go

Collect, Download, Organize and Share your Favorite Anime Artworks.

96 6 96

pku3b sshwy Rust

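🎓 A better Blackboard for PKUers. A command-line tool for the Peking University course site (🖥️Win/🐧Linux/🍏Mac); supports viewing/submitting assignments and downloading course replays.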

96 14 96

bathyscaphe creekorful Go

Fast, highly configurable, cloud native dark web crawler.

95 21 95

CrawlAI-RAG AnkitNayak-eth Python

CrawlAI RAG is an AI-powered website intelligence platform that allows users to crawl entire websites, index their content, and ask natural-language q...

95 21 95

Taiwan-news-crawlers TaiwanStat Python

Scrapy-based Crawlers for news of Taiwan

95 16 95

gopa-abandoned medcl Go

GOPA, a spider written in Go. (NOTE: this project moved to https://github.com/infinitbyte/gopa)

94 30 94

the-great-gpt-firewall samber Python

🤖 A curated list of websites that restrict access to AI Agents, AI crawlers and GPTs

93 7 93

lianjia-eroom-analysis linpingta Python

lianjia / beike estate crawler/analysis 2024

93 33 93

BUbiNG LAW-Unimi Java

The LAW next generation crawler.

92 23 92

feedsearch-crawler DBeath Python

Crawl sites for RSS, Atom, and JSON feeds.

92 15 92

python-tools lucasayres Python

A collection of Python tools, scripts and utilities to make your life easier.

92 19 92

MediaCrawler RaidenEI21 Python

MediaCrawler is a powerful web scraper for self-media platforms. Easily collect and analyze content to enhance your digital strategy. 🌐🕷️

92 8 92

Novel-crawler ling7334 Python

A novel crawler written in Python

91 27 91

Amazon-Price-Alert GaryniL Python

Price tracker of Amazon

91 27 91

chinese-holidays-calendar muhac Haskell

Calendar of public holidays in China: mainland China holiday calendar subscription and automatic holiday alarms

91 6 91

BiLiBiLi_DanMu_Crawling HengXin666 TypeScript

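Crawls Bilibili historical danmaku / full danmaku, with support for advanced danmaku and BAS danmaku. Working as of 2025; the internal crawling algorithm fetches danmaku with the optimal, minimal number of requests and does not lose any danmaku. Supports multi-task...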

91 8 91

MedicalKG yeeeqichen Python

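Hands-on medical knowledge graph construction: crawls Baidu Baike data, stores structured triples in MongoDB, and uses Neo4j to build and visualize the knowledge graph; Medical Knowledge Graph; Crawler;...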

90 15 90

crawlie nietaki Elixir

A simple Elixir library for writing decently-performing crawlers with minimum effort.

90 11 90

Pinterest-infinite-crawler mirusu400 Python

An infinite Pinterest crawler/scraper. Crawl images with infinite scroll!

90 18 90

SeleniumDemo tobecrazy HTML

Selenium automation test framework

89 95 89

movie-elasticsearch cbwleft Java

An open-source movie search engine built with SpringBoot 2.0 + ElasticSearch

89 34 89

twitter_user_tweet_crawler kaixinol Python

A Python crawler tool that can automatically simulate browser operations to crawl all users' tweet content and save all static resources (videos, pict...

89 16 89

HydraRecon aufzayed Python

All In One, Fast, Easy Recon Tool

88 14 88

html-table-extractor yuanxu-li Python

Extract data from HTML tables

88 22 88

shopify-spy ndgigliotti Python

Extract structured data from Shopify websites.

88 47 88

WebScraper MLArtist Python

Python-based web crawling script with randomized intervals, user-agent rotation, and proxy server IP rotation to outsmart website bots and prevent blo...

88 22 88

scrapeGPT LexiestLeszek Python

ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Re...

87 15 87

firecrawl-py firecrawl Python

Crawl and convert any website into clean markdown

87 9 87

ICLR2023-OpenReviewData fedebotu Jupyter Notebook

Crawl & Visualize ICLR 2023 Data from OpenReview

87 12 87

narr IljaN Go

Download audio tracks from Netflix to sample your favorite shows

87 10 87

WebSecurityArticles zongdeiqianxing Python

Crawls and organizes high-quality "web security" articles from sites such as Freebuf, Anquanke, Xianzhi, and Knownsec

87 20 87

Bilibili_manga_download Randark-JMT Python

A Bilibili Manga download tool with a graphical interface

86 2 86

scrapy_helper facert CSS

Dynamically configurable crawler

86 30 86

webspot crawlab-team Python

An intelligent web service to automatically detect web content and extract information from it.

86 13 86

extension get-set-fetch TypeScript

web scraping extension

85 7 85

shopify-app-store-scraper usernam3 Python

Crawler behind the Shopify App Marketplace dataset

84 29 84

is-google roccomuso JavaScript

Verify that a request is from Google crawlers using Google's DNS verification steps

84 7 84

GMaps-Crawler guilatrova Python

Google Maps crawler using Selenium. All extracted data is forwarded to a SQS queue.

84 26 84

skweez edermi Go

Fast website scraper and wordlist generator

84 6 84

metacritic_api melroy89 PHP

PHP Metacritic API - Mirror from my GitLab

83 18 83

Hands-on-WebScraping superryeti Python

This repo is a part of blog series on several web scraping projects where we will explore scraping techniques to crawl data from simple websites to we...

83 70 83

ParseHub z-mio Python

A lightweight, asynchronous, out-of-the-box social media aggregation and parsing library

83 8 83

crawler

Repositories (1431)