Topic

crawler

Repositories (1431)

awesome-chinese-law
awesome-chinese-law XiaomingX

A knowledge base of cybersecurity laws and regulations, security policies, national standards, and industry standards...

99
Weibo-Album-Crawler
Weibo-Album-Crawler Lodour Python

A multiprocessing crawler for weibo albums.

99
deepweb-scappering
deepweb-scappering kurogai Python

Discover hidden deepweb pages

98
AyugeSpiderTools
AyugeSpiderTools shengchenyang Python

Frees Scrapy developers from hand-writing items, pipelines, middleware, and other modules for common scenarios.

98
google-maps-scraper
google-maps-scraper omkarcloud Python

👋 HOLA! ENJOY OUR GOOGLE MAPS SCRAPER 🚀 TO EFFORTLESSLY EXTRACT DATA SUCH AS NAMES, ADDRESSES, PHONE NUMBERS, WEBSITES, AND RATINGS FROM GOOGLE MAPS...

98
crawler-chrome-extensions
crawler-chrome-extensions zkqiang

Chrome extensions commonly used by crawler developers

97
scaleable-crawler-with-docker-cluster
scaleable-crawler-with-docker-cluster tonywangcn Python

A scalable and efficient crawler with a Docker cluster; crawls a million pages in 2 hours on a single machine

97
es6-crawler-detect
es6-crawler-detect JefferyHus TypeScript

🕷️ This is an ES6 adaptation of the original PHP library CrawlerDetect. This library will help you detect bots/crawlers/spiders via the useragen...

96
ManyACG
ManyACG krau Go

Collect, Download, Organize and Share your Favorite Anime Artworks.

96
pku3b
pku3b sshwy Rust

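🎓 A better Blackboard for PKUers. A command-line tool for the Peking University course website (🖥️Win/🐧Linux/🍏Mac) that supports viewing and submitting assignments and downloading course recordings.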

96
bathyscaphe
bathyscaphe creekorful Go

Fast, highly configurable, cloud native dark web crawler.

95
CrawlAI-RAG
CrawlAI-RAG AnkitNayak-eth Python

CrawlAI RAG is an AI-powered website intelligence platform that allows users to crawl entire websites, index their content, and ask natural-language q...

95
Taiwan-news-crawlers
Taiwan-news-crawlers TaiwanStat Python

Scrapy-based Crawlers for news of Taiwan

95
gopa-abandoned
gopa-abandoned medcl Go

GOPA, a spider written in Go. (NOTE: this project moved to https://github.com/infinitbyte/gopa)

94
the-great-gpt-firewall
the-great-gpt-firewall samber Python

🤖 A curated list of websites that restrict access to AI Agents, AI crawlers and GPTs

93
lianjia-eroom-analysis
lianjia-eroom-analysis linpingta Python

Lianjia / Beike real estate crawler and analysis, 2024

93
BUbiNG
BUbiNG LAW-Unimi Java

The LAW next generation crawler.

92
feedsearch-crawler
feedsearch-crawler DBeath Python

Crawl sites for RSS, Atom, and JSON feeds.

92
python-tools
python-tools lucasayres Python

A collection of Python tools, scripts and utilities to make your life easier.

92
MediaCrawler
MediaCrawler RaidenEI21 Python

MediaCrawler is a powerful web scraper for self-media platforms. Easily collect and analyze content to enhance your digital strategy. 🌐🕷️

92
Novel-crawler
Novel-crawler ling7334 Python

A novel crawler written in Python

91
Amazon-Price-Alert
Amazon-Price-Alert GaryniL Python

Price tracker of Amazon

91
chinese-holidays-calendar
chinese-holidays-calendar muhac Haskell

Calendar of public holidays in China. A subscribable holiday calendar for mainland China, with automatic holiday-aware alarms.

91
BiLiBiLi_DanMu_Crawling
BiLiBiLi_DanMu_Crawling HengXin666 TypeScript

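Crawls historical and full danmaku (bullet comments) from Bilibili, including advanced danmaku and BAS danmaku. Working as of 2025. The internal crawling algorithm fetches danmaku with the minimum number of requests and does not lose any danmaku. Supports multi-task...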

91
MedicalKG
MedicalKG yeeeqichen Python

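A hands-on medical knowledge graph project: crawls Baidu Baike data, stores structured triples in MongoDB, and builds and visualizes the knowledge graph with Neo4j; Medical Knowledge Graph; Crawler;...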

90
crawlie
crawlie nietaki Elixir

A simple Elixir library for writing decently-performing crawlers with minimum effort.

90
Pinterest-infinite-crawler
Pinterest-infinite-crawler mirusu400 Python

An infinite Pinterest crawler/scraper. Crawls images with infinite scroll!

90
SeleniumDemo
SeleniumDemo tobecrazy HTML

Selenium automation test framework

89
movie-elasticsearch
movie-elasticsearch cbwleft Java

An open-source movie search engine built with Spring Boot 2.0 and Elasticsearch

89
twitter_user_tweet_crawler
twitter_user_tweet_crawler kaixinol Python

A Python crawler tool that can automatically simulate browser operations to crawl all users' tweet content and save all static resources (videos, pict...

89
HydraRecon
HydraRecon aufzayed Python

All In One, Fast, Easy Recon Tool

88
html-table-extractor
html-table-extractor yuanxu-li Python

Extract data from HTML tables

88
shopify-spy
shopify-spy ndgigliotti Python

Extract structured data from Shopify websites.

88
WebScraper
WebScraper MLArtist Python

Python-based web crawling script with randomized intervals, user-agent rotation, and proxy server IP rotation to outsmart website bots and prevent blo...

88
scrapeGPT
scrapeGPT LexiestLeszek Python

ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Re...

87
firecrawl-py
firecrawl-py firecrawl Python

Crawl and convert any website into clean markdown

87
ICLR2023-OpenReviewData
ICLR2023-OpenReviewData fedebotu Jupyter Notebook

Crawl & Visualize ICLR 2023 Data from OpenReview

87
narr
narr IljaN Go

Download audio tracks from Netflix to sample your favorite shows

87
WebSecurityArticles
WebSecurityArticles zongdeiqianxing Python

Crawls and organizes high-quality web security articles from sites such as FreeBuf, Anquanke, Xianzhi, and Knownsec

87
Bilibili_manga_download
Bilibili_manga_download Randark-JMT Python

A Bilibili Manga download tool with a graphical interface

86
scrapy_helper
scrapy_helper facert CSS

Dynamically configurable crawler

86
webspot
webspot crawlab-team Python

An intelligent web service to automatically detect web content and extract information from it.

86
extension
extension get-set-fetch TypeScript

web scraping extension

85
shopify-app-store-scraper
shopify-app-store-scraper usernam3 Python

Crawler behind the Shopify App Marketplace dataset

84
is-google
is-google roccomuso JavaScript

Verify that a request is from Google crawlers using Google's DNS verification steps

84
GMaps-Crawler
GMaps-Crawler guilatrova Python

Google Maps crawler using Selenium. All extracted data is forwarded to a SQS queue.

84
skweez
skweez edermi Go

Fast website scraper and wordlist generator

84
metacritic_api
metacritic_api melroy89 PHP

PHP Metacritic API - Mirror from my GitLab

83
Hands-on-WebScraping
Hands-on-WebScraping superryeti Python

This repo is a part of blog series on several web scraping projects where we will explore scraping techniques to crawl data from simple websites to we...

83
ParseHub
ParseHub z-mio Python

A lightweight, asynchronous, out-of-the-box library for aggregated parsing of social media content

83