Topic

crawler

Repositories (1431)

scrapedin
scrapedin linkedtales JavaScript

LinkedIn Scraper (currently working 2020)

611
fredy
fredy orangecoding HTML

❤️ Fredy - [F]ind [R]eal [E]state [D]amn Eas[y] - Fredy keeps searching for new apartments, houses, and flats in Germany on platforms like ImmoScout24...

610
Moodle-DL
Moodle-DL C0D3D3V Python

Moodle-DL downloads course content fast from Moodle (e.g. lecture PDFs)

601
newcrawler
newcrawler speed JavaScript

Free Web Scraping Tool with Java

587
pryingdeep
pryingdeep iudicium Go

Prying Deep - An OSINT tool to collect intelligence on the dark web.

582
python-automation-scripts
python-automation-scripts avidLearnerInProgress Python

Simple yet powerful automation scripts.

564
webster
webster zhuyingda JavaScript

a reliable high-level web crawling & scraping framework for Node.js.

561
mmjpg
mmjpg chenjiandongx Python

👩 Crawler for beauty photo album sets (Part 1)

556
webclaw
webclaw 0xMassi Rust

Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.

551
vault
vault abhisharma404 Python

swiss army knife for hackers

549
crawljax
crawljax crawljax Java

Crawljax

540
freshonions-torscraper
freshonions-torscraper dirtyfilthy Python

Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion

535
Python3Webcrawler
Python3Webcrawler mochazi Python

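🌈 Python 3 web crawling in practice: QQ Music songs, JD.com product information, Fang.com, cracking Youdao Translate, building a proxy pool, Douban Books, Baidu Images, cracking NetEase login, Bilibili simulated QR-code login, Xiaoetong, Lizhi Weike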

531
nintendo-switch-eshop
nintendo-switch-eshop lmmfranco TypeScript

Crawler for Nintendo Switch eShop

523
Krawl
Krawl BlessedRebuS Python

Krawl is a customizable, lightweight, cloud-native web deception server and anti-crawler that creates fake web applications with low-hanging vulnerabi...

521
opensearchserver
opensearchserver jaeksoft Java

Open-source Enterprise Grade Search Engine Software

515
Scan-T
Scan-T nanshihui C

A new Python-based crawler with more functions, including network fingerprint search

504
TorCrawl.py
TorCrawl.py MikeMeliz Python

Crawl and extract (regular or onion) webpages through TOR network

504
scrapple
scrapple AlexMathew Python

A framework for creating semi-automatic web content extractors

502
reader
reader vakra-dev TypeScript

Open-source, production-grade web scraping engine built for LLMs. Scrape and crawl the entire web, clean markdown, ready for your agents.

499
Html2Article
Html2Article stanzhai C#

HTML web page main-content extraction

496
Fast-Powerful-Whisper-AI-Services-API
Fast-Powerful-Whisper-AI-Services-API Evil0ctal Python

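⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. No need to purchase the Whisper API: inference runs on a locally hosted Whisper model, with multi-GPU concurrency support and a design aimed at distributed deployment...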

464
ICLR2020-OpenReviewData
ICLR2020-OpenReviewData shaohua0116 Jupyter Notebook

Script that crawls meta data from ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

460
tsrtc
tsrtc Asoul JavaScript

Real-time crawler for Taiwan stocks (Taiwan Stock Exchange Real Time Crawler)

458
sitemap-generator
sitemap-generator lgraubner JavaScript

Easily create XML sitemaps for your website.

452
fundus
fundus flairNLP Python

A very simple news crawler with a funny name

450
AllNewsSpider
AllNewsSpider Python3Spiders Python

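The Paper, Sina News, Tencent News, Sohu News, Xinwen Lianbo, The Times, The New York Times, BBC News; aims to crawl news from all news portal sites. Commercial use of the collected data is prohibited!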

449
Youtube-Projects
Youtube-Projects ayushi7rawat Python

This repository contains all the code I use in my YouTube tutorials.

432
Pinkerton
Pinkerton 0xdsm Python

🕵️ Python project to crawl for JavaScript files and search for secrets like API keys, authorization tokens, hardcoded credentials, etc.

431
media-scraper
media-scraper elvisyjlin Python

Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok

428
signature_algorithm
signature_algorithm gadfly0x Python

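Request-signing or encryption algorithms for various apps, mini programs, and websites. Currently covers: Ziroom, Xiaohongshu, Danke Apartment, luckin coffee, bangkokair (Bangkok Airways)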

427
coom-dl
coom-dl notFaad Dart

Coomer | Kemono (.party or .su) downloader

427
dude
dude roniemartinez Python

dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators

425
EmailFinder
EmailFinder Josue87 Python

Search emails from a domain through search engines

418
gospider
gospider nange Go

A crawler framework implemented in Go; users only need to care about page rules, and a web management interface is provided. Built on colly.

417
music-recover
music-recover heqin-zhu Python

🎵 Convert cache files to MP3 files

409
sosse
sosse biolds Python

Selenium Open Source Search Engine & crawler

409
CrawlerForReader
CrawlerForReader smuyyh Java

Local web-novel crawler for Android, based on jsoup and XPath

406
magic_google
magic_google howie6879 Python

Google search results crawler; get the Google search results you need

404
jivesearch
jivesearch jivesearch JavaScript

A search engine that doesn't track you.

402
second-order
second-order mhmdiaa Go

Second-order subdomain takeover scanner

402
ghcrawler
ghcrawler microsoft JavaScript

Crawl GitHub APIs and store the discovered orgs, repos, commits, ...

390
scraperai
scraperai scraperai HTML

ScraperAI is an open-source, AI-powered tool designed to simplify web scraping for users of all skill levels.

390
ICLR2019-OpenReviewData
ICLR2019-OpenReviewData shaohua0116 Jupyter Notebook

Script that crawls meta data from ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

387
InstagramCrawler
InstagramCrawler tzuhsial Python

A non-API Python program to crawl public photos, posts, or followers

384
supercrawler
supercrawler brendonboshell JavaScript

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limi...

382
crawler-js-hook-framework-public
crawler-js-hook-framework-public JSREI

A toolkit of hooks for JavaScript reverse engineering; some of the tools are open-sourced here

382
webpalm
webpalm XORbit01 Go

🕸️ Crawl in the web network

381
TTBot
TTBot 01ly Python

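Toutiao bot supporting user login, follow, unfollow, fetching followings and followers, publishing posts, posting Wukong Q&A, liking, commenting, collecting various types of news, and more; implemented with the Toutiao web API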

377
freeproxy
freeproxy CharlesPikachu Python

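FreeProxy: Collecting free proxies from the internet. (A massive pool of high-quality free proxies worldwide; supports crawling dozens of free-proxy sharing sources and filtering proxies with custom rules; essential for crawling and data analysis, ...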

376