triattention-ggml

Frequency-based KV cache pruning for llama.cpp — 25% cache reduction, improved PPL at long context. GPU compaction kernel for HIP/ROCm.

amd

9 Stars

0 Forks

9 Watchers

Python Language

apache-2.0 License

80 SrcLog Score

Cost to Build

$52.9K

Market Value

$73.0K

Growth over time (stars, forks, watchers): 3 data points, 2026-04-10 → 2026-04-25

How to clone triattention-ggml

Clone via HTTPS

git clone https://github.com/domvox/triattention-ggml.git

Clone via SSH

git clone git@github.com:domvox/triattention-ggml.git

Download ZIP

Download master.zip

Found an issue?

Report bugs or request features on the triattention-ggml issue tracker: https://github.com/domvox/triattention-ggml/issues

Similar to triattention-ggml

webpack madge steal wufuc conditioner awesome-vulkan esl zydis staxrip pyopencl coriander lodjs easytimer.js ocl angular-async-loader node-dependency-tree purge-wrangler require-vuejs GatelessGateSharp SimpleSvm set-egpu parenchyma node-precinct TypeScript-AMD-Boilerplate ScanTree awesome-d3d12 Tensile generator-module-boilerplate akase romania-choropleth