triattention-ggml

domvox

Frequency-based KV cache pruning for llama.cpp — 25% cache reduction, improved PPL at long context. GPU compaction kernel for HIP/ROCm.
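To illustrate the idea named in the description, here is a minimal sketch of frequency-based KV cache pruning: track how often each cached token is attended to, then keep only the most frequently used entries and compact the cache. This is a hypothetical illustration in NumPy, not the repository's actual implementation; the function name `prune_kv_cache` and the `access_counts` / `keep_ratio` parameters are assumptions for the example.

```python
import numpy as np

def prune_kv_cache(keys, values, access_counts, keep_ratio=0.75):
    """Sketch of frequency-based KV cache pruning (illustrative only).

    keys/values: (n_tokens, head_dim) cache arrays.
    access_counts: (n_tokens,) attention-frequency scores per token.
    Keeps the `keep_ratio` most frequently attended entries (a 0.75
    ratio mirrors the "25% cache reduction" in the description) and
    compacts the cache, preserving original token order.
    """
    n_keep = max(1, int(len(access_counts) * keep_ratio))
    # Indices of the most frequently attended tokens, restored to
    # original positional order after selection.
    keep = np.sort(np.argsort(access_counts)[::-1][:n_keep])
    return keys[keep], values[keep], access_counts[keep]

# Example: an 8-token cache pruned to 75% (6 tokens).
keys = np.arange(8 * 4, dtype=np.float32).reshape(8, 4)
values = keys.copy()
counts = np.array([10, 1, 7, 0, 9, 3, 8, 2], dtype=np.float32)
k2, v2, c2 = prune_kv_cache(keys, values, counts)
print(k2.shape)  # (6, 4)
```

The project additionally performs this compaction on the GPU via a HIP/ROCm kernel; the sketch above only shows the host-side selection logic.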

Stars: 9
Forks: 0
Watchers: 9
Language: Python
License: Apache-2.0
SrcLog Score: 80
Cost to Build: $52.9K
Market Value: $73.0K

Growth over time: 3 data points, 2026-04-10 → 2026-04-25, tracking stars, forks, and watchers.

How to clone triattention-ggml

Clone via HTTPS

git clone https://github.com/domvox/triattention-ggml.git

Clone via SSH

git clone git@github.com:domvox/triattention-ggml.git

Download ZIP

Download master.zip

Found an issue?

Report bugs or request features on the triattention-ggml issue tracker:

Open GitHub Issues