1 repository on SrcLog
Frequency-based KV cache pruning for llama.cpp: a 25% smaller KV cache, with improved perplexity (PPL) at long context. Includes a GPU compaction kernel for HIP/ROCm.
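The description names the technique but not its mechanics. Below is a minimal CPU sketch of one way frequency-based pruning plus compaction could work. It is not the repository's implementation. `KVCache`, `record_attention`, and `prune_and_compact` are hypothetical names, and treating "frequency" as the number of decode steps in which a cached token's attention weight exceeded a threshold is an assumption. A `keep_ratio` of 0.75 corresponds to the 25% cache reduction mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical KV cache for one layer and one head, stored row-major:
// row i holds the key and value vectors of cached token i.
struct KVCache {
    std::size_t head_dim;
    std::vector<float> k, v;          // size() * head_dim floats each
    std::vector<int> pos;             // original sequence position of each row
    std::vector<std::uint32_t> freq;  // ASSUMPTION: per-token usage counter

    std::size_t size() const { return pos.size(); }
};

// ASSUMPTION: "frequency" = number of decode steps in which a cached token's
// attention weight exceeded `thresh`. The repository may define it differently.
void record_attention(KVCache& c, const std::vector<float>& attn, float thresh) {
    assert(attn.size() == c.size());
    for (std::size_t i = 0; i < c.size(); ++i)
        if (attn[i] > thresh) ++c.freq[i];
}

// Keep the most frequently used fraction `keep_ratio` of rows and compact them
// to the front of every buffer, preserving their original order.
void prune_and_compact(KVCache& c, double keep_ratio) {
    assert(keep_ratio >= 0.0 && keep_ratio <= 1.0);
    const std::size_t n = c.size();
    const std::size_t n_keep = static_cast<std::size_t>(n * keep_ratio);

    // 1. Rank rows by frequency, highest first; the stable sort lets earlier rows win ties.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return c.freq[a] > c.freq[b]; });

    // 2. Mark the top n_keep rows as survivors.
    std::vector<std::size_t> keep(n, 0);
    for (std::size_t r = 0; r < n_keep; ++r) keep[order[r]] = 1;

    // 3. Exclusive prefix sum of the mask: dst[i] is the new row index of row i.
    //    Scan-then-scatter is a common way to parallelize compaction on a GPU.
    std::vector<std::size_t> dst(n);
    std::size_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) { dst[i] = acc; acc += keep[i]; }

    // 4. Scatter surviving rows. Walking upward is safe in place because dst[i] <= i.
    const std::size_t d = c.head_dim;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t j = dst[i];
        if (!keep[i] || j == i) continue;  // dropped row, or already in place
        std::copy_n(c.k.begin() + i * d, d, c.k.begin() + j * d);
        std::copy_n(c.v.begin() + i * d, d, c.v.begin() + j * d);
        c.pos[j] = c.pos[i];
        c.freq[j] = c.freq[i];
    }
    c.k.resize(n_keep * d);
    c.v.resize(n_keep * d);
    c.pos.resize(n_keep);
    c.freq.resize(n_keep);
}
```

With `keep_ratio = 0.75`, a cache of 8 tokens shrinks to 6. The two least-used rows are dropped, and the survivors keep their relative order and original positions. In a GPU version, step 3 would plausibly become a parallel scan and step 4 a kernel with one thread or block per surviving row.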