1 repository on SrcLog
TurboQuant Vulkan: 3-bit KV cache quantization for llama.cpp using Lloyd-Max Gaussian codebooks. 4.57x compression, Vulkan GPU support (AMD/Intel/NVIDIA). Hobby project.
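As a rough illustration of what a 3-bit Lloyd-Max Gaussian codebook means, here is a minimal Python sketch. It is not the repository's code. The function names, the per-block RMS scale, the block size of 32, and the 16-bit scale are all assumptions made for the example. The listing does not describe the actual block layout or the Vulkan shaders.

```python
import math

def pdf(x):
    # Standard normal density; evaluates to 0.0 at +/- infinity.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):
    # Standard normal cumulative distribution.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lloyd_max_gaussian(bits=3, iters=500):
    """Iterate the Lloyd-Max conditions for a unit Gaussian source."""
    n = 2 ** bits
    # Start from evenly spaced levels in [-2, 2].
    levels = [-2.0 + 4.0 * (i + 0.5) / n for i in range(n)]
    for _ in range(iters):
        # Decision thresholds sit midway between neighbouring levels.
        mids = [(levels[i] + levels[i + 1]) / 2.0 for i in range(n - 1)]
        edges = [-math.inf] + mids + [math.inf]
        # Each level moves to the centroid of its cell under the Gaussian.
        levels = [
            (pdf(edges[i]) - pdf(edges[i + 1])) / (cdf(edges[i + 1]) - cdf(edges[i]))
            for i in range(n)
        ]
    return levels

def quantize_block(values, levels):
    # Assumption: normalise each block by its RMS so it resembles a unit Gaussian.
    scale = math.sqrt(sum(v * v for v in values) / len(values)) or 1.0
    idx = [
        min(range(len(levels)), key=lambda k: abs(v / scale - levels[k]))
        for v in values
    ]
    return scale, idx

def dequantize_block(scale, idx, levels):
    # Look up each code index and undo the block scale.
    return [scale * levels[k] for k in idx]

def compression_ratio(bits=3, block=32, scale_bits=16, baseline_bits=16):
    # Hypothetical accounting: index bits plus an amortised per-block scale.
    return baseline_bits / (bits + scale_bits / block)
```

Under these assumed parameters, `compression_ratio()` gives 16 / 3.5, which rounds to the quoted 4.57x against an FP16 baseline. That match may be a coincidence of the chosen block size rather than the project's real layout.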