[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
What is the SqueezeAILab/KVQuant GitHub project? Description: "[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization". Written in Python. Explain what it does, its main use cases, key features, and who would benefit from using it.
You can obtain the code in the usual GitHub ways: clone via HTTPS, clone via SSH, or download a ZIP archive of the default branch (master.zip). The corresponding commands are sketched below.
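A minimal sketch of each option, assuming the standard GitHub URL scheme for the SqueezeAILab/KVQuant repository (the exact URLs are listed under the repo's "Code" button):

```sh
# Clone over HTTPS (no SSH key needed)
git clone https://github.com/SqueezeAILab/KVQuant.git

# Clone over SSH (requires an SSH key registered with your GitHub account)
git clone git@github.com:SqueezeAILab/KVQuant.git

# Fetch a ZIP archive of the master branch without using git
curl -LO https://github.com/SqueezeAILab/KVQuant/archive/refs/heads/master.zip
```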
Report bugs or request features on the KVQuant issue tracker:
Open GitHub Issues: https://github.com/SqueezeAILab/KVQuant/issues