[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
[ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement