An open-source tool-augmented conversational language model from Fudan University
MOSS-Audio-Tokenizer is a Causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of diverse audio, it supports streaming and variable bitrates, delivers state-of-the-art reconstruction, and performs strongly in generation and understanding, serving as a unified interface for next-generation native audio language models.
A survey of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation