Read the Chinese version.
MOSS-Speech introduces true end-to-end speech interaction. Unlike cascaded pipelines or text-guided models, it generates speech directly, without first producing text. This design not only removes the text bottleneck that constrains generated speech, but also inherits the knowledge of the pretrained text language model, enabling more natural and efficient speech-to-speech dialogue.
We add modality-based layer-splitting to a pretrained text LLM and adopt a frozen pre-training strategy that preserves the LLM's capabilities while extending it to the speech modality. A minimal sketch of this idea follows below.
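The PyTorch sketch below illustrates the layer-splitting idea under stated assumptions: a frozen layer from the pretrained text LLM processes text positions, while a freshly initialized, trainable layer of the same shape processes speech positions. All names here (`ModalitySplitBlock`, `speech_mask`, the use of `nn.TransformerEncoderLayer` as a stand-in layer) are hypothetical illustrations, not the actual MOSS-Speech implementation.

```python
import torch
import torch.nn as nn


class ModalitySplitBlock(nn.Module):
    """One block with modality-based layer-splitting: text tokens go through
    the frozen pretrained layer, speech tokens through a new trainable layer."""

    def __init__(self, pretrained_layer: nn.Module, hidden_size: int, num_heads: int = 8):
        super().__init__()
        self.text_layer = pretrained_layer
        for p in self.text_layer.parameters():
            p.requires_grad = False  # frozen pre-training: preserve the text LLM's knowledge
        # Speech-specific layer, freshly initialized and kept trainable.
        self.speech_layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True
        )

    def forward(self, hidden: torch.Tensor, speech_mask: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, hidden_size); speech_mask: (batch, seq) bool, True for speech tokens
        text_out = self.text_layer(hidden)
        speech_out = self.speech_layer(hidden)
        # Each position keeps the output of the layer matching its modality.
        return torch.where(speech_mask.unsqueeze(-1), speech_out, text_out)


# Toy usage: wrap a stand-in "pretrained" text layer and run mixed text/speech tokens.
hidden_size = 64
pretrained = nn.TransformerEncoderLayer(d_model=hidden_size, nhead=8, batch_first=True)
block = ModalitySplitBlock(pretrained, hidden_size)
hidden = torch.randn(2, 10, hidden_size)
speech_mask = torch.zeros(2, 10, dtype=torch.bool)
speech_mask[:, 5:] = True  # last five positions are speech tokens
print(block(hidden, speech_mask).shape)  # torch.Size([2, 10, 64])
```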
Check out our video demo and live demo.
Technical report is available at arXiv:2510.00499.
- True Speech-to-Speech Modeling: No text guidance required.
- Layer-Splitting Architecture: Integrates modality-specific layers on top of pretrained text LLM backbones.
- Frozen Pre-Training Strategy: Preserves the LLM's abilities while extending it to the speech modality.
- State-of-the-Art Performance: Excels in spoken question answering and speech-to-speech tasks.
# Clone the repository
git clone https://github.com/OpenMOSS/MOSS-Speech
cd MOSS-Speech
# Install dependencies
pip install -r requirements.txt
git submodule update --init --recursive
# Launch the web demo
python3 gradio_demo.py
- Open source base model: Release the MOSS-Speech-Base model for community use
- Support streaming output in Gradio: Implement streaming output for lower response latency in the web demo
- The code in this repository is released under the Apache 2.0 license.
- Qwen: We use Qwen3-8B as the base model.
- We thank an anonymous colleague for Character Voice!
If you use this repository or model in your research, please cite:
@misc{zhao2025mossspeechtruespeechtospeechmodels,
      title={MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance},
      author={Xingjian Zhao and Zhe Xu and Luozhijie Jin and Yang Wang and Hanfu Chen and Yaozhou Jiang and Ke Chen and Ruixiao Li and Mingshu Chen and Ruiming Wang and Wenbo Zhang and Yiyang Zhang and Donghua Yu and Yang Gao and Xiaogui Yang and Yitian Gong and Yuanfan Xu and Qinyuan Cheng and Zhaoye Fei and Shimin Li and Yaqian Zhou and Xuanjing Huang and Xipeng Qiu},
      year={2025},
      eprint={2510.00499},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.00499},
}