feat: add step1 audio tts #121

Merged
merged 12 commits into from
Feb 21, 2025

@weedge (Collaborator) commented Feb 20, 2025

feat:

  • add step speech LM and streamer
  • add default step TTS speakers asset
  • add TTS modes: lm_gen (LM generation) and voice_clone (voice cloning without LM generation; decodes wav code directly).
    supported TTS modes:
    • lm_gen: the LM generates audio wav code from text plus a reference audio waveform, and the waveform is generated with static batch streaming:
      text + ref audio waveform -> tokenizer -> text + audio token ids -> step1 lm -> audio token ids (wav_code) -> flow (CFM) -> mel -> vocoder (HiFT) -> waveform
    • voice_clone: voice cloning without LM generation; decodes wav code directly:
      src + ref audio waveform -> speech tokenizer -> audio token ids (wav_code) -> flow (CFM) -> mel -> vocoder (HiFT) -> cloned ref audio waveform
  • add step tts test
python -m unittest test.modules.speech.tts.test_step.TestStepTTS.test_get_voices
REF_AUDIO_PATH=./test/audio_files/asr_example_zh.wav \
    REF_TEXT="欢迎大家来体验达摩院推出的语音识别模型" \
    python -m unittest test.modules.speech.tts.test_step.TestStepTTS.test_set_voice

python -m unittest test.modules.speech.tts.test_step.TestStepTTS.test_synthesize
python -m unittest test.modules.speech.tts.test_step.TestStepTTS.test_synthesize_speak

# ref audio 
TTS_STREAM_FACTOR=4 \
    REF_AUDIO_PATH=./test/audio_files/asr_example_zh.wav \
    REF_TEXT="欢迎大家来体验达摩院推出的语音识别模型" \
    TTS_TEXT="万物之始,大道至简,衍化至繁。君不见黄河之水天上来,奔流到海不复回。君不见高堂明镜悲白发,朝如青丝暮成雪。人生得意须尽欢,莫使金樽空对月。天生我材必有用,千金散尽还复来。" \
    python -m unittest test.modules.speech.tts.test_step.TestStepTTS.test_synthesize

TTS_STREAM_FACTOR=4 \
    REF_AUDIO_PATH=./test/audio_files/asr_example_zh.wav \
    REF_TEXT="欢迎大家来体验达摩院推出的语音识别模型" \
    TTS_TEXT="万物之始,大道至简,衍化至繁。君不见黄河之水天上来,奔流到海不复回。君不见高堂明镜悲白发,朝如青丝暮成雪。人生得意须尽欢,莫使金樽空对月。天生我材必有用,千金散尽还复来。" \
    python -m unittest test.modules.speech.tts.test_step.TestStepTTS.test_synthesize_speak

# ---- TTS_MODE: voice_clone ----
# src audio + default ref audio
SRC_AUDIO_PATH=./test/audio_files/asr_example_zh.wav \
    python -m unittest test.modules.speech.tts.test_step.TestStepTTS.test_synthesize
  • add step tts grpc client
# grpc serve
python -m src.cmd.grpc.speaker.server.serve

# tts lm gen
TTS_TAG=tts_step IS_SAVE=1 IS_RELOAD=1 \
    TTS_WARMUP_STEPS=2 TTS_LM_MODEL_PATH=./models/stepfun-ai/Step-Audio-TTS-3B \
    TTS_TOKENIZER_MODEL_PATH=./models/stepfun-ai/Step-Audio-Tokenizer \
    python -m src.cmd.grpc.speaker.client
# tts voice clone
TTS_TAG=tts_step IS_SAVE=1 IS_RELOAD=1 \
    TTS_WARMUP_STEPS=2 TTS_LM_MODEL_PATH=/content/models/stepfun-ai/Step-Audio-TTS-3B \
    TTS_TOKENIZER_MODEL_PATH=/content/models/stepfun-ai/Step-Audio-Tokenizer \
    TTS_STREAM_FACTOR=2 \
    TTS_MODE=voice_clone \
    SRC_AUDIO_PATH=./test/audio_files/asr_example_zh.wav \
    python -m src.cmd.grpc.speaker.client
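
The commands above configure the client entirely through environment variables. As a rough illustration of how such settings could be collected and validated, here is a minimal sketch. The `StepTTSEnv` dataclass, the `load_step_tts_env` helper, and every default value are hypothetical and are not taken from this PR; only the variable names (`TTS_TAG`, `TTS_MODE`, `TTS_STREAM_FACTOR`, `TTS_WARMUP_STEPS`, `TTS_LM_MODEL_PATH`, `TTS_TOKENIZER_MODEL_PATH`, `SRC_AUDIO_PATH`) come from the commands. The rule that voice_clone requires a source audio path is an assumption based on the pipeline description.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# The two modes named in the PR description.
VALID_MODES = ("lm_gen", "voice_clone")


@dataclass
class StepTTSEnv:
    """Hypothetical container for the environment settings used above."""
    tag: str
    mode: str
    stream_factor: int
    warmup_steps: int
    lm_model_path: str
    tokenizer_model_path: str
    src_audio_path: Optional[str]


def load_step_tts_env(env: Mapping[str, str]) -> StepTTSEnv:
    """Read settings from a mapping such as os.environ.

    All defaults here are illustrative assumptions, not the project's.
    """
    mode = env.get("TTS_MODE", "lm_gen")
    if mode not in VALID_MODES:
        raise ValueError(f"TTS_MODE must be one of {VALID_MODES}, got {mode!r}")

    src_audio_path = env.get("SRC_AUDIO_PATH") or None
    # Assumption: voice_clone decodes tokens from a source recording,
    # so a source audio path must be provided in that mode.
    if mode == "voice_clone" and src_audio_path is None:
        raise ValueError("voice_clone mode needs SRC_AUDIO_PATH")

    return StepTTSEnv(
        tag=env.get("TTS_TAG", "tts_step"),
        mode=mode,
        stream_factor=int(env.get("TTS_STREAM_FACTOR", "2")),
        warmup_steps=int(env.get("TTS_WARMUP_STEPS", "2")),
        lm_model_path=env.get(
            "TTS_LM_MODEL_PATH", "./models/stepfun-ai/Step-Audio-TTS-3B"
        ),
        tokenizer_model_path=env.get(
            "TTS_TOKENIZER_MODEL_PATH", "./models/stepfun-ai/Step-Audio-Tokenizer"
        ),
        src_audio_path=src_audio_path,
    )
```

Calling `load_step_tts_env(os.environ)` at startup would fail fast on an unknown mode or a missing source file path instead of failing later during synthesis.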

Colab notebook:


step-audio TTS from step-audio (Speech Decoder)

step1 LM 3B + flow (code from CosyVoice) + HiFT (code from CosyVoice)
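
To make the composition of these three stages concrete, the sketch below wires the two TTS modes from the PR description as plain function pipelines. It is not the code in this PR: the function names, the keyword-only stage arguments, the prompt ordering (text ids before audio ids), and the choice to condition the flow stage on the reference waveform are all assumptions made for illustration. Each stage is passed in as a callable so the data flow can be read without loading any model.

```python
from typing import Callable, List

Waveform = List[float]
Tokens = List[int]
Mel = List[List[float]]


def lm_gen(
    text: str,
    ref_wav: Waveform,
    *,
    tokenize_text: Callable[[str], Tokens],
    tokenize_audio: Callable[[Waveform], Tokens],
    lm: Callable[[Tokens], Tokens],
    flow: Callable[[Tokens, Waveform], Mel],
    vocoder: Callable[[Mel], Waveform],
) -> Waveform:
    """lm_gen mode: text + ref audio -> token ids -> LM -> wav_code -> mel -> waveform."""
    # Assumption: text ids come first, followed by the reference audio ids.
    prompt_ids = tokenize_text(text) + tokenize_audio(ref_wav)
    wav_code = lm(prompt_ids)        # audio token ids produced by the LM
    mel = flow(wav_code, ref_wav)    # flow (CFM) stage, conditioned on the reference
    return vocoder(mel)              # vocoder (HiFT) stage


def voice_clone(
    src_wav: Waveform,
    ref_wav: Waveform,
    *,
    tokenize_audio: Callable[[Waveform], Tokens],
    flow: Callable[[Tokens, Waveform], Mel],
    vocoder: Callable[[Mel], Waveform],
) -> Waveform:
    """voice_clone mode: no LM step; the source audio's tokens are decoded directly."""
    wav_code = tokenize_audio(src_wav)
    mel = flow(wav_code, ref_wav)
    return vocoder(mel)
```

The only structural difference between the two modes is where `wav_code` comes from: the LM in lm_gen, the speech tokenizer applied to the source audio in voice_clone. The flow and vocoder stages are shared.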

speech tokenizer

A dual-codebook speech tokenizer framework, similar to ARCON (from the StepFun team);

the linguistic tokenizer uses the FunASR Paraformer (NAR, non-autoregressive) model;

the semantic tokenizer uses the CosyVoice speech tokenizer (from SenseVoice).
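
One way a dual-codebook tokenizer can feed a single LM is to merge the two token streams into one sequence. The sketch below shows a fixed-ratio interleave with an id offset so the two codebooks do not collide. The 2:3 grouping and the offset of 1024 are assumptions for illustration (they follow my reading of the Step-Audio technical report, not anything stated in this PR), and `interleave_dual_codebook` is a hypothetical helper, not a function from the repository.

```python
from typing import List, Sequence


def interleave_dual_codebook(
    linguistic: Sequence[int],
    semantic: Sequence[int],
    ling_group: int = 2,     # assumed: 2 linguistic tokens per group
    sem_group: int = 3,      # assumed: 3 semantic tokens per group
    sem_offset: int = 1024,  # assumed: shift semantic ids past the linguistic codebook
) -> List[int]:
    """Merge two token streams as [ling x2, sem x3, ling x2, sem x3, ...].

    Trailing tokens that do not fill a complete group pair are dropped.
    """
    merged: List[int] = []
    i = j = 0
    while i + ling_group <= len(linguistic) and j + sem_group <= len(semantic):
        merged.extend(linguistic[i:i + ling_group])
        merged.extend(t + sem_offset for t in semantic[j:j + sem_group])
        i += ling_group
        j += sem_group
    return merged
```

For example, `interleave_dual_codebook([1, 2, 3, 4], [10, 11, 12, 13, 14, 15])` yields two complete groups, with the semantic ids shifted by the offset.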

The step1 LM 3B is distilled from step-audio 130B.

flow (CFM)

see:

HiFT vocoder

see:

Signed-off-by: weedge <weege007@gmail.com>