Trained TTS Vitsmodel Restricted to 11 Seconds of Audio Generation #6972

shuvohishab · 2023-07-04T18:49:28Z

By following this tutorial, I trained a TTS VitsModel.
While running inference with the trained model to convert text to speech, I found that it generates at most 11 seconds of audio. For longer texts (e.g., text equivalent to 90 seconds of audio), it produces only the first 11 seconds.
When the text is short, equivalent to 0 to 11 seconds of audio, the system generates audio of the expected length.

Is there a workaround available to generate the complete audio for longer texts?

Note:

I'm using the latest codebase to implement the TTS model.
The audio data I am working with varies in length from 0.1 seconds to 63 seconds.

Answered by treacker

Aug 1, 2023

shuvohishab · 2023-07-25T04:18:26Z
Author

@VahidooX could you kindly review my inquiry? Thank you.

titu1994 · 2023-07-25T05:27:59Z
Maintainer

@XuesongYang

XuesongYang · 2023-07-25T17:49:08Z
Collaborator

Thanks. @treacker, could you please have a look?

XuesongYang · 2023-07-25T17:51:01Z
Collaborator

Found the same issue reported as well: #6998

treacker · 2023-08-01T19:53:37Z

Sorry for the late response. The problem is the max_len parameter, which defaults to 1000 and can't be changed through the convert_text_to_waveform function. The workaround is either to call the forward function directly (it is what convert_text_to_waveform calls internally) or to add an option to change max_len. @XuesongYang, please choose which solution you prefer.
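To make the link between max_len and the 11-second limit concrete, here is a rough sketch. It assumes max_len counts decoder frames, and that the model uses a hop length of 256 samples at a 22050 Hz sample rate (common VITS settings, not confirmed in this thread; check your own config). The helper names are hypothetical, and synthesize_with_cap assumes that calling the model runs forward with a max_len keyword and returns the waveform as the first item, which should be verified against the installed NeMo version.

```python
import math


def max_audio_seconds(max_len, hop_length=256, sample_rate=22050):
    """Longest audio (in seconds) that max_len frames can produce.

    hop_length and sample_rate are assumed defaults, not values from the thread.
    """
    return max_len * hop_length / sample_rate


def frames_needed(seconds, hop_length=256, sample_rate=22050):
    """Smallest max_len that covers the requested duration."""
    return math.ceil(seconds * sample_rate / hop_length)


def synthesize_with_cap(model, tokens, max_len):
    """Call the model's forward pass directly with a larger frame cap.

    Assumption: model(tokens=..., max_len=...) dispatches to forward and
    returns a tuple whose first element is the predicted waveform.
    """
    outputs = model(tokens=tokens, max_len=max_len)
    return outputs[0]


if __name__ == "__main__":
    # With the default cap of 1000 frames the output tops out near 11.6 s.
    print(round(max_audio_seconds(1000), 2))
    # Frame cap required for roughly 90 s of audio under the same assumptions.
    print(frames_needed(90))
```

Under these assumptions the default cap works out to about 11.6 seconds, which is consistent with the truncation reported above. For a real model, the call would typically be wrapped in torch.no_grad() during inference.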

19 replies

PhilipAmadasun Mar 7, 2024

@shuvohishab Doesn't that use the CPU? Also, where is the audio going to be stored? I want a .wav file out of this.

shuvohishab Mar 7, 2024
Author

Change cpu to cuda if you want to run inference on the GPU.
First generate the audio; then you can save it from audio_pred.

PhilipAmadasun Mar 7, 2024

@shuvohishab

audio_pred = model.convert_text_to_waveform(tokens=tokens).cuda().detach().numpy()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

shuvohishab Mar 7, 2024
Author

Kindly follow the instructions provided in the error log.
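The error message asks for the tensor to be copied to host memory before conversion, and the earlier question was how to end up with a .wav file. Below is a minimal sketch of the saving step using only the Python standard library. The function name save_wav is hypothetical, and the 22050 Hz sample rate is an assumption that should be replaced with the rate the model was trained on.

```python
import struct
import wave


def save_wav(samples, path, sample_rate=22050):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Returns the number of samples written. The default sample_rate is an
    assumption; use the value from your own model config.
    """
    # Clip each sample to the valid range, then scale to signed 16-bit.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        # WAV stores PCM data little-endian, hence the "<" prefix.
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(pcm)


if __name__ == "__main__":
    # Tiny synthetic example: four samples, one of them out of range.
    save_wav([0.0, 0.5, -0.5, 2.0], "example.wav")
```

With a tensor on the GPU, something like audio_pred.detach().cpu().flatten().tolist() should give a plain list of floats that can be passed as samples; the .cpu() call is the step the traceback above is asking for.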

PhilipAmadasun Mar 7, 2024

@shuvohishab Never mind, forget my last message. I do have another question about the NeMo installation. After changing the max lengths, do I need to reinstall NeMo all over again? I installed it with pip the previous time.
