-
In the new update there is an 'experiment' button in the auto train settings. I'm still working on this part, so please check the settings before starting training to make sure everything is right for the dataset you use. If you encounter memory issues, try using your own settings.
About multi-GPU: check that everything works. You can also control how often checkpoints and the last checkpoint are saved; note that every checkpoint needs about 5 GB of disk space! About the epochs, the default is now 10 epochs. You may need more or less, so see what works for you. I'll run more tests and post some good values soon.
-
What does the Check Vocab step do, practically? I'm sorry, but I'm not able to get it; what happens in this part if I want to fine-tune for French, Spanish, etc.?
-
What do you think, would it be a good idea to add a new tab with system and GPU info?
-
Thanks for this great work. I've managed to fine-tune my first model, but a noob question: how do I test the model, whether in the CLI or the GUI?
-
Awesome work! Are you planning to improve it and fix errors?
-
@lpscr |
-
You should make it inform the user when ffmpeg is not available instead of silently failing on the transcribe step; this was giving me problems on a RunPod instance.
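For illustration, a pre-flight check of the kind suggested here could run before transcription starts; this is only a sketch, and `check_ffmpeg` is a hypothetical helper name, not something that exists in the repo.

```python
# Hypothetical pre-flight check of the kind suggested above; not code from the repo.
import shutil

def check_ffmpeg() -> None:
    """Raise a clear error if ffmpeg is not on PATH instead of failing silently later."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError(
            "ffmpeg was not found on PATH. Install it (e.g. 'apt-get install ffmpeg' "
            "on a RunPod/Debian image) before running the Transcribe step."
        )

check_ffmpeg()  # call once before transcription starts
```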
-
Can someone explain to me how to train with languages other than English?
-
How much data do you usually use (how many hours or minutes) to fine-tune and produce a great result? I tried it in Colab using a T4 and needed to lower the batch size per GPU too, I guess. Anyway, thanks for this easy-to-use pathway for fine-tuning F5.
-
Hello @lpscr, thank you for sharing this project! I have been following your instructions: after placing WAV files in the newly created project folder and checking the user option, I clicked the Transcribe button and waited for the process to complete. However, the info field displays "You need to load an audio file." Could you please help me resolve this issue? Thank you!
-
Hello @lpscr, running the script prints:
usage: finetune_cli.py [-h] [--exp_name {F5TTS_Base,E2TTS_Base}] [--dataset_name DATASET_NAME]
It appears that the update for the parameter --file_checkpoint_train has not yet been committed and merged into finetune_cli.py, although it has been merged in finetune_gradio. By the way, is this parameter intended to allow resuming fine-tuning from a previous checkpoint? Could you please provide an example of the exact path to the checkpoint? I assume it would be something like ckpts/project_name/checkpoint.pt, correct?
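If the flag is available in your checkout, resuming would presumably look something like the sketch below; the flag combination and the checkpoint path layout are assumptions pieced together from this thread, so check `finetune_cli.py -h` on your version first.

```python
# Hedged sketch only: flag names come from the usage string quoted above and from this
# thread; the exact CLI of your checked-out version may differ.
import subprocess

cmd = [
    "python", "finetune_cli.py",
    "--exp_name", "F5TTS_Base",
    "--dataset_name", "my_speak",                               # project name used elsewhere in this thread
    "--file_checkpoint_train", "ckpts/my_speak/model_last.pt",  # assumed checkpoint path layout
]
subprocess.run(cmd, check=True)
```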
-
To create a public link, set |
-
@osmania101 Can you check whether you have the latest version of the repo? Did you follow the correct installation steps at the beginning to install it?
-
How can I prepare a multi-speaker dataset? If I have a multi-speaker dataset, should I skip the speaker ID and only keep the text and audio pairs?
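For illustration, if your source listing were in a speaker|audio|text form (an assumed format), dropping the speaker column to get the audio|text pairs described elsewhere in this thread could look like the sketch below; all file names are placeholders.

```python
# Hypothetical example: flatten a multi-speaker listing (speaker|audio|text, a format
# assumed here for illustration) into the audio|text metadata.csv format used by the app.
from pathlib import Path

src = Path("all_speakers.csv")            # placeholder input: speaker|audio|text per line
dst = Path("data/my_speak/metadata.csv")
dst.parent.mkdir(parents=True, exist_ok=True)

with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
    for line in fin:
        speaker, audio, text = line.rstrip("\n").split("|", 2)
        # Drop the speaker column: the trainer only needs audio|text pairs.
        fout.write(f"{audio}|{text}\n")
```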
-
Hi @lpscr, it's a fantastic job! Thanks a lot.
-
Hey @lpscr, the training works great, but I have quite a long training process. How can I resume it?
-
Hi, when I tried to fine-tune I got the following error.
When I tried removing the arg 'e', the training got stuck for a long time with no logs.
-
I encountered a problem: after successfully training the model and pressing "Test Model", the output sound was empty, completely silent. I tried again many times. Please help me.
-
Hi @lpscr, I've created a script and successfully built a metadata.csv and a folder with my wav files. I used the 9-hour split, so I'm actually working with just 9 hours of multi-speaker Italian; is that a fair amount of hours for fine-tuning? I tried 10 epochs but the result isn't good. I don't know how EMA works here, but wouldn't it make sense to add an option in the "Train Data" page to specify how to generate the sample? That way I could get a better idea of the overall result by comparing the samples without using EMA (I think it uses EMA by default?). Feel free to give me any tips to improve results, and thanks for the UI :) P.S.
-
So I need 10 hours of speech for a new language?
-
Sorry, I don't remember. I only tested the program, but I tested several others at the same time.
(Replying to: "Did you figure this out? Running into the same problem")
-
Hi! When I want to test the model that I already retrained and fine-tuned with new audio samples, it gives me this error. Can you help me, please?
-
First of all, thank you for your amazing work. I've been trying to train a Korean TTS model using F5-TTS (multiple times). Initially, I attempted to train it with a multi-speaker dataset. While there was almost no quality difference between the reference and generated audio during training, the actual inference only managed to copy the voice timbre and failed to read Korean text correctly.
Later, I came across your advice in this discussion and tried training again with a single-speaker dataset of about 12 hours. The training settings were as follows:
Batch size per GPU: 1600
For reference, my GPU is an NVIDIA GeForce RTX 3080 with 10 GB of VRAM (yeah, it was hard work ^_T).
Below is the training log. Although the training didn't complete all the set epochs because my computer was overheating, it reached approximately 300K steps. The inference results from the trained model are as follows:
In conclusion, I have a few questions for you:
Thank you!
-
I've never figured out how to train it. I've tried multiple times, following the exact order shown in the video and other videos, and it crashes every time. So I can't train a new language, or any language at all.
-
Thanks for the good work. Everything works almost fine here; I'm training Polish models with success. However, I have a problem with worker processes (maybe a stupid one): I cannot reduce them from the default 16 to 12 (for an i7-8700) or 8 (for an i7-9700); something is overriding the num_workers setting in train.py. This causes problems with bigger datasets. How can I reduce the worker number?
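For context, num_workers here is the standard PyTorch DataLoader knob; the snippet below is generic PyTorch rather than the repo's train.py, just to show the parameter being discussed (the dataset and values are placeholders).

```python
# Generic PyTorch illustration of the num_workers knob discussed above; this is not the
# repo's train.py, just the standard DataLoader parameter that controls worker processes.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1024, 80))   # placeholder dataset
loader = DataLoader(
    dataset,
    batch_size=16,
    num_workers=8,    # e.g. 8 for an i7-9700; if another config value overrides this,
                      # the setting here has no visible effect
    pin_memory=True,
)
```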
-
Thanks for this amazing work!!! I followed the same steps as the video, but when I started training I got the following messages. Is there anything that I did wrong?
Messages:
RuntimeError: Error(s) in loading state_dict for CFM:
-
Please help, I keep getting this error. I used a couple of audio samples, used the "Transcribe Data" tab, then the "Prepare Data" tab. When I get to the Train tab I get this error.
run command :
The following values were not passed to
Word segmentation module jieba initialized.
Loading model cost 0.520 seconds.
vocab : 2545 vocoder : vocos
vocab : 2545 vocoder : vocos
-
Has anyone been able to train this on a multiple-4090 GPU setup (2 or more)? I am getting this: #728 (comment)
-
Hi, when I set the path in Tokenizer File and Path to the Pretrained Checkpoint, I get this error:
Loading model cost 0.440 seconds.
vocab : 2580 vocoder : vocos
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
Traceback (most recent call last):
-
Hello everyone!
I have created a Gradio application for easy fine-tuning and training of models. You can find it here:
https://github.com/lpscr/F5-TTS
EDIT: this has been merged into the main repo.
NEW: with the new version everything is now automatic, so you can easily fine-tune any language with just a few clicks.
Here's a new, complete step-by-step video, with sound! Please make sure the video is not muted; click the speaker icon in the video. Enjoy ;)
amazing.tutorial.mp4
BTW: the male voice in the video was created using f5tts! You can get it from here: https://github.com/SWivid/F5-TTS/tree/main/src/f5_tts/infer/examples/basic
Note for a new language
Before you start: if you are going to fine-tune a new language, you will need a substantial amount of dataset hours! From what I have seen here, you can fine-tune a single voice with just 10 to 15 hours, but for multiple speakers you'll need more, about 50 hours to start. If you want a good model, aim for at least 100 hours; for something close to perfect, aim for at least 300 hours or more. See what works for you; it might also be possible to achieve good results with fewer hours in your case.
Users have also reported success with just 10-15 hours of fine-tuning for one or two voices in the following languages: Spanish, Indian (Malayalam) with extended tokens, Hungarian.
Note for English or Chinese
Regarding English or Chinese: if you want to fine-tune a speaker, first check whether it already works, because the base model is good enough and you may not need to fine-tune that speaker at all. You can test with 2 to 5 hours or more and see what works.
Please share any experiments or results about what works and what doesn't, so that others can know as well.
Quick start
First create a new project, then see which steps you need.
1. Transcribe Data (optional): Skip this if you already have a `metadata.csv` and a `wavs` folder. You can simply click the audio button to open Explorer and select one or multiple audio files. If you check `audio from path`, you need to place all audio files in `data/my_speak/dataset`. You can click the random sample button to see the text and audio.
2. Vocab Check (optional): Use this only when you want to train a new language. If you need to extend the vocab, you can simply click "Check Vocab" to see all missing symbols, or write your own symbols like `a,b,c,d` etc. If you click "Extend", this creates a new `model_1200000.pt` and `vocab.txt` file.
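Conceptually, the vocab check compares the characters that appear in your transcripts against the pretrained `vocab.txt`; a rough standalone sketch of that idea is below (this is not the app's actual code, and the paths are placeholders).

```python
# Rough standalone illustration of what a vocab check does: find transcript characters
# that are missing from the pretrained vocab.txt. Not the app's actual code; paths are
# placeholders.
from pathlib import Path

vocab = set(Path("data/my_speak/vocab.txt").read_text(encoding="utf-8").splitlines())

missing = set()
for line in Path("data/my_speak/metadata.csv").read_text(encoding="utf-8").splitlines():
    if "|" not in line:
        continue
    _, text = line.split("|", 1)               # metadata format: audio|text
    missing |= {ch for ch in text if ch not in vocab}

print("missing symbols:", "".join(sorted(missing)) or "(none)")
```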
3. Prepare Data (optional): Skip this if you already have `raw.arrow`, `duration.json`, and `vocab.txt`. You can click the random sample button to see the tokens and audio.
If you have the files `raw.arrow`, `duration.json`, and `vocab.txt`, make sure they are in the correct path. In case you skip the Transcribe step, place your dataset (`wavs` folder and `metadata.csv` file) there yourself.
Supported audio formats: "wav", "mp3", "aac", "flac", "m4a", "alac", "ogg", "aiff", "wma", "amr".
The format of `metadata.csv` looks like this:
line 1. `audio1|text1` or `audio1.wav|text1` or `your_path/audio1.wav|text1`
line 2. `audio2|text2` or `audio2.mp3|text2` or `your_path/audio2.mp3|text2`
...
Click "Prepare" to create `raw.arrow`, `duration.json`, and `vocab.txt`.
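If you generate `metadata.csv` with your own script, a minimal sketch that writes the `audio|text` lines in the format above and warns about missing audio files could look like this (the `transcripts` dict is a placeholder for wherever your text actually comes from).

```python
# Minimal sketch: write metadata.csv in the audio|text format described above and check
# that each referenced audio file exists. The `transcripts` dict is a placeholder.
from pathlib import Path

project = Path("data/my_speak")
wavs = project / "wavs"

transcripts = {                          # placeholder: file name -> transcript text
    "audio1.wav": "text of the first clip",
    "audio2.wav": "text of the second clip",
}

with (project / "metadata.csv").open("w", encoding="utf-8") as f:
    for name, text in transcripts.items():
        if not (wavs / name).exists():
            print(f"warning: missing audio file {wavs / name}")
        f.write(f"{name}|{text.strip()}\n")
```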
4. Train Data:
The auto settings button gives you good defaults, but you need to check that everything is OK for your dataset. If you encounter memory issues, try using your own settings and lower `batch size per gpu`.
You can also control how often checkpoints and the last checkpoint are saved: set `save per update` to something that works for you, and use a smaller value for `last per step` to save `model_last.pt` more often; that way, if training crashes or you stop it, you can easily continue where you left off. Note: every checkpoint needs about 5 GB of disk space!
About `epochs`: the default is now 10 epochs. You may need more or less; see what works for you.
While the model trains, you get a sample audio every few steps so you can hear how well the model is doing. Click the refresh button or check the `ckpts/my_speak/sample` folder.
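As a rough disk budget, since each checkpoint is about 5 GB you can estimate the space you need from how often you save; a back-of-the-envelope sketch with example numbers:

```python
# Back-of-the-envelope disk estimate for checkpoints; the 5 GB figure comes from the note
# above, the other numbers are only example values.
GB_PER_CHECKPOINT = 5
total_updates = 100_000       # example: total training updates you plan to run
save_per_updates = 10_000     # example: how often a numbered checkpoint is written

num_checkpoints = total_updates // save_per_updates
disk_gb = num_checkpoints * GB_PER_CHECKPOINT + GB_PER_CHECKPOINT  # + model_last.pt
print(f"~{num_checkpoints} checkpoints -> roughly {disk_gb} GB of disk space")
```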
5. Test Model: Testing your model is simple and easy. Set `use_ema` to True or False to see what works best for you.
While training is running, the test model works in `cpu` mode; you need to stop the training to run it on the `gpu`.
Click the 'Random Sample' button to get a text and audio sample from the dataset. You can compare the reference (ref) and generated (gen) audio, enter text in 'gen_text', or load a new reference in 'ref_text'. To load your own audio reference, click the 'X' button; if the ref text is empty, it will be transcribed automatically.
6. Reduce Model Size: You can reduce the model size from 5 GB to 1 GB. Just select the checkpoint you want to reduce (you'll find them in the project's `ckpts` folder).
You'll see everything is automatic now ;)
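For intuition about why the reduction is possible: a training checkpoint typically stores optimizer state (and sometimes both EMA and raw weights) on top of the inference weights, and stripping that extra state is the general idea. The sketch below uses assumed key names, so for real checkpoints prefer the built-in Reduce Model Size button.

```python
# Generic illustration of shrinking a training checkpoint by keeping only model weights.
# The key names ("ema_model_state_dict", "model_state_dict") are assumptions for
# illustration; use the app's "Reduce Model Size" button for real checkpoints.
import torch

ckpt = torch.load("ckpts/my_speak/model_last.pt", map_location="cpu")

# Keep only the weights needed for inference; drop optimizer/scheduler state and counters.
weights_key = "ema_model_state_dict" if "ema_model_state_dict" in ckpt else "model_state_dict"
reduced = {weights_key: ckpt[weights_key]}

torch.save(reduced, "ckpts/my_speak/model_reduced.pt")
```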