-
In the new update there is an 'experiment' button in the auto train settings. I'm still working on this part, so please check the settings before starting training to make sure everything is right for the dataset you use. If you encounter memory issues, try using your own settings.
About multi-GPU: check that everything works. You can also control how often checkpoints and the last checkpoint are saved; note that every checkpoint needs about 5 GB of disk space! About the epochs, the default is now 10 epochs. You may need more or less, so see what works for you. I'll run more tests and post some good values soon.
-
What does the Check Vocab step do, practically? I'm sorry, but I'm not able to get it; what happens in this part if I want to fine-tune for French, Spanish, etc.?
-
What do you think, would it be a good idea to add a new tab with system and GPU info?
-
Thanks for this great work. I've managed to fine-tune my first model, but a noob question: how do I test the model, whether in the CLI or the GUI?
-
Awesome work! Are you planning to improve it and fix errors?
-
@lpscr |
-
You should make it inform the user when ffmpeg is not available instead of silently failing on the transcribe step; this was giving me problems on a RunPod instance.
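For illustration, a pre-flight check of the kind suggested here could run before transcription starts; this is only a sketch, and `check_ffmpeg` is a hypothetical helper name, not something that exists in the repo.

```python
# Hypothetical pre-flight check of the kind suggested above; not code from the repo.
import shutil

def check_ffmpeg() -> None:
    """Raise a clear error if ffmpeg is not on PATH instead of failing silently later."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError(
            "ffmpeg was not found on PATH. Install it (e.g. 'apt-get install ffmpeg' "
            "on a RunPod/Debian image) before running the Transcribe step."
        )

check_ffmpeg()  # call once before transcription starts
```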
-
Can someone explain to me how to train with languages other than English?
-
How much data do you usually use (how many hours or minutes) to fine-tune and produce a great result? I tried it in Colab using a T4 and needed to lower the batch size per GPU too, I guess. Anyway, thanks for this easy-to-use pathway for fine-tuning F5.
-
Hello @lpscr, thank you for sharing this project! I have been following your instructions: after placing WAV files in the newly created project folder and checking the user option, I clicked the Transcribe button and waited for the process to complete. However, the info field displays "You need to load an audio file." Could you please help me resolve this issue? Thank you!
-
Hello @lpscr, running the script prints:
usage: finetune_cli.py [-h] [--exp_name {F5TTS_Base,E2TTS_Base}] [--dataset_name DATASET_NAME]
It appears that the update for the parameter --file_checkpoint_train has not yet been committed and merged into finetune_cli.py, although it has been merged in finetune_gradio. By the way, is this parameter intended to allow resuming fine-tuning from a previous checkpoint? Could you please provide an example of the exact path to the checkpoint? I assume it would be something like ckpts/project_name/checkpoint.pt, correct?
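If the flag is available in your checkout, resuming would presumably look something like the sketch below; the flag combination and the checkpoint path layout are assumptions pieced together from this thread, so check `finetune_cli.py -h` on your version first.

```python
# Hedged sketch only: flag names come from the usage string quoted above and from this
# thread; the exact CLI of your checked-out version may differ.
import subprocess

cmd = [
    "python", "finetune_cli.py",
    "--exp_name", "F5TTS_Base",
    "--dataset_name", "my_speak",                               # project name used elsewhere in this thread
    "--file_checkpoint_train", "ckpts/my_speak/model_last.pt",  # assumed checkpoint path layout
]
subprocess.run(cmd, check=True)
```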
-
To create a public link, set |
-
@osmania101 Can you check whether you have the latest version of the repo? Did you follow the correct installation steps at the beginning to install it?
-
How can I prepare a multi-speaker dataset? If I have a multi-speaker dataset, should I skip the speaker ID and only keep the text and audio pairs?
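For illustration, if your source listing were in a speaker|audio|text form (an assumed format), dropping the speaker column to get the audio|text pairs described elsewhere in this thread could look like the sketch below; all file names are placeholders.

```python
# Hypothetical example: flatten a multi-speaker listing (speaker|audio|text, a format
# assumed here for illustration) into the audio|text metadata.csv format used by the app.
from pathlib import Path

src = Path("all_speakers.csv")            # placeholder input: speaker|audio|text per line
dst = Path("data/my_speak/metadata.csv")
dst.parent.mkdir(parents=True, exist_ok=True)

with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
    for line in fin:
        speaker, audio, text = line.rstrip("\n").split("|", 2)
        # Drop the speaker column: the trainer only needs audio|text pairs.
        fout.write(f"{audio}|{text}\n")
```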
-
Hi @lpscr, it's a fantastic job! Thanks a lot.
-
Hey @lpscr, the training works great, but I have quite a long training process. How can I resume it?
-
Hi, when I tried to fine-tune I got the following error.
When I tried removing the arg 'e', the training got stuck for a long time with no logs.
-
I encountered a problem: after successfully training the model and pressing "Test Model", the output sound was empty, completely silent. I tried again many times. Please help me.
-
Hi @lpscr, I've created a script and successfully built a metadata.csv and a folder with my wav files. I used the 9-hour split, so I'm actually working with just 9 hours of multi-speaker Italian; is that a fair amount of hours for fine-tuning? I tried 10 epochs but the result isn't good. I don't know how EMA works here, but wouldn't it make sense to add an option in the "Train Data" page to specify how to generate the sample? That way I could get a better idea of the overall result by comparing the samples without using EMA (I think it uses EMA by default?). Feel free to give me any tips to improve results, and thanks for the UI :) P.S.
-
So I need 10 hours of speech for a new language?
-
Sorry, I don't remember. I only tested the program, but I tested several others at the same time.
(Replying to: "Did you figure this out? Running into the same problem")
-
Hi! When I want to test the model that I already retrained and fine-tuned with new audio samples, it gives me this error. Can you help me, please?
-
First of all, thank you for your amazing work. I've been trying to train a Korean TTS model using F5-TTS (multiple times). Initially, I attempted to train it with a multi-speaker dataset. While there was almost no quality difference between the reference and generated audio during training, the actual inference only managed to copy the voice timbre and failed to read Korean text correctly.
Later, I came across your advice in this discussion and tried training again with a single-speaker dataset of about 12 hours. The training settings were as follows:
Batch size per GPU: 1600
For reference, my GPU is an NVIDIA GeForce RTX 3080 with 10 GB of VRAM (yeah, it was hard work ^_T).
Below is the training log. Although the training didn't complete all the set epochs because my computer was overheating, it reached approximately 300K steps. The inference results from the trained model are as follows:
In conclusion, I have a few questions for you:
Thank you!
-
I've never figured out how to train it. I've tried multiple times, following the exact order shown in the video and other videos, and it crashes every time. So I can't train a new language, or any language at all.
-
Thanks for the good work. Everything works almost fine here; I'm training Polish models with success. However, I have a problem with worker processes (maybe a stupid one): I cannot reduce them from the default 16 to 12 (for an i7-8700) or 8 (for an i7-9700); something is overriding the num_workers setting in train.py. This causes problems with bigger datasets. How can I reduce the worker number?
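For context, num_workers here is the standard PyTorch DataLoader knob; the snippet below is generic PyTorch rather than the repo's train.py, just to show the parameter being discussed (the dataset and values are placeholders).

```python
# Generic PyTorch illustration of the num_workers knob discussed above; this is not the
# repo's train.py, just the standard DataLoader parameter that controls worker processes.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1024, 80))   # placeholder dataset
loader = DataLoader(
    dataset,
    batch_size=16,
    num_workers=8,    # e.g. 8 for an i7-9700; if another config value overrides this,
                      # the setting here has no visible effect
    pin_memory=True,
)
```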
-
Thanks for this amazing work!!! I followed the same steps as the video, but when I started training I got the following messages. Is there anything that I did wrong?
Messages:
RuntimeError: Error(s) in loading state_dict for CFM:
-
Please help, I keep getting this error. I used a couple of audio samples, used the "Transcribe Data" tab, then the "Prepare Data" tab. When I get to the Train tab I get this error.
run command :
The following values were not passed to
Word segmentation module jieba initialized.
Loading model cost 0.520 seconds.
vocab : 2545 vocoder : vocos
vocab : 2545 vocoder : vocos
-
Has anyone been able to train this on a multiple-4090 GPU setup (2 or more)? I am getting this: #728 (comment)
-
Hi, when I set the path in Tokenizer File and Path to the Pretrained Checkpoint, I get this error:
Loading model cost 0.440 seconds.
vocab : 2580 vocoder : vocos
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
Traceback (most recent call last):
-
Hello everyone!
I have created a Gradio application for easy fine-tuning and training of models. You can find it here:
https://github.com/lpscr/F5-TTS
EDIT: this has been merged into the main repo.
NEW: with the new version everything is now automatic, so you can easily fine-tune any language with just a few clicks.
Here's a new, complete step-by-step video, with sound! Please make sure the video is not muted; click the speaker icon in the video. Enjoy ;)
amazing.tutorial.mp4
BTW: the male voice in the video was created using f5tts! You can get it from here: https://github.com/SWivid/F5-TTS/tree/main/src/f5_tts/infer/examples/basic
Note for a new language
Before you start: if you are going to fine-tune a new language, you will need a substantial amount of dataset hours! From what I have seen here, you can fine-tune a single voice with just 10 to 15 hours, but for multiple speakers you'll need more, about 50 hours to start. If you want a good model, aim for at least 100 hours; for something close to perfect, aim for at least 300 hours or more. See what works for you; it might also be possible to achieve good results with fewer hours in your case.
Users have also reported success with just 10-15 hours of fine-tuning for one or two voices in the following languages: Spanish, Indian (Malayalam) with extended tokens, Hungarian.
Note for English or Chinese
Regarding English or Chinese: if you want to fine-tune a speaker, first check whether it already works, because the base model is good enough and you may not need to fine-tune that speaker at all. You can test with 2 to 5 hours or more and see what works.
Please share any experiments or results about what works and what doesn't, so that others can know as well.
Quick start
First create a new project, then see which steps you need.
1. Transcribe Data (optional): Skip this if you already have a `metadata.csv` and a `wavs` folder. You can simply click the audio button to open Explorer and select one or multiple audio files. If you check `audio from path`, you need to place all audio files in `data/my_speak/dataset`. You can click the random sample button to see the text and audio.
2. Vocab Check (optional): Use this only when you want to train a new language. If you need to extend the vocab, you can simply click "Check Vocab" to see all missing symbols, or write your own symbols like `a,b,c,d` etc. If you click "Extend", this creates a new `model_1200000.pt` and `vocab.txt` file.
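Conceptually, the vocab check compares the characters that appear in your transcripts against the pretrained `vocab.txt`; a rough standalone sketch of that idea is below (this is not the app's actual code, and the paths are placeholders).

```python
# Rough standalone illustration of what a vocab check does: find transcript characters
# that are missing from the pretrained vocab.txt. Not the app's actual code; paths are
# placeholders.
from pathlib import Path

vocab = set(Path("data/my_speak/vocab.txt").read_text(encoding="utf-8").splitlines())

missing = set()
for line in Path("data/my_speak/metadata.csv").read_text(encoding="utf-8").splitlines():
    if "|" not in line:
        continue
    _, text = line.split("|", 1)               # metadata format: audio|text
    missing |= {ch for ch in text if ch not in vocab}

print("missing symbols:", "".join(sorted(missing)) or "(none)")
```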
3. Prepare Data (optional): Skip this if you already have `raw.arrow`, `duration.json`, and `vocab.txt`. You can click the random sample button to see the tokens and audio.
If you have the files `raw.arrow`, `duration.json`, and `vocab.txt`, make sure they are in the correct path. In case you skip the Transcribe step, place your dataset (`wavs` folder and `metadata.csv` file) there yourself.
Supported audio formats: "wav", "mp3", "aac", "flac", "m4a", "alac", "ogg", "aiff", "wma", "amr".
The format of `metadata.csv` looks like this:
line 1. `audio1|text1` or `audio1.wav|text1` or `your_path/audio1.wav|text1`
line 2. `audio2|text2` or `audio2.mp3|text2` or `your_path/audio2.mp3|text2`
...
Click "Prepare" to create `raw.arrow`, `duration.json`, and `vocab.txt`.
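If you generate `metadata.csv` with your own script, a minimal sketch that writes the `audio|text` lines in the format above and warns about missing audio files could look like this (the `transcripts` dict is a placeholder for wherever your text actually comes from).

```python
# Minimal sketch: write metadata.csv in the audio|text format described above and check
# that each referenced audio file exists. The `transcripts` dict is a placeholder.
from pathlib import Path

project = Path("data/my_speak")
wavs = project / "wavs"

transcripts = {                          # placeholder: file name -> transcript text
    "audio1.wav": "text of the first clip",
    "audio2.wav": "text of the second clip",
}

with (project / "metadata.csv").open("w", encoding="utf-8") as f:
    for name, text in transcripts.items():
        if not (wavs / name).exists():
            print(f"warning: missing audio file {wavs / name}")
        f.write(f"{name}|{text.strip()}\n")
```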
4. Train Data:
The auto settings button gives you good defaults, but you need to check that everything is OK for your dataset. If you encounter memory issues, try using your own settings and lower `batch size per gpu`.
You can also control how often checkpoints and the last checkpoint are saved: set `save per update` to something that works for you, and use a smaller value for `last per step` to save `model_last.pt` more often; that way, if training crashes or you stop it, you can easily continue where you left off. Note: every checkpoint needs about 5 GB of disk space!
About `epochs`: the default is now 10 epochs. You may need more or less; see what works for you.
While the model trains, you get a sample audio every few steps so you can hear how well the model is doing. Click the refresh button or check the `ckpts/my_speak/sample` folder.
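As a rough disk budget, since each checkpoint is about 5 GB you can estimate the space you need from how often you save; a back-of-the-envelope sketch with example numbers:

```python
# Back-of-the-envelope disk estimate for checkpoints; the 5 GB figure comes from the note
# above, the other numbers are only example values.
GB_PER_CHECKPOINT = 5
total_updates = 100_000       # example: total training updates you plan to run
save_per_updates = 10_000     # example: how often a numbered checkpoint is written

num_checkpoints = total_updates // save_per_updates
disk_gb = num_checkpoints * GB_PER_CHECKPOINT + GB_PER_CHECKPOINT  # + model_last.pt
print(f"~{num_checkpoints} checkpoints -> roughly {disk_gb} GB of disk space")
```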
5. Test Model: Testing your model is simple and easy. Set `use_ema` to True or False to see what works best for you.
While training is running, the test model works in `cpu` mode; you need to stop the training to run it on the `gpu`.
Click the 'Random Sample' button to get a text and audio sample from the dataset. You can compare the reference (ref) and generated (gen) audio, enter text in 'gen_text', or load a new reference in 'ref_text'. To load your own audio reference, click the 'X' button; if the ref text is empty, it will be transcribed automatically.
6. Reduce Model Size: You can reduce the model size from 5 GB to 1 GB. Just select the checkpoint you want to reduce (you'll find them in the project's `ckpts` folder).
You'll see everything is automatic now ;)
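For intuition about why the reduction is possible: a training checkpoint typically stores optimizer state (and sometimes both EMA and raw weights) on top of the inference weights, and stripping that extra state is the general idea. The sketch below uses assumed key names, so for real checkpoints prefer the built-in Reduce Model Size button.

```python
# Generic illustration of shrinking a training checkpoint by keeping only model weights.
# The key names ("ema_model_state_dict", "model_state_dict") are assumptions for
# illustration; use the app's "Reduce Model Size" button for real checkpoints.
import torch

ckpt = torch.load("ckpts/my_speak/model_last.pt", map_location="cpu")

# Keep only the weights needed for inference; drop optimizer/scheduler state and counters.
weights_key = "ema_model_state_dict" if "ema_model_state_dict" in ckpt else "model_state_dict"
reduced = {weights_key: ckpt[weights_key]}

torch.save(reduced, "ckpts/my_speak/model_reduced.pt")
```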