Really awful training times #45
I saw this in my output, and it was stuck as well... I fixed it by setting the env var below.
...and how can that help with my problem? I don't see that anywhere in my log...
I had the same problem as you. I solved it by updating my graphics card to the latest drivers and making sure to clear my RAM before starting the training process. Hope this helps.
Thanks, that helped!
Long training times? > Update nVidia drivers!
==================================================================
S O L V E D - the execution times are now in the normal range! All it took was updating the nVidia drivers (facepalm).
Thank you @Docmorfine
==================================================================
16GB | 1024 | GPU usage between 70-100% - repeats & epochs on default (10, 16)
image_count: 5
num_repeats: 10
num epochs: 16
num batches per epoch: 50
total optimization steps: 800
[2024-09-15 01:12:55] [INFO] epoch 1/16 ... 11min
[2024-09-15 01:23:18] [INFO] epoch 2/16 ... 10min
[2024-09-15 01:33:07] [INFO] epoch 3/16 ... 10min
[2024-09-15 01:42:55] [INFO] epoch 4/16 ... and so on ...
[2024-09-15 03:50:23] [INFO] steps: 100%|██████████| 800/800 [2:37:27<00:00, 11.81s/it, avr_loss=0.257]
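For context, the step counts and run length in these summaries follow directly from the dataset settings. A minimal sketch of the arithmetic (plain Python using the values logged above; the helper name is ours, not fluxgym's):

```python
# Rough arithmetic behind kohya-style step counts and run length.
# Values are taken from the run summary above; total_steps is an
# illustrative helper, not part of fluxgym or sd-scripts.

def total_steps(image_count: int, num_repeats: int, num_epochs: int, batch_size: int = 1) -> int:
    """Batches per epoch = images * repeats / batch_size; steps = batches * epochs."""
    batches_per_epoch = (image_count * num_repeats) // batch_size
    return batches_per_epoch * num_epochs

steps = total_steps(image_count=5, num_repeats=10, num_epochs=16)  # -> 800

# At the ~11.81 s/it shown in the final progress line, 800 steps comes to:
seconds = steps * 11.81
hours, rem = divmod(int(seconds), 3600)
print(steps, f"{hours}h {rem // 60}min")  # 800 steps, ~2h 37min -- matching the log
```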
==================================================================
---------------------------^^^UPDATE 14.9^^^ Update the nVidia drivers--------------------------------
==================================================================
These times can't be right, can they? Everything takes ages... Is it just my crappy PC, the anaconda venv with Python 3.10, or the fact that I'm using 1024 img size?
my PC - GPU 4060Ti 16GB, 64GB RAM (32 possibly shared) - SET: VRAM 16GB, img size 1024, Win 10, venv: anaconda, Python 3.10
One full log is at the bottom of this post.
Question: Am I doing something wrong, or is this normal given the circumstances (PC, LoRA setup, ...)?
16GB | 1024 | GPU usage between 70-100% - repeats & epochs on default (10, 16)
image_count: 5
num_repeats: 10
num epochs: 16
num batches per epoch: 50
total optimization steps: 800
[2024-09-12 11:13:46] [INFO] epoch 1/16 ... 2h 30min
[2024-09-12 13:43:24] [INFO] epoch 2/16 ... I just ended it... it is just a test LoRA, and waiting 36 hours .....
16GB | 1024 | GPU usage between 70-100%
image_count: 2
num_repeats: 5
num epochs: 4
num batches per epoch: 10
total optimization steps: 40
[2024-09-12 08:49:22] [INFO] epoch 1/4 ... 30min
[2024-09-12 09:19:57] [INFO] epoch 2/4 ... 30min
[2024-09-12 09:50:16] [INFO] epoch 3/4 ... 30min
[2024-09-12 10:20:10] [INFO] epoch 4/4 ... 29min
[2024-09-12 10:49:30] [INFO] Command exited successfully ... 2h
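For scale, the per-step time implied by this 40-step run is enormous compared to the healthy ~11.81 s/it reported after the driver update (simple arithmetic from the log lines above, not fluxgym code):

```python
# Per-step time implied by the 40-step run above.
run_seconds = 2 * 3600      # "Command exited successfully ... 2h"
steps = 40                  # total optimization steps
print(run_seconds / steps)  # 180.0 s/it -- roughly 15x slower than ~11.81 s/it
```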
----------------^^^UPDATE 12.9 - git pull of gym and git pull of sd-scripts^^^----------------------------------
16GB | 1024 | GPU usage between 70-100%
image_count: 5
num_repeats: 5
num epochs: 4
num batches per epoch: 25
total optimization steps: 100
[2024-09-11 09:26:40] [INFO] epoch 1/4 ... 2h 46min
[2024-09-11 12:12:28] [INFO] epoch 2/4 ... 2h 45min
[2024-09-11 14:57:16] [INFO] epoch 3/4 ... 2h 45min
[2024-09-11 17:42:56] [INFO] epoch 4/4 ... took longer because I was doing other stuff on the PC ...
16GB | 1024 | GPU usage between 70-100%
image_count: 1
num_repeats: 5
num epochs: 4
num batches per epoch: 5
total optimization steps: 25
[2024-09-11 06:35:40] [INFO] epoch 1/4 ... 39min
[2024-09-11 07:14:39] [INFO] epoch 2/4 ... 41min
[2024-09-11 07:55:27] [INFO] epoch 3/4 ... 40min
[2024-09-11 08:35:50] [INFO] epoch 4/4 ... 40min
[2024-09-11 09:15:35] [INFO] Command exited successfully ... 2h 40min
THE ORIGINAL POST:
So I tried it via the git clone install, prepared 57 images, managed to correct the florence2 results in the UI and finally got it trained... that was... yesterday...
GPU 4060Ti 16GB, 64GB RAM (32 possibly shared) - SET: VRAM 16GB ; images *1024 ;
image_count: 57
num_repeats: 10
num epochs: 8
num batches per epoch: 570
total optimization steps: 4560
[2024-09-09 15:45:06] [INFO] epoch 1/8
[2024-09-09 15:45:18] [INFO] 2024-09-09 15:45:18 INFO epoch is incremented. train_util.py:668
[2024-09-09 15:45:18] [INFO] current_epoch: 0, epoch: 1
[2024-09-09 15:45:18] [INFO] 2024-09-09 15:45:18 INFO epoch is incremented. train_util.py:668
[2024-09-09 15:45:18] [INFO] current_epoch: 0, epoch: 1
...and now it is 17:15 !!! the day after !!! ... and it is still frozen there...
Should I terminate it?
It also looks like it uses only ~40% of the GPU, even though the GPU memory is fully used. The 'activity' (~40%) only occurs while I have the gradio tab focused in the browser; whenever I do something else, GPU usage drops back to ~1%...
here is a full log:
[2024-09-09 15:40:46] [INFO] Running d:\fluxgym\train.bat
[2024-09-09 15:40:46] [INFO] (fluxgym) d:\fluxgym>accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 sd-scripts/flux_train_network.py --pretrained_model_name_or_path "d:\fluxgym\models\unet\flux1-dev.sft" --clip_l "d:\fluxgym\models\clip\clip_l.safetensors" --t5xxl "d:\fluxgym\models\clip\t5xxl_fp16.safetensors" --ae "d:\fluxgym\models\vae\ae.sft" --cache_latents_to_disk --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --network_module networks.lora_flux --network_dim 4 --optimizer_type adafactor --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False" --lr_scheduler constant_with_warmup --max_grad_norm 0.0 --learning_rate 8e-4 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --fp8_base --highvram --max_train_epochs 8 --save_every_n_epochs 2 --dataset_config "d:\fluxgym\dataset.toml" --output_dir "d:\fluxgym\outputs" --output_name looora-001 --timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1 --loss_type l2
[2024-09-09 15:40:53] [INFO] The following values were not passed to `accelerate launch` and had defaults used instead:
[2024-09-09 15:40:53] [INFO] `--num_processes` was set to a value of `1`
[2024-09-09 15:40:53] [INFO] `--num_machines` was set to a value of `1`
[2024-09-09 15:40:53] [INFO] `--dynamo_backend` was set to a value of `'no'`
[2024-09-09 15:40:53] [INFO] To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[2024-09-09 15:40:59] [INFO] highvram is enabled
[2024-09-09 15:40:59] [INFO] 2024-09-09 15:40:59 WARNING cache_latents_to_disk is enabled, so cache_latents is also enabled train_util.py:3896
[2024-09-09 15:40:59] [INFO] 2024-09-09 15:40:59 INFO t5xxl_max_token_length: flux_train_network.py:155
[2024-09-09 15:40:59] [INFO] 512
[2024-09-09 15:41:02] [INFO] C:\Users\1\anaconda3\envs\fluxgym\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be deprecated in transformers v4.45, and will be then set to `False` by default. For more details check this issue: huggingface/transformers#31884
[2024-09-09 15:41:02] [INFO] warnings.warn(
[2024-09-09 15:41:04] [INFO] You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565
[2024-09-09 15:41:04] [INFO] 2024-09-09 15:41:04 INFO Loading dataset config from d:\fluxgym\dataset.toml train_network.py:280
[2024-09-09 15:41:04] [INFO] INFO prepare images. train_util.py:1803
[2024-09-09 15:41:04] [INFO] INFO get image size from name of train_util.py:1741
[2024-09-09 15:41:04] [INFO] cache files
[2024-09-09 15:41:04] [INFO] 0%| | 0/57 [00:00<?, ?it/s]
100%|██████████| 57/57 [00:00<00:00, 2035.76it/s]
[2024-09-09 15:41:04] [INFO] INFO set image size from cache train_util.py:1748
[2024-09-09 15:41:04] [INFO] files: 0/57
[2024-09-09 15:41:04] [INFO] INFO found directory d:\fluxgym\datasets\looora-thunberg contains 57 image files train_util.py:1750
[2024-09-09 15:41:04] [INFO] INFO 570 train images with train_util.py:1844
[2024-09-09 15:41:04] [INFO] repeating.
[2024-09-09 15:41:04] [INFO] INFO 0 reg images. train_util.py:1847
[2024-09-09 15:41:04] [INFO] WARNING no regularization images train_util.py:1852
[2024-09-09 15:41:04] [INFO] INFO [Dataset 0] config_util.py:570
[2024-09-09 15:41:04] [INFO] batch_size: 1
[2024-09-09 15:41:04] [INFO] resolution: (1024, 1024)
[2024-09-09 15:41:04] [INFO] enable_bucket: False
[2024-09-09 15:41:04] [INFO] network_multiplier: 1.0
[2024-09-09 15:41:04] [INFO]
[2024-09-09 15:41:04] [INFO] [Subset 0 of Dataset 0]
[2024-09-09 15:41:04] [INFO] image_dir: "d:\fluxgym\datasets\looora-thunberg"
[2024-09-09 15:41:04] [INFO] image_count: 57
[2024-09-09 15:41:04] [INFO] num_repeats: 10
[2024-09-09 15:41:04] [INFO] shuffle_caption: False
[2024-09-09 15:41:04] [INFO] keep_tokens: 1
[2024-09-09 15:41:04] [INFO] keep_tokens_separator:
[2024-09-09 15:41:04] [INFO] caption_separator: ,
[2024-09-09 15:41:04] [INFO] secondary_separator: None
[2024-09-09 15:41:04] [INFO] enable_wildcard: False
[2024-09-09 15:41:04] [INFO] caption_dropout_rate: 0.0
[2024-09-09 15:41:04] [INFO] caption_dropout_every_n_epoches: 0
[2024-09-09 15:41:04] [INFO] caption_tag_dropout_rate:
[2024-09-09 15:41:04] [INFO] 0.0
[2024-09-09 15:41:04] [INFO] caption_prefix: None
[2024-09-09 15:41:04] [INFO] caption_suffix: None
[2024-09-09 15:41:04] [INFO] color_aug: False
[2024-09-09 15:41:04] [INFO] flip_aug: False
[2024-09-09 15:41:04] [INFO] face_crop_aug_range: None
[2024-09-09 15:41:04] [INFO] random_crop: False
[2024-09-09 15:41:04] [INFO] token_warmup_min: 1,
[2024-09-09 15:41:04] [INFO] token_warmup_step: 0,
[2024-09-09 15:41:04] [INFO] alpha_mask: False,
[2024-09-09 15:41:04] [INFO] is_reg: False
[2024-09-09 15:41:04] [INFO] class_tokens: AAbb
[2024-09-09 15:41:04] [INFO] caption_extension: .txt
[2024-09-09 15:41:04] [INFO] INFO [Dataset 0] config_util.py:576
[2024-09-09 15:41:04] [INFO] INFO loading image sizes. train_util.py:876
[2024-09-09 15:41:05] [INFO] 0%| | 0/57 [00:00<?, ?it/s]
42%|████▏ | 24/57 [00:00<00:00, 235.29it/s]
84%|████████▍ | 48/57 [00:00<00:00, 235.29it/s]
100%|██████████| 57/57 [00:00<00:00, 235.53it/s]
[2024-09-09 15:41:05] [INFO] 2024-09-09 15:41:05 INFO prepare dataset train_util.py:884
[2024-09-09 15:41:05] [INFO] INFO preparing accelerator train_network.py:345
[2024-09-09 15:41:05] [INFO] accelerator device: cuda
[2024-09-09 15:41:05] [INFO] INFO Building Flux model dev flux_utils.py:45
[2024-09-09 15:41:05] [INFO] INFO Loading state dict from d:\fluxgym\models\unet\flux1-dev.sft flux_utils.py:52
[2024-09-09 15:41:06] [INFO] 2024-09-09 15:41:06 INFO Loaded Flux: <All keys matched flux_utils.py:55
[2024-09-09 15:41:06] [INFO] successfully>
[2024-09-09 15:41:06] [INFO] INFO Building CLIP flux_utils.py:74
[2024-09-09 15:41:06] [INFO] INFO Loading state dict from d:\fluxgym\models\clip\clip_l.safetensors flux_utils.py:167
[2024-09-09 15:41:06] [INFO] INFO Loaded CLIP: <All keys matched flux_utils.py:170
[2024-09-09 15:41:06] [INFO] successfully>
[2024-09-09 15:41:06] [INFO] INFO Loading state dict from d:\fluxgym\models\clip\t5xxl_fp16.safetensors flux_utils.py:215
[2024-09-09 15:41:06] [INFO] INFO Loaded T5xxl: <All keys matched flux_utils.py:218
[2024-09-09 15:41:06] [INFO] successfully>
[2024-09-09 15:41:06] [INFO] INFO Building AutoEncoder flux_utils.py:62
[2024-09-09 15:41:06] [INFO] INFO Loading state dict from flux_utils.py:66
[2024-09-09 15:41:06] [INFO] d:\fluxgym\models\vae\ae.sft
[2024-09-09 15:41:06] [INFO] INFO Loaded AE: <All keys matched flux_utils.py:69
[2024-09-09 15:41:06] [INFO] successfully>
[2024-09-09 15:41:06] [INFO] import network module: networks.lora_flux
[2024-09-09 15:41:07] [INFO] 2024-09-09 15:41:07 INFO [Dataset 0] train_util.py:2324
[2024-09-09 15:41:07] [INFO] INFO caching latents with caching train_util.py:984
[2024-09-09 15:41:07] [INFO] strategy.
[2024-09-09 15:41:07] [INFO] INFO checking cache validity... train_util.py:994
[2024-09-09 15:41:07] [INFO] 0%| | 0/57 [00:00<?, ?it/s]
100%|██████████| 57/57 [00:00<00:00, 28505.46it/s]
[2024-09-09 15:41:07] [INFO] INFO caching latents... train_util.py:1038
[2024-09-09 15:41:27] [INFO] 0%| | 0/57 [00:00<?, ?it/s]
100%|██████████| 57/57 [00:19<00:00, 2.95it/s]
[2024-09-09 15:41:27] [INFO] 2024-09-09 15:41:27 INFO move vae and unet to cpu flux_train_network.py:208
[2024-09-09 15:41:27] [INFO] to save memory
[2024-09-09 15:41:27] [INFO] INFO move text encoders to flux_train_network.py:216
[2024-09-09 15:41:27] [INFO] gpu
[2024-09-09 15:41:53] [INFO] 2024-09-09 15:41:53 INFO [Dataset 0] train_util.py:2345
[2024-09-09 15:41:53] [INFO] INFO caching Text Encoder outputs train_util.py:1107
[2024-09-09 15:41:53] [INFO] with caching strategy.
[2024-09-09 15:41:53] [INFO] INFO checking cache validity... train_util.py:1113
[2024-09-09 15:41:53] [INFO] 0%| | 0/57 [00:00<?, ?it/s]
100%|██████████| 57/57 [00:00<00:00, 57058.55it/s]
[2024-09-09 15:41:53] [INFO] INFO caching Text Encoder outputs... train_util.py:1139
[2024-09-09 15:42:09] [INFO] 0%| | 0/57 [00:00<?, ?it/s]
100%|██████████| 57/57 [00:15<00:00, 3.65it/s]
[2024-09-09 15:42:09] [INFO] 2024-09-09 15:42:09 INFO move t5XXL back to cpu flux_train_network.py:256
[2024-09-09 15:42:13] [INFO] 2024-09-09 15:42:13 INFO move vae and unet back flux_train_network.py:261
[2024-09-09 15:42:13] [INFO] to original device
[2024-09-09 15:42:13] [INFO] INFO create LoRA network. base dim lora_flux.py:484
[2024-09-09 15:42:13] [INFO] (rank): 4, alpha: 1
[2024-09-09 15:42:13] [INFO] INFO neuron dropout: p=None, rank lora_flux.py:485
[2024-09-09 15:42:13] [INFO] dropout: p=None, module dropout:
[2024-09-09 15:42:13] [INFO] p=None
[2024-09-09 15:42:13] [INFO] INFO train all blocks only lora_flux.py:495
[2024-09-09 15:42:13] [INFO] INFO create LoRA for Text Encoder 1: lora_flux.py:576
[2024-09-09 15:42:13] [INFO] INFO create LoRA for Text Encoder 1: lora_flux.py:579
[2024-09-09 15:42:13] [INFO] 72 modules.
[2024-09-09 15:42:14] [INFO] 2024-09-09 15:42:14 INFO create LoRA for FLUX all blocks: lora_flux.py:593
[2024-09-09 15:42:14] [INFO] 304 modules.
[2024-09-09 15:42:14] [INFO] INFO enable LoRA for text encoder: 72 lora_flux.py:736
[2024-09-09 15:42:14] [INFO] modules
[2024-09-09 15:42:14] [INFO] INFO enable LoRA for U-Net: 304 lora_flux.py:741
[2024-09-09 15:42:14] [INFO] modules
[2024-09-09 15:42:14] [INFO] FLUX: Gradient checkpointing enabled. CPU offload: False
[2024-09-09 15:42:14] [INFO] prepare optimizer, data loader etc.
[2024-09-09 15:42:14] [INFO] INFO use Adafactor optimizer | train_util.py:4501
[2024-09-09 15:42:14] [INFO] {'relative_step': False,
[2024-09-09 15:42:14] [INFO] 'scale_parameter': False,
[2024-09-09 15:42:14] [INFO] 'warmup_init': False}
[2024-09-09 15:42:14] [INFO] override steps. steps for 8 epochs is: 4560
[2024-09-09 15:42:14] [INFO] enable fp8 training for U-Net.
[2024-09-09 15:42:14] [INFO] enable fp8 training for Text Encoder.
[2024-09-09 15:44:08] [INFO] 2024-09-09 15:44:08 INFO prepare CLIP-L for fp8: flux_train_network.py:464
[2024-09-09 15:44:08] [INFO] set to
[2024-09-09 15:44:08] [INFO] torch.float8_e4m3fn, set
[2024-09-09 15:44:08] [INFO] embeddings to
[2024-09-09 15:44:08] [INFO] torch.bfloat16
[2024-09-09 15:44:08] [INFO] running training
[2024-09-09 15:44:08] [INFO] num train images * repeats: 570
[2024-09-09 15:44:08] [INFO] num reg images: 0
[2024-09-09 15:44:08] [INFO] num batches per epoch: 570
[2024-09-09 15:44:08] [INFO] num epochs: 8
[2024-09-09 15:44:08] [INFO] batch size per device: 1
[2024-09-09 15:44:08] [INFO] gradient accumulation steps = 1
[2024-09-09 15:44:08] [INFO] total optimization steps: 4560
[2024-09-09 15:45:06] [INFO] steps: 0%| | 0/4560 [00:00<?, ?it/s]2024-09-09 15:45:06 INFO unet dtype: train_network.py:1046
[2024-09-09 15:45:06] [INFO] torch.float8_e4m3fn, device:
[2024-09-09 15:45:06] [INFO] cuda:0
[2024-09-09 15:45:06] [INFO] INFO text_encoder [0] dtype: train_network.py:1052
[2024-09-09 15:45:06] [INFO] torch.float8_e4m3fn, device:
[2024-09-09 15:45:06] [INFO] cuda:0
[2024-09-09 15:45:06] [INFO] INFO text_encoder [1] dtype: train_network.py:1052
[2024-09-09 15:45:06] [INFO] torch.bfloat16, device: cpu
[2024-09-09 15:45:06] [INFO]
[2024-09-09 15:45:06] [INFO] epoch 1/8
[2024-09-09 15:45:18] [INFO] 2024-09-09 15:45:18 INFO epoch is incremented. train_util.py:668
[2024-09-09 15:45:18] [INFO] current_epoch: 0, epoch: 1
[2024-09-09 15:45:18] [INFO] 2024-09-09 15:45:18 INFO epoch is incremented. train_util.py:668
[2024-09-09 15:45:18] [INFO] current_epoch: 0, epoch: 1
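For reference, the dataset.toml passed via --dataset_config isn't shown in the log, but the [Dataset 0] / [Subset 0] dump above implies it looks roughly like this. This is a sketch reconstructed from the logged values (image_dir, class_tokens, num_repeats, resolution, batch_size, keep_tokens, caption_extension), not the actual file:

```toml
[general]
caption_extension = '.txt'
keep_tokens = 1
shuffle_caption = false

[[datasets]]
resolution = 1024
batch_size = 1

  [[datasets.subsets]]
  image_dir = 'd:\fluxgym\datasets\looora-thunberg'
  class_tokens = 'AAbb'
  num_repeats = 10
```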