Given groups=1, weight of size [1536, 16, 2, 2], expected input[4, 4, 128, 96] to have 16 channels, but got 4 channels instead #1419

Open · hieusttruyen opened this issue Jul 9, 2024 · 1 comment

@hieusttruyen:
Loading settings from /content/fine_tune/config/config_file.toml...
/content/fine_tune/config/config_file
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565
Training with captions.
loading existing metadata: /content/fine_tune/meta_lat.json
using bucket info in metadata
[Dataset 0]
batch_size: 4
resolution: (1024, 1024)
enable_bucket: True
network_multiplier: 1.0
min_bucket_reso: None
max_bucket_reso: None
bucket_reso_steps: None
bucket_no_upscale: None

[Subset 0 of Dataset 0]
image_dir: "/content/fine_tune/train_data"
image_count: 30
num_repeats: 20
shuffle_caption: False
keep_tokens: 0
keep_tokens_separator:
caption_separator: ,
secondary_separator: None
enable_wildcard: False
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
caption_prefix: None
caption_suffix: None
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1
token_warmup_step: 0
alpha_mask: False
metadata_file: /content/fine_tune/meta_lat.json

[Dataset 0]
loading image sizes.
100% 30/30 [00:00<00:00, 691368.79it/s]
make buckets
number of images per bucket (including repeats)
bucket 0: resolution (512, 1024), count: 60
bucket 1: resolution (576, 1024), count: 280
bucket 2: resolution (704, 1024), count: 40
bucket 3: resolution (768, 1024), count: 100
bucket 4: resolution (832, 1024), count: 40
bucket 5: resolution (1024, 704), count: 20
bucket 6: resolution (1024, 768), count: 20
bucket 7: resolution (1024, 1024), count: 40
mean ar error (without repeats): 0.0
prepare accelerator
accelerator device: cuda
Loading SD3 models from /content/pretrained_model/sd3_medium.safetensors
loading model for process 0/1
Building VAE
Loading state dict...
Loaded VAE: <All keys matched successfully>
[Dataset 0]
caching latents.
checking cache validity...
100% 30/30 [00:00<00:00, 554313.30it/s]
caching latents...
0it [00:00, ?it/s]
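Note the 0it here: the existing latent cache passed its validity check, so nothing was re-encoded by the SD3 VAE just loaded. A quick way to check what is actually cached, assuming kohya's layout of one .npz per training image with a "latents" array (the key name is my assumption from that cache format); SD3's VAE produces 16-channel latents, while SD1.5/SDXL VAEs produce 4-channel ones:

```python
import glob
import numpy as np

# Inspect the cached latents; the "latents" key follows kohya's .npz cache
# format (an assumption worth verifying against the script version in use).
for path in sorted(glob.glob("/content/fine_tune/train_data/*.npz")):
    latents = np.load(path)["latents"]
    print(path, latents.shape)  # SD3 VAE latents: 16 channels; SD/SDXL: 4
```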
loading model for process 0/1
Loading clip_l from /content/pretrained_model/clip_l.safetensors...
Building ClipL
Loading state dict...
Loaded ClipL: <All keys matched successfully>
loading model for process 0/1
Loading clip_g from /content/pretrained_model/clip_g.safetensors...
Building ClipG
Loading state dict...
Loaded ClipG: <All keys matched successfully>
loading model for process 0/1
Loading t5xxl from /content/pretrained_model/t5xxl_fp16.safetensors...
Building T5XXL
Loading state dict...
Loaded T5XXL: <All keys matched successfully>
[Dataset 0]
caching text encoder outputs.
checking cache existence...
100% 30/30 [00:00<00:00, 134146.18it/s]
caching text encoder outputs...
0it [00:00, ?it/s]
loading model for process 0/1
Building MMDiT
Loading state dict...
Loaded MMDiT: <All keys matched successfully>
train mmdit: True
number of models: 1
number of trainable parameters: 2028328000
prepare optimizer, data loader etc.
use Adafactor optimizer | {'scale_parameter': False, 'relative_step': False, 'warmup_init': False}
the constant_with_warmup scheduler may be a good choice
running training
num examples: 600
num batches per epoch: 150
num epochs: 53
batch size per device: 4
gradient accumulation steps: 4
total optimization steps: 2014
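For reference, these numbers are self-consistent: 600 examples = 30 images × 20 repeats, and 150 batches per epoch = 600 / 4. The total step count appears to come from rounding the per-epoch optimizer steps up before multiplying by epochs (my reading of the arithmetic, not a quote from the script):

```python
import math

num_batches_per_epoch = 150   # 600 examples / batch size 4
gradient_accumulation = 4
num_epochs = 53

# Optimizer steps per epoch, rounded up since a partial accumulation window
# at the end of an epoch still yields one optimizer step.
steps_per_epoch = math.ceil(num_batches_per_epoch / gradient_accumulation)  # 38

total_steps = steps_per_epoch * num_epochs
print(total_steps)  # 38 * 53 = 2014, matching the log
```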
steps: 0% 0/2014 [00:00<?, ?it/s]
epoch 1/53
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
Traceback (most recent call last):
  File "/content/kohya-trainer/sd3_train.py", line 974, in <module>
    train(args)
  File "/content/kohya-trainer/sd3_train.py", line 750, in train
    model_pred = mmdit(noisy_model_input, timesteps, context=context, y=pool)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 680, in forward
    return model_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 668, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/content/kohya-trainer/library/sd3_models.py", line 998, in forward
    x = self.x_embedder(x) + self.cropped_pos_embed(H, W, device=x.device).to(dtype=x.dtype)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/kohya-trainer/library/sd3_models.py", line 298, in forward
    x = self.proj(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [1536, 16, 2, 2], expected input[4, 4, 128, 96] to have 16 channels, but got 4 channels instead
steps: 0% 0/2014 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'sd3_train.py', '--sample_prompts=/content/fine_tune/config/sample_prompt.toml', '--config_file=/content/fine_tune/config/config_file.toml', '--clip_l=/content/pretrained_model/clip_l.safetensors', '--clip_g=/content/pretrained_model/clip_g.safetensors', '--t5xxl=/content/pretrained_model/t5xxl_fp16.safetensors', '--t5xxl_dtype=fp16']' returned non-zero exit status 1.
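The shapes in the error message pin down the mismatch: weight [1536, 16, 2, 2] is the MMDiT patch-embedding Conv2d, which expects 16-channel latents (the SD3 VAE encodes to 16 latent channels), while the input [4, 4, 128, 96] is a batch of four latents with only 4 channels, the channel count of an SD1.5/SDXL VAE. A minimal sketch that reproduces the same RuntimeError:

```python
import torch
import torch.nn as nn

# SD3's patch embedder: 16 latent channels in, 1536 hidden dims out, 2x2 patches
x_embedder = nn.Conv2d(in_channels=16, out_channels=1536, kernel_size=2, stride=2)

# Feeding 4-channel latents (what an SD1.5/SDXL VAE produces) raises the error
latents = torch.randn(4, 4, 128, 96)
x_embedder(latents)
# RuntimeError: Given groups=1, weight of size [1536, 16, 2, 2],
# expected input[4, 4, 128, 96] to have 16 channels, but got 4 channels instead
```

If that reading is right, the latents cached via meta_lat.json were produced earlier with a 4-channel VAE, and the 0it caching pass above silently reused them. Deleting the cached .npz latents and re-running the latent preparation step against the SD3 checkpoint, so the 16-channel SD3 VAE re-encodes the images, should clear the error.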

@hieusttruyen (Author):
@kohya-ss, please help me.
