Given groups=1, weight of size [1536, 16, 2, 2], expected input[4, 4, 128, 96] to have 16 channels, but got 4 channels instead #1419

Open · hieusttruyen opened this issue Jul 9, 2024 · 1 comment

@hieusttruyen:
Loading settings from /content/fine_tune/config/config_file.toml...
/content/fine_tune/config/config_file
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565
Training with captions.
loading existing metadata: /content/fine_tune/meta_lat.json
using bucket info in metadata
[Dataset 0]
batch_size: 4
resolution: (1024, 1024)
enable_bucket: True
network_multiplier: 1.0
min_bucket_reso: None
max_bucket_reso: None
bucket_reso_steps: None
bucket_no_upscale: None

[Subset 0 of Dataset 0]
image_dir: "/content/fine_tune/train_data"
image_count: 30
num_repeats: 20
shuffle_caption: False
keep_tokens: 0
keep_tokens_separator:
caption_separator: ,
secondary_separator: None
enable_wildcard: False
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
caption_prefix: None
caption_suffix: None
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1
token_warmup_step: 0
alpha_mask: False
metadata_file: /content/fine_tune/meta_lat.json

[Dataset 0]
loading image sizes.
100% 30/30 [00:00<00:00, 691368.79it/s]
make buckets
number of images per bucket (including repeats)
bucket 0: resolution (512, 1024), count: 60
bucket 1: resolution (576, 1024), count: 280
bucket 2: resolution (704, 1024), count: 40
bucket 3: resolution (768, 1024), count: 100
bucket 4: resolution (832, 1024), count: 40
bucket 5: resolution (1024, 704), count: 20
bucket 6: resolution (1024, 768), count: 20
bucket 7: resolution (1024, 1024), count: 40
mean ar error (without repeats): 0.0
prepare accelerator
accelerator device: cuda
Loading SD3 models from /content/pretrained_model/sd3_medium.safetensors
loading model for process 0/1
Building VAE
Loading state dict...
Loaded VAE: <All keys matched successfully>
[Dataset 0]
caching latents.
checking cache validity...
100% 30/30 [00:00<00:00, 554313.30it/s]
caching latents...
0it [00:00, ?it/s]
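Note the 0it here: the existing latent cache passed its validity check, so nothing was re-encoded by the SD3 VAE just loaded. A quick way to check what is actually cached, assuming kohya's layout of one .npz per training image with a "latents" array (the key name is my assumption from that cache format); SD3's VAE produces 16-channel latents, while SD1.5/SDXL VAEs produce 4-channel ones:

```python
import glob
import numpy as np

# Inspect the cached latents; the "latents" key follows kohya's .npz cache
# format (an assumption worth verifying against the script version in use).
for path in sorted(glob.glob("/content/fine_tune/train_data/*.npz")):
    latents = np.load(path)["latents"]
    print(path, latents.shape)  # SD3 VAE latents: 16 channels; SD/SDXL: 4
```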
loading model for process 0/1
Loading clip_l from /content/pretrained_model/clip_l.safetensors...
Building ClipL
Loading state dict...
Loaded ClipL: <All keys matched successfully>
loading model for process 0/1
Loading clip_g from /content/pretrained_model/clip_g.safetensors...
Building ClipG
Loading state dict...
Loaded ClipG: <All keys matched successfully>
loading model for process 0/1
Loading t5xxl from /content/pretrained_model/t5xxl_fp16.safetensors...
Building T5XXL
Loading state dict...
Loaded T5XXL: <All keys matched successfully>
[Dataset 0]
caching text encoder outputs.
checking cache existence...
100% 30/30 [00:00<00:00, 134146.18it/s]
caching text encoder outputs...
0it [00:00, ?it/s]
loading model for process 0/1
Building MMDiT
Loading state dict...
Loaded MMDiT: <All keys matched successfully>
train mmdit: True
number of models: 1
number of trainable parameters: 2028328000
prepare optimizer, data loader etc.
use Adafactor optimizer | {'scale_parameter': False, 'relative_step': False, 'warmup_init': False}
the constant_with_warmup scheduler may be a good choice
running training
num examples: 600
num batches per epoch: 150
num epochs: 53
batch size per device: 4
gradient accumulation steps: 4
total optimization steps: 2014
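For reference, these numbers are self-consistent: 600 examples = 30 images × 20 repeats, and 150 batches per epoch = 600 / 4. The total step count appears to come from rounding the per-epoch optimizer steps up before multiplying by epochs (my reading of the arithmetic, not a quote from the script):

```python
import math

num_batches_per_epoch = 150   # 600 examples / batch size 4
gradient_accumulation = 4
num_epochs = 53

# Optimizer steps per epoch, rounded up since a partial accumulation window
# at the end of an epoch still yields one optimizer step.
steps_per_epoch = math.ceil(num_batches_per_epoch / gradient_accumulation)  # 38

total_steps = steps_per_epoch * num_epochs
print(total_steps)  # 38 * 53 = 2014, matching the log
```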
steps: 0% 0/2014 [00:00<?, ?it/s]
epoch 1/53
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
epoch is incremented. current_epoch: 0, epoch: 1
Traceback (most recent call last):
  File "/content/kohya-trainer/sd3_train.py", line 974, in <module>
    train(args)
  File "/content/kohya-trainer/sd3_train.py", line 750, in train
    model_pred = mmdit(noisy_model_input, timesteps, context=context, y=pool)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 680, in forward
    return model_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 668, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/content/kohya-trainer/library/sd3_models.py", line 998, in forward
    x = self.x_embedder(x) + self.cropped_pos_embed(H, W, device=x.device).to(dtype=x.dtype)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/kohya-trainer/library/sd3_models.py", line 298, in forward
    x = self.proj(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [1536, 16, 2, 2], expected input[4, 4, 128, 96] to have 16 channels, but got 4 channels instead
steps: 0% 0/2014 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'sd3_train.py', '--sample_prompts=/content/fine_tune/config/sample_prompt.toml', '--config_file=/content/fine_tune/config/config_file.toml', '--clip_l=/content/pretrained_model/clip_l.safetensors', '--clip_g=/content/pretrained_model/clip_g.safetensors', '--t5xxl=/content/pretrained_model/t5xxl_fp16.safetensors', '--t5xxl_dtype=fp16']' returned non-zero exit status 1.
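The shapes in the error message pin down the mismatch: weight [1536, 16, 2, 2] is the MMDiT patch-embedding Conv2d, which expects 16-channel latents (the SD3 VAE encodes to 16 latent channels), while the input [4, 4, 128, 96] is a batch of four latents with only 4 channels, the channel count of an SD1.5/SDXL VAE. A minimal sketch that reproduces the same RuntimeError:

```python
import torch
import torch.nn as nn

# SD3's patch embedder: 16 latent channels in, 1536 hidden dims out, 2x2 patches
x_embedder = nn.Conv2d(in_channels=16, out_channels=1536, kernel_size=2, stride=2)

# Feeding 4-channel latents (what an SD1.5/SDXL VAE produces) raises the error
latents = torch.randn(4, 4, 128, 96)
x_embedder(latents)
# RuntimeError: Given groups=1, weight of size [1536, 16, 2, 2],
# expected input[4, 4, 128, 96] to have 16 channels, but got 4 channels instead
```

If that reading is right, the latents cached via meta_lat.json were produced earlier with a 4-channel VAE, and the 0it caching pass above silently reused them. Deleting the cached .npz latents and re-running the latent preparation step against the SD3 checkpoint, so the 16-channel SD3 VAE re-encodes the images, should clear the error.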

@hieusttruyen (Author):
@kohya-ss, please help me.
