
Inference with custom inpainting model triggers: RuntimeError: mat1 and mat2 must have the same dtype #5399

Closed
alexisrolland opened this issue Oct 15, 2023 · 13 comments
Labels: bug (Something isn't working), stale (Issues that haven't received updates)

@alexisrolland (Contributor) commented Oct 15, 2023

Describe the bug

I have converted DreamShaper v8 Inpainting to diffusers format using OneTrainer and everything went smoothly, but when I call StableDiffusionInpaintPipeline I get the error message: RuntimeError: mat1 and mat2 must have the same dtype

Reproduction

import os
import torch
from compel import Compel
from diffusers import StableDiffusionInpaintPipeline, UNet2DConditionModel, UniPCMultistepScheduler
from diffusers.utils import load_image

# Generation settings
prompts = ["Face of a yellow cat, high resolution, sitting on a park bench"]
negative_prompts = [""]
seed = 1134232903
generator = torch.Generator(device='cuda').manual_seed(seed)
image = load_image("https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png")
mask_image = load_image("https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png")

# Init pipeline
MODEL_PATH_INPAINTING = os.getenv('MODEL_PATH_INPAINTING')
unet = UNet2DConditionModel.from_pretrained(
    MODEL_PATH_INPAINTING,
    subfolder="unet",
    in_channels=9,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True
)
default_pipeline = StableDiffusionInpaintPipeline.from_pretrained(
    MODEL_PATH_INPAINTING,
    unet=unet,
    torch_dtype=torch.float16
)
default_pipeline.enable_model_cpu_offload()
default_pipeline.scheduler = UniPCMultistepScheduler.from_config(default_pipeline.scheduler.config)

# Convert prompts to embeddings
compel = Compel(
    tokenizer=default_pipeline.tokenizer,
    text_encoder=default_pipeline.text_encoder,
    truncate_long_prompts=False
)
prompt_embeds = compel(prompts)
negative_prompt_embeds = compel(negative_prompts)

# Run inference
image = default_pipeline(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    image=image,
    mask_image=mask_image,
    strength=0.6,
    num_inference_steps=20,
    num_images_per_prompt=1,
    generator=generator
).images[0]

Logs

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py", line 985, in __call__
    noise_pred = self.unet(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 841, in forward
    emb = self.time_embedding(t_emb, timestep_cond)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 192, in forward
    sample = self.linear_1(sample)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 must have the same dtype

Note: I also tried without reloading the UNet2DConditionModel, and in that case instantiating StableDiffusionInpaintPipeline triggers the error: ValueError: Cannot load /models/lykon/dreamshaper-v8-inpainting/unet because conv_in.weight expected shape tensor(..., device='meta', size=(320, 4, 3, 3)), but got torch.Size([320, 9, 3, 3]). If you want to instead overwrite randomly initialized weights, please make sure to pass both low_cpu_mem_usage=False and ignore_mismatched_sizes=True. For more information, see also: #1619 (comment) as an example.

Reloading the unet as provided in the code above solved it.
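For completeness: the pipeline above is loaded with torch_dtype=torch.float16, while the reloaded UNet stays at its default float32, which may explain the dtype mismatch in the traceback below. A minimal variant I have not verified end-to-end, but which should at least keep the dtypes aligned, is to cast the reloaded UNet to half precision before handing it to the pipeline:

unet = UNet2DConditionModel.from_pretrained(
    MODEL_PATH_INPAINTING,
    subfolder="unet",
    in_channels=9,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True
)
# Cast the fp32 UNet to fp16 so it matches the rest of the fp16 pipeline (untested assumption)
unet = unet.to(torch.float16)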

System Info

  • diffusers version: 0.21.4
  • Platform: Linux-5.4.72-microsoft-standard-WSL2-x86_64-with-glibc2.36
  • Python version: 3.10.13
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Huggingface_hub version: 0.16.4
  • Transformers version: 4.34.0
  • Accelerate version: 0.23.0
  • xFormers version: 0.0.22
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@yiyixuxu @DN6 @patrickvonplaten @sayakpaul

alexisrolland added the bug label on Oct 15, 2023
@sayakpaul (Member)

Could you help us reproduce the issue without compel?

@alexisrolland (Contributor, Author) commented Oct 16, 2023

Sure thing @sayakpaul, I can confirm I reproduce the issue without compel. Here is the code.

Reproduction

import os
import torch
from diffusers import StableDiffusionInpaintPipeline, UNet2DConditionModel, UniPCMultistepScheduler
from diffusers.utils import load_image

# Generation settings
prompts = ["Face of a yellow cat, high resolution, sitting on a park bench"]
negative_prompts = [""]
seed = 1134232903
generator = torch.Generator(device='cuda').manual_seed(seed)
image = load_image("https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png")
mask_image = load_image("https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png")

# Init pipeline
MODEL_PATH_INPAINTING = os.getenv('MODEL_PATH_INPAINTING')
unet = UNet2DConditionModel.from_pretrained(
    MODEL_PATH_INPAINTING,
    subfolder="unet",
    in_channels=9,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True
)
default_pipeline = StableDiffusionInpaintPipeline.from_pretrained(
    MODEL_PATH_INPAINTING,
    unet=unet,
    torch_dtype=torch.float16
)
default_pipeline.enable_model_cpu_offload()
default_pipeline.scheduler = UniPCMultistepScheduler.from_config(default_pipeline.scheduler.config)

# Run inference
image = default_pipeline(
    prompt=prompts,
    negative_prompt=negative_prompts,
    image=image,
    mask_image=mask_image,
    strength=0.6,
    num_inference_steps=20,
    num_images_per_prompt=1,
    generator=generator
).images[0]

Log

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py", line 985, in __call__
    noise_pred = self.unet(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 841, in forward
    emb = self.time_embedding(t_emb, timestep_cond)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 192, in forward
    sample = self.linear_1(sample)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 must have the same dtype

I'm wondering if this could be related to the OneTrainer conversion script doing something fishy... I thought about converting using https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py but I'm really not sure what to provide in all the arguments...
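For the record, the invocation I would try looks roughly like this (just a sketch: the checkpoint and config paths are placeholders, and I have not verified these flags against the current version of the script):

python scripts/convert_original_stable_diffusion_to_diffusers.py \
    --checkpoint_path /path/to/dreamshaper_8_inpainting.safetensors \
    --original_config_file /path/to/v1-inpainting-inference.yaml \
    --num_in_channels 9 \
    --from_safetensors \
    --dump_path /path/to/dreamshaper-v8-inpainting-diffusers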

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot added the stale label on Nov 16, 2023
@alexisrolland (Contributor, Author)

Bump

@patrickvonplaten (Contributor)

@alexisrolland,

We can't reproduce your code snippet above because we don't know which inpainting model you are using. When executing your script, one gets:

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0/0 [00:00<?, ?it/s]
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
    269     try:
--> 270         response.raise_for_status()
    271     except HTTPError as e:

12 frames
HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/None/resolve/main/unet/config.json

The above exception was the direct cause of the following exception:

RepositoryNotFoundError                   Traceback (most recent call last)
RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-655b4363-31d71c9d7d049f654af8e9b7;36301ed1-7f48-4c88-a1bc-e79f054d8000)

Repository Not Found for url: https://huggingface.co/None/resolve/main/unet/config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/diffusers/configuration_utils.py in load_config(cls, pretrained_model_name_or_path, return_unused_kwargs, return_commit_hash, **kwargs)
    382                 )
    383             except RepositoryNotFoundError:
--> 384                 raise EnvironmentError(
    385                     f"{pretrained_model_name_or_path} is not a local folder and is not a valid model identifier"
    386                     " listed on 'https://huggingface.co/models'\nIf this is a private repository, make sure to pass a"

OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login`.

@alexisrolland (Contributor, Author)

I mentioned in the first post that I was using the DreamShaper Inpainting model converted to the Diffusers format with OneTrainer. The last code snippet wouldn't work out of the box since it fetches the model from a local path contained in an environment variable...

I can try to host a copy of the model on Hugging Face to help create a reproducible piece of code, but I'm actually not sure whether the issue is with Diffusers or with the converted model itself.

How would you suggest troubleshooting to narrow down the issue?

@superkido511 commented Nov 21, 2023

Same issue when running inference with my custom-trained model on the text_to_image task:
RuntimeError: mat1 and mat2 must have the same dtype, but got Half and Float
Here are my configs:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export NOM_EMA_REVISION="non-ema"
export TRAIN_DIR="***"
accelerate launch --mixed_precision="fp16" train_text_to_image.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --train_data_dir=$TRAIN_DIR \
    --use_ema \
    --resolution=512 \
    --logging_dir 'logs' \
    --random_flip \
    --train_batch_size=8 \
    --gradient_accumulation_steps=4 \
    --gradient_checkpointing \
    --learning_rate=1e-04 \
    --max_grad_norm=1 \
    --lr_scheduler="constant" \
    --lr_warmup_steps=0 \
    --output_dir="sd14_en_tags_nltk" \
    --snr_gamma 5 \
    --max_train_steps=10000 \
    --checkpointing_steps 500 \
    --validation_epochs 1
And inference script:

from diffusers import StableDiffusionPipeline, UNet2DConditionModel
import torch

LANG = 'en'
model_path = "sd14_en_tags_nltk/checkpoint-10000"
unet = UNet2DConditionModel.from_pretrained(model_path + "/unet")

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", 
    unet=unet, 
    torch_dtype=torch.float16,
    safety_checker=None,
)
pipe.to("cuda")

text = 'luxury, mat, waterproof"
images = pipe(
        prompt=text, 
        # output_type='np',
        height=512,
        width=1024,
        num_inference_steps=50,
        # torch_dtype=torch.float16,       
    ).images

@superkido511

Never mind, I fixed it. Just remove torch_dtype=torch.float16 when creating the pipeline.
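Alternatively, if you want to keep the pipeline in half precision, a sketch that should also avoid the Half/Float mismatch (untested on my side) is to load the fine-tuned UNet in float16 so it matches the rest of the pipeline:

from diffusers import StableDiffusionPipeline, UNet2DConditionModel
import torch

model_path = "sd14_en_tags_nltk/checkpoint-10000"
unet = UNet2DConditionModel.from_pretrained(
    model_path + "/unet",
    torch_dtype=torch.float16  # load the custom UNet directly in fp16 to match the fp16 pipeline
)

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    unet=unet,
    torch_dtype=torch.float16,
    safety_checker=None,
)
pipe.to("cuda")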

@kadirnar (Contributor) commented Dec 12, 2023

@superkido511, @patrickvonplaten, @alexisrolland
I found the error. The pipeline should not take a prompt input here; it should only take prompt embeddings.

Example Doc:

https://huggingface.co/docs/diffusers/using-diffusers/weighted_prompts

I used the Compel library for prompt embeddings and the results are not good :/
damian0815/compel#45

@alexisrolland (Contributor, Author)

@kadirnar my original example was already using prompt embeddings and I still got the error. If you got something working, would you mind sharing your code snippet please so that we can see the difference? Thanks

@kadirnar (Contributor)

> @kadirnar my original example was already using prompt embeddings and I still got the error. If you got something working, would you mind sharing your code snippet please so that we can see the difference? Thanks

Sorry, I thought it was a bug with the Compel library. I think you are getting this error because you are using a custom model. I'm using the DreamShaper-Turbo model and it works nicely. It would be great to test the inpainting model as well. How did you convert it?

@kadirnar (Contributor) commented Dec 15, 2023

> @kadirnar my original example was already using prompt embeddings and I still got the error. If you got something working, would you mind sharing your code snippet please so that we can see the difference? Thanks

I tested this code and got the same error. The reason for the error is that the .mode values of the image and mask variables are not RGB. It worked after converting them.

https://huggingface.co/Lykon/dreamshaper-8-inpainting
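For reference, a minimal sketch of what I mean by converting (assuming image and mask_image are the PIL images returned by load_image in the snippets above):

# load_image returns PIL images; the mask in particular may come back as "L" or "RGBA" instead of "RGB"
print(image.mode, mask_image.mode)
image = image.convert("RGB")
mask_image = mask_image.convert("RGB")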


github-actions bot commented Jan 9, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
