
Attempt to resolve NaN issue with unstable VAEs while utilizing full precision (--no-half-vae) #12624

Closed
wants to merge 1 commit

Conversation


@catboxanon catboxanon commented Aug 17, 2023

Update: I believe #12630 fixes this properly. I will close this PR once that one (or another fix) is merged.

Description

Attempts to solve a regression introduced in cc53db6 (the previous commit, a64fbe8, does not have this issue). I think this is also related to #12611. PR #12599 also still exhibits this issue.

To preface: this only ever seems to happen with animevae.pt, and only for certain prompts, so it's difficult to find an easily reproducible scenario. The one below is consistent for me, and I've verified it also reproduces on somebody else's system. Also, this is absolutely not the correct way to fix this, because it now potentially wastes time decoding the latent twice, but I'm trying to wrap my head around what's going wrong here, and hopefully opening this PR brings it up for discussion.
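To illustrate the "decode twice" workaround described above, here is a minimal pure-Python sketch. This is not the webui's actual code; `decode_with_retry`, `decode`, and `full_precision` are hypothetical names standing in for the real VAE decode path, which operates on torch tensors.

```python
import math

def decode_with_retry(decode, latent):
    """Decode a latent; if the result contains NaNs, retry at full precision.

    `decode` is a callable taking (latent, full_precision=bool) and
    returning a list of floats. This mirrors the PR's approach: a failed
    (NaN-producing) decode is simply re-run, which means the latent may
    be decoded twice.
    """
    result = decode(latent, full_precision=False)
    if any(math.isnan(x) for x in result):
        # Wasteful fallback: decode the same latent a second time,
        # this time forcing full precision.
        result = decode(latent, full_precision=True)
    return result
```

The obvious cost is the double decode on the failure path, which is exactly why the PR text calls this "absolutely not the correct way to fix this".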

How to reproduce
  1. Check out cc53db6 or later and launch with --no-half-vae.
    Get this VAE, model, and these LoRAs:
    VAE: https://huggingface.co/a1079602570/animefull-final-pruned/blob/main/animevae.pt
    Model: https://huggingface.co/AnonymousM/Based-mixes/blob/main/Based64mix-V3.safetensors
    LoRAs:
    (removed)

  2. Download and use the metadata from this image to set up the params.
    (image attachment)

  3. (optionally) verify the image can be generated without hires fix.

  4. Attempt to generate the image with hires fix enabled. These are the settings I usually use, but I've tested several other configurations, and the only factor that seems to matter is that the Upscale by value must be 1.15 or more.
    (screenshot of settings)

  5. (optionally) take the above image and use the exact same parameters to upscale it in the img2img tab. This will not produce the NaN exception.


Side note: I got into the weeds and did some debugging, which is why I'm also suspicious that this is related to the issue I linked above. This is the part of the code that produces the NaNs: https://github.com/Stability-AI/stablediffusion/blob/cf1d67a6fd5ea1aa600c4df58e5b47da45f6bdbf/ldm/modules/diffusionmodules/model.py#L634-L641
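For context on how that code region can produce NaNs: the linked lines compute attention inside the VAE decoder, and half precision has a very small finite range (max ~65504), so a large attention score can overflow to infinity, and dividing infinities during normalization yields NaN. The following toy sketch (not the webui's or ldm's code; `fp16` is a crude hypothetical simulation that only models the overflow, not the precision loss) shows the mechanism:

```python
import math

FP16_MAX = 65504.0  # largest finite value representable in IEEE half precision

def fp16(x: float) -> float:
    """Crude fp16 simulation: magnitudes beyond the representable
    range overflow to signed infinity. Rounding/precision loss of
    real half-precision arithmetic is deliberately ignored."""
    return math.copysign(float("inf"), x) if abs(x) > FP16_MAX else x

# An attention score that a half-precision matmul can produce when
# activations are large: the true value 1e5 exceeds FP16_MAX.
score = fp16(1e5)        # overflows to inf
weight = score / score   # softmax-style normalization: inf / inf -> nan
print(math.isnan(weight))
```

This is only a plausible mechanism for fp16 overflow in attention; it does not by itself explain why the failure persists with --no-half-vae, which is what makes this bug confusing.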

If I upscale the image in img2img and store the initial value of z (the upscaled latent, before this line executes), then run hires fix in txt2img but substitute that stored z, it still produces NaNs in that function. However, if I do it the other way around, storing the upscaled latent from txt2img and using it in img2img, I instead get the error below:

RuntimeError: Input type (struct c10::Half) and bias type (float) should be the same
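That RuntimeError suggests the txt2img latent is still half precision when it reaches the float-precision VAE. A minimal sketch of the kind of guard that would avoid it (pure Python with a stand-in `Tensor` class; the real fix would cast a torch tensor with `.to()` before decoding, and `decode_guard` is a hypothetical name):

```python
class Tensor:
    """Minimal stand-in for a torch tensor, carrying only a dtype tag."""
    def __init__(self, dtype: str):
        self.dtype = dtype

    def to(self, dtype: str) -> "Tensor":
        # Mirrors torch's Tensor.to(dtype): returns a converted copy.
        return Tensor(dtype)

def decode_guard(vae_dtype: str, z: Tensor) -> Tensor:
    """Make the latent's dtype match the VAE's parameters before decoding.

    With --no-half-vae the VAE runs in float32, so a half-precision
    latent coming from the sampler must be upcast first.
    """
    if z.dtype != vae_dtype:
        z = z.to(vae_dtype)  # e.g. half -> float
    return z
```

This only addresses the dtype-mismatch symptom, not the underlying NaN issue; it is included to show why the error appears in one direction (txt2img latent into img2img) but not the other.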

Just noting my specs here as well, in case this is somehow a PyTorch bug.

python: 3.10.6  •  torch: 2.0.1+cu118  •  xformers: 0.0.20


@catboxanon catboxanon marked this pull request as draft August 17, 2023 19:04
@catboxanon catboxanon marked this pull request as ready for review August 17, 2023 19:56
@catboxanon catboxanon changed the title Attempt to resolve NaN issue with unstable VAEs in full precision Attempt to resolve NaN issue with unstable VAEs while utilizing full precision (--no-half-vae) Aug 17, 2023
@catboxanon catboxanon marked this pull request as draft August 17, 2023 22:13
@catboxanon (Collaborator, Author)

The PR that supersedes this has been merged (#12630) -- closing.

@catboxanon catboxanon closed this Aug 19, 2023
@catboxanon catboxanon deleted the fix/nans-mk1 branch March 4, 2024 23:28