
[Bug]: Press any key../Hard Fault - Unable to generate SDXL #16186

Open
4 of 6 tasks
BlackWyvern opened this issue Jul 10, 2024 · 15 comments
Labels
bug-report Report of a bug, yet to be confirmed

Comments

@BlackWyvern

Checklist

  • The issue exists after disabling all extensions
  • The issue exists on a clean installation of webui
  • The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • The issue exists in the current version of the webui
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

While trying to generate any image with any SDXL model, I am met with either a "Press any key to continue..." error or a hard memory-access fault in python.exe.

It'll also take out several other background applications when it crashes like this.

A post in this thread suggested checking event viewer on these crashes.

Faulting application name: python.exe, version: 3.10.11150.1013, time stamp: 0x642cc427
Faulting module name: c10.dll, version: 0.0.0.0, time stamp: 0x6578c6fe
Exception code: 0xc0000005
Fault offset: 0x0000000000055474
Faulting process id: 0x39bc
Faulting application start time: 0x01dad2c9928bde49
Faulting application path: I:\Python\Python3-10-6\python.exe
Faulting module path: I:\Stable Diffusion\venv\lib\site-packages\torch\lib\c10.dll
Report Id: fac5843a-d536-4688-b8ca-2ce2e46d2d27
Faulting package full name:
Faulting package-relative application ID:
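For anyone triaging similar reports, the fields that matter in these Event Viewer entries are the faulting module, the exception code, and the fault offset. As a minimal sketch (the `parse_fault_report` helper is hypothetical, not part of webui or any Windows tooling), they can be pulled out of a pasted report like so:

```python
import re

def parse_fault_report(text: str) -> dict:
    """Extract the diagnostic fields from a Windows 'Faulting application' entry."""
    patterns = {
        "module": r"Faulting module name: (\S+?),",
        "exception": r"Exception code: (0x[0-9a-fA-F]+)",
        "offset": r"Fault offset: (0x[0-9a-fA-F]+)",
    }
    # Each field is None if the pasted report is missing it
    return {key: (m.group(1) if (m := re.search(pat, text)) else None)
            for key, pat in patterns.items()}

report = ("Faulting module name: c10.dll, version: 0.0.0.0 "
          "Exception code: 0xc0000005 Fault offset: 0x0000000000055474")
print(parse_fault_report(report))
# {'module': 'c10.dll', 'exception': '0xc0000005', 'offset': '0x0000000000055474'}
```

Comparing the module/offset pair across reports is what shows the crashes in this thread are (mostly) the same fault.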

Steps to reproduce the problem

Load any SDXL model.
Hit generate. Doesn't even need a prompt.

What should have happened?

Make images.

Shouldn't nuke Discord, Steam, and DWM.exe all at once.

What browsers do you use to access the UI ?

Mozilla Firefox

Sysinfo

sysinfo-2024-07-10-13-24.json

Console logs

venv "I:\Stable Diffusion\venv\Scripts\Python.exe"
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: v1.9.4
Commit hash: feee37d75f1b168768014e4634dcb156ee649c05
Launching Web UI with arguments: --xformers --medvram --medvram-sdxl --autolaunch
*** "Disable all extensions" option was set, will not load any extensions ***
Loading weights [821aa5537f] from I:\Stable Diffusion\models\Stable-diffusion\SDXL\autismmixSDXL_autismmixPony.safetensors
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Creating model from config: I:\Stable Diffusion\repositories\generative-models\configs\inference\sd_xl_base.yaml
I:\Stable Diffusion\venv\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Startup time: 14.8s (prepare environment: 3.0s, import torch: 4.6s, import gradio: 1.1s, setup paths: 1.3s, initialize shared: 2.2s, other imports: 0.8s, load scripts: 0.9s, create ui: 0.4s, gradio launch: 0.6s).
Loading VAE weights specified in settings: I:\Stable Diffusion\models\VAE\sdxl_vae.safetensors
Applying attention optimization: xformers... done.
Model loaded in 10.3s (load weights from disk: 0.7s, create model: 0.7s, apply weights to model: 6.8s, load VAE: 0.1s, calculate empty prompt: 1.8s).
100%|██████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:08<00:00,  4.77it/s]
Press any key to continue . . . ███████████████████████████████████████████████████████████████████| 40/40 [00:06<00:00,  5.69it/s]

Additional information

No response

@BlackWyvern BlackWyvern added the bug-report Report of a bug, yet to be confirmed label Jul 10, 2024
@Allwhey

Allwhey commented Jul 11, 2024

Hello, I made an account to speak about this issue. It is very frustrating, but I've at least found out a bit about it.
It isn't related to xformers, as disabling and enabling it makes no difference. It happens on CUDA 11.8, CUDA 12.1 and CUDA 12.1.1, and Pytorch 2.1.2 and 2.3.1.

It's a 0xc0000005 error, which corresponds to an access violation (segfault) on Windows, and it is raised from lib\site-packages\torch\lib\c10.dll.
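For context, 0xC0000005 is the NTSTATUS value STATUS_ACCESS_VIOLATION, and the top two bits of any NTSTATUS code encode its severity class, which is why all the crash codes in this thread start with 0xC. A quick illustration of that encoding (hypothetical helper, just to show the bit layout):

```python
def ntstatus_severity(code: int) -> str:
    """Bits 30-31 of an NTSTATUS code give its severity class."""
    return {0: "success", 1: "informational", 2: "warning", 3: "error"}[code >> 30]

print(ntstatus_severity(0xC0000005))  # error (STATUS_ACCESS_VIOLATION)
print(ntstatus_severity(0xC00001AD))  # error (the dwm.exe exit code seen later in this thread)
```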

Rolling back to torch 2.0.1+cu118 currently solves the issue, at least for me. This implies that some change made to torch since then, perhaps in the c10 library, causes the segfault. Unfortunately I don't have enough familiarity with PyTorch or AI programming to suggest what change it actually is.

If anyone has insight into what causes this sort of crash in the generation process and can open a concrete issue on the PyTorch GitHub, I would be grateful.

@Allwhey

Allwhey commented Jul 11, 2024

In case anyone doesn't know the actual commands to do so, I just ran this in the root dir of my webui install.

call venv\Scripts\activate.bat
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
:: (optional) roll back xformers to a build matching torch 2.0.1
pip install --pre -U xformers torch==2.0.1
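After running those, it's worth confirming the venv really picked up the cu118 build: inside the venv, `python -c "import torch; print(torch.__version__)"` should print `2.0.1+cu118`. A small check of that version string (hypothetical helper, shown only to document the expected format of the `+cuXYZ` suffix):

```python
def is_cu118_torch(version: str) -> bool:
    """True for a torch version string built against CUDA 11.8, e.g. '2.0.1+cu118'.
    The '+cuXYZ' local-version suffix names the CUDA toolkit the wheel targets."""
    base, _, local = version.partition("+")
    return base == "2.0.1" and local == "cu118"

print(is_cu118_torch("2.0.1+cu118"))  # True
print(is_cu118_torch("2.3.1+cu121"))  # False
```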

@BlackWyvern
Author

Followed the above commands. Not sure how to roll back cuda versions though.

Faulting application name: bad_module_info, version: 0.0.0.0, time stamp: 0x00000000
Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000
Exception code: 0xc0000005
Fault offset: 0x00007ffd09145474
Faulting process id: 0x3c34
Faulting application start time: 0x01dad37ae8a95888
Faulting application path: bad_module_info
Faulting module path: unknown
Report Id: a2da468a-7006-4c8c-9a92-6d02d8513f29
Faulting package full name:
Faulting package-relative application ID:

Application: mmc.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: exception code e0434352, exception address 00007FFD6E25BA99
Stack:

Faulting application name: dwm.exe, version: 10.0.19041.4355, time stamp: 0x6564cf4e
Faulting module name: KERNELBASE.dll, version: 10.0.19041.4522, time stamp: 0xf7a99bd4
Exception code: 0xc00001ad
Fault offset: 0x000000000012d332
Faulting process id: 0x1090
Faulting application start time: 0x01dad2cabb96cec4
Faulting application path: C:\Windows\system32\dwm.exe
Faulting module path: C:\Windows\System32\KERNELBASE.dll
Report Id: 36bd6217-4ba4-411f-8e2e-fce036a69f15
Faulting package full name:
Faulting package-relative application ID:

@Allwhey

Allwhey commented Jul 11, 2024

I haven't seen that one before.
You don't necessarily need to roll back CUDA; you just need to make sure CUDA 11.8 is installed.
*In other words, at least for me, it works now despite the fact that CUDA 11.8 and 12.1.1 are installed simultaneously.
CUDA 11.8 Download Archive
Restart your PC afterwards to be sure, but as long as CUDA 11.8 is on your system it should work.

`--index-url https://download.pytorch.org/whl/cu118` ensures that pip installs the build of PyTorch 2.0.1 that was compiled against CUDA 11.8.

If it didn't work try using --force-reinstall with the first pip command.

@AlexKaleda

Thank you for this solution, it works well!
Interestingly, it began when I switched my graphics card from a 1070 to a 3060. I could generate 1-2 times, and then it would crash at the end of the next run.
At the same time, I used ComfyUI with '2.3.0+cu121' and had no problems.

@Allwhey

Allwhey commented Jul 11, 2024


Thank you for mentioning that 2.3.0 still works in ComfyUI! In that case, it may still be related to sd-webui's image-generation implementation, perhaps specifically for SDXL. I am curious why changing the torch version would cause a segfault, because errors should normally surface as a handled Python exception rather than a native crash. Nonetheless, there might be a workaround on sd-webui's end that can mitigate this.

@BlackWyvern
Author

Installed the CUDA kit as instructed. Still getting faults, but at least it's showing c10.dll again. I also managed to get it to hard fault once, necessitating a full restart.


The error always occurs at 100% completion, before the image is decoded/saved/displayed or whatever happens there, if that helps narrow anything down.

Faulting application name: python.exe, version: 3.10.11150.1013, time stamp: 0x642cc427
Faulting module name: c10.dll, version: 0.0.0.0, time stamp: 0x6578c6fe
Exception code: 0xc0000005
Fault offset: 0x0000000000055474
Faulting process id: 0x3724
Faulting application start time: 0x01dad3e9bcdcb909
Faulting application path: I:\Python\Python3-10-6\python.exe
Faulting module path: I:\Stable Diffusion\venv\lib\site-packages\torch\lib\c10.dll
Report Id: f392d87e-8bd7-4242-a5b8-d73f082cfa9c
Faulting package full name:
Faulting package-relative application ID:

@Allwhey

Allwhey commented Jul 13, 2024

#15175
I found an issue that seems to be heavily or directly related to this.
It seems it's definitely some combination of an issue with how sd-webui handles SDXL and SDXL LoRAs, along with more recent versions of PyTorch.
But there also seems to be a problem with the webui code itself, as SDXL can barely run on my computer with 32 GB of RAM at 6200 MHz and a 32 GB page file. It might crash in part due to this memory mismanagement.

@Allwhey

Allwhey commented Jul 13, 2024


https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/system-memory-fallback-for-stable-diffusion

You can also try disabling system memory fallback in recent NVIDIA drivers, which seems to have some extreme disagreements with PyTorch and Stable Diffusion. Not sure if system RAM can handle anything without it, though.

@BlackWyvern
Author

Did a clean driver reinstall. And disabled memory fallback.
I managed to get it to do one generation properly.

Then it segfaulted on c10.dll again.

@fennecbutt

I was getting this error with all variants of this model (Ratatoskr) https://civitai.com/models/192854/ratatoskr-animal-creature-and-furry.

Same exception code/location and faulting module. After mucking about with other fixes people suggested, it was completely solved when I realised that I had a page file of 0 MB set (because I'm using SSDs and have 32 GB of RAM). I set a minimum page file of 8192 MB and a maximum of 32768 MB, and it's now working just fine. I haven't tried lower values yet.
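For reference, those bounds work out to a quarter of RAM as the minimum and all of RAM as the maximum (8192/32768 MB on a 32 GB machine). A throwaway sketch of that arithmetic; these are just the values that worked in this report, not an official Windows recommendation:

```python
def pagefile_bounds_mb(ram_gb: int) -> tuple[int, int]:
    """Page-file (min, max) in MB: a quarter of RAM up to full RAM."""
    ram_mb = ram_gb * 1024
    return ram_mb // 4, ram_mb

print(pagefile_bounds_mb(32))  # (8192, 32768)
```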

@K-Max-Me

K-Max-Me commented Aug 1, 2024

This could in fact be an NVIDIA issue, but oddly enough it doesn't happen in ComfyUI. I did notice it happening when VRAM is near or at capacity (even on a 3090).

I thought I'd post this in case someone has time to try it on a clean install. I did downgrade PyTorch to 2.0.1 and use cu118 as @Allwhey mentioned, but it still crashed; that fix didn't work for me.

I seem to get a different fault from the start than everyone else, though. Check your Windows event log in case you get that random "Press any key to continue..." message with no error in the console.

Exception 0xc0000005 is an Access Violation Exception.

Faulting application name: python.exe, version: 3.10.10150.1013, time stamp: 0x63e2893e
Faulting module name: nvcuda64.dll, version: 32.0.15.5599, time stamp: 0x665baccb
Exception code: 0xc0000005
Fault offset: 0x00000000003e862f
Faulting process id: 0x0xEBF0
Faulting application start time: 0x0x1DAE3D33D3B2742
Faulting application path: C:\Users\kmax\AppData\Local\Programs\Python\Python310\python.exe
Faulting module path: C:\WINDOWS\system32\DriverStore\FileRepository\nv_dispsig.inf_amd64_e6cac7f31a92d62e\nvcuda64.dll
Report Id: 934672f3-76b6-4ef7-abc9-6803a77fd56e
Faulting package full name:
Faulting package-relative application ID:

@cbodenberger

cbodenberger commented Aug 12, 2024

I have been having this issue with a multi-GPU rig: 3090, 2080 Ti, 3060, P102-100, on Windows 11 with 32 GB of RAM. I've got 5 GPUs + 2 CPUs (X99 motherboard, dual Xeon E5-2630 v4s, 20 cores / 40 threads at 2.2 GHz). I can get a pretty consistent crash. I'm running a Discord bot and two instances on my 2080 Tis with --api --nowebui. When I queue up multiple generations it crashes; when I queue one image at a time and give it some time between queues, it doesn't. Python, DWM, and the NVIDIA driver all show errors around the time of the crash. I've played with --lowram and --medvram. I never seem to run out of VRAM or RAM: about half of VRAM is used, and maybe 29 of 32 GB of RAM when I queue a ton up.

I run a third instance on my 3090 in webui and it crashes, just not as often.

I don't know if my rambling helps narrow it down at all. I can provide logs and junk and a more detailed systeminfo if needed.

edit: I probably should have mentioned that it happens when I hit 100% on the current generation and it says "Press any key to continue", unless it crashes DWM, in which case I can't see the console.

@kitsumed

I am also experiencing this issue with SDXL models. There's no fixed number of generations before the issue appears (sometimes 40, other times 6), but for me it always happens at the end of a generation, when the ETA shows 100%.

Info / Specs

OS: Windows 11 Pro 23H2 ver22631.4037
RAM: 32GB
GPU: NVIDIA GeForce RTX 4070 SUPER
CPU: Intel Core i7-14700K
NVIDIA driver version (according to NVIDIA app): 560.81 STUDIO (latest)
NVIDIA driver version (according to dxdiag): 32.0.15.6081
I had this issue before and after upgrading my NVIDIA driver. I didn't note the old version number.

Webui config

Run on python: 3.10.11
Webui version: 1.10.1 (commit 82a973c)
Python: --HIDDEN--\AppData\Local\Programs\Python\Python310\python.exe
Arguments: --xformers --opt-split-attention --api

The python process always crashes the same way as OP's, and sometimes takes DWM down with it. Windows managed to "recover" from the fault by itself twice (with a lot of UI issues across the whole OS afterwards); the other times, a reboot was the only way.

Here are the Event Viewer logs for Dwminit.

All of the warnings involved three different exit codes, so I will only quote one example of each exit code.

The Desktop Window Manager process has exited. (Process exit code: 0xc00001ad, restart count: 3, primary display device ID: NVIDIA GeForce RTX 4070 SUPER)
(Process exit code: 0x8007001f, restart count: 4)
(Process exit code: 0xc0000409, restart count: 2)

During generation, additional resource-exhaustion warnings about virtual memory were also logged until the python process crashed. *No out-of-memory error ever showed up in the webui console.

Windows successfully diagnosed a low virtual memory condition. The following programs consumed the most virtual memory: python.exe (6260) with 30858067968 bytes, firefox.exe (15176) with 748253184 bytes, and firefox.exe (6672) with 609722368 bytes.

The errors that followed the last warning:

Faulting application name: python.exe, version: 0.0.0.0, time stamp: 0x00000000
Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000
Exception code: 0xc0000005
Fault offset: 0x00007ff9281c5474
Faulting process id: 0x0x1874
The timeout (30000 milliseconds) was reached while waiting for the WDIServiceHost service to connect.
The WDIServiceHost service failed to start due to the following error:
The service did not respond to the start or control request in a timely fashion.

This crash did not cause DWM to crash, and I was able to restart the webui right after seeing "Press any key to continue..." in the console, with no error displayed. Also note that the python crash from an unknown module is sometimes preceded by the following c10.dll error, but again, only python crashes.

Faulting application name: python.exe, version: 3.10.11150.1013, time stamp: 0x642cc427
Faulting module name: c10.dll, version: 0.0.0.0, time stamp: 0x6578c6fe
Exception code: 0xc0000005
Fault offset: 0x0000000000055474
Faulting process id: 0x0x2C0C
Faulting application start time: 0x0x1DAF4EC9C212873
Faulting application path: --HIDDEN--\AppData\Local\Programs\Python\Python310\python.exe
Faulting module path: --HIDDEN--\venv\lib\site-packages\torch\lib\c10.dll

Here is the combination of errors that causes DWM to crash:

Windows successfully diagnosed a low virtual memory condition. The following programs consumed the most virtual memory: python.exe (19868) with 27310485504 bytes, explorer.exe (7664) with 935579648 bytes, and firefox.exe (11976) with 803282944 bytes.
Faulting application name: python.exe, version: 3.10.11150.1013, time stamp: 0x642cc427
Faulting module name: c10.dll, version: 0.0.0.0, time stamp: 0x6578c6fe
Exception code: 0xc0000005
Fault offset: 0x0000000000055474
Faulting process id: 0x0x4D9C
Faulting application start time: 0x0x1DAF50E92108AE9
Faulting application path: --HIDDEN--\AppData\Local\Programs\Python\Python310\python.exe
Faulting module path: --HIDDEN--\venv\lib\site-packages\torch\lib\c10.dll
Windows successfully diagnosed a low virtual memory condition. The following programs consumed the most virtual memory: python.exe (19764) with 27600834560 bytes, explorer.exe (7664) with 942923776 bytes, and firefox.exe (11976) with 763150336 bytes.
The Desktop Window Manager process has exited. (Process exit code: 0xc00001ad, restart count: 1, primary display device ID: NVIDIA GeForce RTX 4070 SUPER)

DWM crashes here:

Faulting application name: dwm.exe, version: 10.0.22621.3672, time stamp: 0x85e98d33
Faulting module name: nvwgf2umx.dll, version: 32.0.15.6070, time stamp: 0x668ec4ab
Exception code: 0xc0000409
Fault offset: 0x0000000000f0ec71
Faulting process id: 0x0x8CC
Faulting application start time: 0x0x1DAF51AFE81AE7C
Faulting application path: C:\Windows\system32\dwm.exe
Faulting module path: C:\Windows\System32\DriverStore\FileRepository\nv_dispsi.inf_amd64_7b817a36f48f161e\nvwgf2umx.dll
Faulting application name: dwm.exe, version: 10.0.22621.3672, time stamp: 0x85e98d33
Faulting module name: dwmcore.dll, version: 10.0.22621.4036, time stamp: 0xa44fde33
Exception code: 0xc00001ad
Fault offset: 0x0000000000278c68
Faulting process id: 0x0x4AA4
Faulting application start time: 0x0x1DAF51AFF5BEA55
Faulting application path: C:\Windows\system32\dwm.exe
Faulting module path: C:\Windows\system32\dwmcore.dll
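To put the byte counts in those low-virtual-memory diagnostics in perspective, converting them to GiB shows python.exe alone holding roughly 25-29 GiB of virtual memory on a 32 GB machine, which is consistent with the commit-exhaustion warnings (a conversion sketch only):

```python
def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB, rounded to one decimal place."""
    return round(n_bytes / 2**30, 1)

# python.exe allocations from the diagnostics above
print(to_gib(30858067968))  # 28.7
print(to_gib(27310485504))  # 25.4
```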

@elen07zz

same

8 participants