
[Resolved] NVIDIA driver performance issues #11063

Closed
w-e-w opened this issue Jun 6, 2023 Discussed in #11062 · 92 comments

Comments

@w-e-w
Collaborator

w-e-w commented Jun 6, 2023

Update (2023-10-31)

This issue should now be entirely resolved. NVIDIA has published a help article on how to disable the system memory fallback behavior. Please upgrade to the latest driver (546.01 or newer) and follow the guide on their website: https://nvidia.custhelp.com/app/answers/detail/a_id/5490

Update (2023-10-19)

This issue has been reopened, as a growing number of reports indicate that it is not yet fixed.

Update (2023-10-17)

There are some reports saying that the issue is still not fixed.

See these comments:

#11063 (comment)
#11063 (comment)

Update (2023-10-14)

This issue has reportedly been fixed by NVIDIA as of 537.58 (537.42 if using Studio release). Please update your drivers to this version or later.

The original issue description follows.


Discussed in #11062

Originally posted by w-e-w June 7, 2023
Some users have reported issues related to the latest NVIDIA drivers:
nVidia drivers change in memory management vladmandic#1285
#11050 (comment)
If you have been experiencing generation slowdowns or generations getting stuck, consider downgrading to driver version 531 or below:
NVIDIA Driver Downloads

This issue will be closed when NVIDIA resolves it. NVIDIA's tracking number is [4172676].

@w-e-w w-e-w pinned this issue Jun 6, 2023
@tusharbhutt

Funny, I am randomly getting the issue where an output is stuck at 50% for an hour, and I am on 531.41 with an NVIDIA 3060 12GB model.

@chaewai

chaewai commented Jun 7, 2023

Strangely mine seems to go at normal speed for the first gen on a checkpoint, or if I change the clip on a checkpoint, but subsequent gens go muuuch slower. Annoyingly Diablo won't run on 531.

@designborg

I can confirm this bug. I was getting results (as expected) before I installed the latest Titan RTX drivers. I will try installing a previous build.

@AIDungeonTester2

Strangely mine seems to go at normal speed for the first gen on a checkpoint, or if I change the clip on a checkpoint, but subsequent gens go muuuch slower. Annoyingly Diablo won't run on 531.

Yeah, that's exactly how it is for me. When I tried inpainting, the first gen runs through just fine, but any subsequent ones have massive hang-ups, necessitating a restart of the commandline window and rerunning webui-user.bat.

@younyokel

younyokel commented Jun 19, 2023

I wasn't sure if there was a problem with the drivers, so I reinstalled WebUI, but the problem didn't go away. Everything generates fine like before, but once High Res Fix starts and finishes there's what looks like a minute-long pause.
Edit: confirming. Downgraded to 531.68 and now everything is as it was.

@hearmeneigh

hearmeneigh commented Jun 25, 2023

If you are stuck with a newer Nvidia driver version, downgrading to Torch 1.13.1 seems to work too.

  1. Add the following to webui-user.bat:
    set TORCH_COMMAND=pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
  2. Remove <webui-root>/venv directory
  3. (Re)start WebUI
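
If you want to double-check that the downgrade actually took effect, here's a quick sanity check you can run from the new venv (just a sketch; the expected version strings come from the TORCH_COMMAND above):

    # run with <webui-root>\venv\Scripts\python.exe
    import torch

    print("torch:", torch.__version__)        # expect 1.13.1+cu117 after the reinstall
    print("CUDA build:", torch.version.cuda)  # expect 11.7
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))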

@Shawdooow

Shawdooow commented Jun 25, 2023

I am having the opposite issue: on the newer drivers my first image generation is slow because of some clogged memory on my GPU, which frees itself as soon as it gets to the second one.

Downgrading Torch didn't seem to help at all.
Downgrading from 536.23 to 531.79 fixes the problem instantly.
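
Not sure whether it's PyTorch or the driver holding on to that memory, but here's a rough sketch of how to check what PyTorch itself keeps between generations (standard torch.cuda calls only, nothing webui-specific):

    import torch

    # what the PyTorch caching allocator is currently holding on the card
    print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 2**30:.2f} GiB")

    # hand cached-but-unused blocks back to the driver between generations
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

If "reserved" stays high between images, it's the allocator cache; if it's low but the driver still shows VRAM in use, the memory is being held outside PyTorch.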

@younyokel

Anyone, is this problem still relevant?

@designborg

I haven't tried with the latest drivers, so I don't know if this issue is still ongoing.

@PsychoGarlic

Extremely slow for me. I downgraded PyTorch and had a whole lot of new problems. What usually took 4 hours is taking 10+.

@invaderxan1

Please tell me there is a fix in the pipeline?

@LabunskyA

LabunskyA commented Jul 5, 2023

For pro graphics (at least for my A4000), 531 is not going to eliminate the issue; you need to downgrade to at least 529 to get rid of the shared memory usage. And 529 / 531 / 535 / 536 on the production branch all work much worse than 531 on the New Feature branch (which uses shared VRAM, but with a much smaller footprint for some reason).

@RobotsHoldingHands

Can confirm this is still an issue. I have an RTX 3080 Ti and downgrading to 531.68 solved it for me.

@Detrian

Detrian commented Jul 9, 2023

I'm using a 3070, torch: 2.0.1+cu118, and can confirm that this is still an issue with the 536.40 driver. Using highres.fix in particular makes everything break once you reach 98% progress on an image.

@PsychoGarlic

PsychoGarlic commented Jul 9, 2023 via email

@dajusha

dajusha commented Jul 18, 2023

Did 536.67 fix this, or not?

@PsychoGarlic

PsychoGarlic commented Jul 19, 2023 via email

@WhiteX

WhiteX commented Jul 20, 2023

536.67 fixed it for me.

@prescience-data

prescience-data commented Jul 24, 2023

536.67 also worked for me somewhat, meaning it still seems to drop to shared memory, but not as aggressively (the latest versions seem to start using shared memory at 10GB rather than first maxing out all available 12GB, which matters).

The 536.67 driver release notes still reference shared memory, and I started getting the "hanging at 50%" bug again today after updating some plugins, which prompted me to dig a bit deeper for solutions.

I often use 2 or 3 ControlNet 1.1 models + Hires fix upscaling on a 12GB card, which is what triggers it; watching my Performance tab, I can see the GPU begin to use shared system memory.

The ideal fix would be some way to add a --never-use-migraine-inducing-shared-memory flag, but after some light research I assume this would rely on a driver or operating system API becoming available, as there doesn't seem to be a way to "block" a specific process from using shared memory.

However, the good news is that I was able to massively reduce this >12GB memory usage without resorting to --medvram with the following steps:

Initial environment baseline

  1. Check your CLI to make sure you don't have any "using old xformers" WARN messages (not sure if this is actually related, but it was part of the process, so it makes sense to include it).
  2. Add set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512 to webui-user.bat (see the sketch after this list).
  3. I assume 12GB users here are already running the flags --xformers and --opt-split-attention.
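
For anyone who wants to see how close they are to the spill-over point, here's a rough sketch; the alloc-conf string just mirrors step 2 above (normally you'd set it in webui-user.bat rather than in Python):

    import os

    # must be set before torch initialises its CUDA context
    os.environ.setdefault(
        "PYTORCH_CUDA_ALLOC_CONF",
        "garbage_collection_threshold:0.9,max_split_size_mb:512",
    )

    import torch

    free, total = torch.cuda.mem_get_info()  # free / total dedicated VRAM, in bytes
    print(f"free VRAM: {free / 2**30:.2f} GiB of {total / 2**30:.2f} GiB")
    # once free VRAM approaches zero, the post-531 drivers start spilling
    # into shared system memory instead of raising an OOM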

Biggest improvement

Assuming your environment already looks similar to the above, by far the biggest VRAM drop I found was switching from the 1.4GB unpruned .pth ControlNet 1.1 models to these 750MB pruned .safetensors versions https://civitai.com/models/38784

Hope this helps anyone in a similar frustrating position 😁

@catboxanon
Collaborator

From my understanding ComfyUI might've done something with CUDA's malloc to fix this. comfyanonymous/ComfyUI@1679abd

Looks like a lot of cards also don't support this though: https://github.com/search?q=repo%3Acomfyanonymous%2FComfyUI+malloc&type=commits&s=author-date&o=desc
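
For reference, the PyTorch-side knob for this is the allocator backend in PYTORCH_CUDA_ALLOC_CONF. A rough sketch of opting into it (this is not exactly what ComfyUI does, and as noted above some card/driver combinations don't support it):

    import os

    # select CUDA's async allocator before torch creates its CUDA context;
    # unsupported GPU/driver combinations will error out or fall back here
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "backend:cudaMallocAsync"

    import torch

    x = torch.randn(1024, 1024, device="cuda")  # allocations now go through cudaMallocAsync
    print(torch.cuda.memory_allocated(), "bytes allocated")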

@catboxanon
Collaborator

536.67 also did not fix this, according to the release notes.

https://us.download.nvidia.com/Windows/536.67/536.67-win11-win10-release-notes.pdf

3.2 Open Issues in Version 536.67 WHQL

This driver implements a fix for creative application stability issues seen during heavy
memory usage. We’ve observed some situations where this fix has resulted in
performance degradation when running Stable Diffusion and DaVinci Resolve. This will
be addressed in an upcoming driver release. [4172676]

@david-trigo

david-trigo commented Aug 7, 2023

I updated the drivers without thinking this might happen and now I can't go back. I have tried removing the drivers with "Display Driver Uninstaller" and then installing v531.68 and v528.49, but it still doesn't go as fast as before. RTX 4080 (Laptop) 12GB. I seem to be missing something.

Edit: finally my problem seems to be with the laptop itself. Yesterday I was testing 536.67 and 536.99 on my desktop using RTX 3080 with no problems.

@sandros94

sandros94 commented Oct 18, 2023

The latest NVIDIA driver 537.58 has a serious bug. After I installed it and restarted, my screen went black and there was no response. I can't operate my computer, for Christ's sake.

Same; only the cursor was visible. I deleted the driver through safe mode.

Damn NVIDIA, releasing the driver without even testing it. Does anyone know how to report this serious bug?

That's actually the difference between Game Ready Drivers and Studio Drivers.

The former only pass much simpler tests and get pushed out almost constantly, while the Studio drivers are actually tested to some extent (though if only a few tests fail, a release can still go out anyway, even if it doesn't fix an issue, as with 531 through 537.58).

Or at least this is what I've experienced since the first Studio driver release.

@kanbol

kanbol commented Oct 19, 2023

With the 531 driver, my 4090 can directly output 4K (3840x2160) through i2i; the video memory is almost full, but there is no OOM. At a higher resolution it OOMs directly.
After changing to the latest driver, the video memory fills up at the start of the same 4K i2i operation, but instead of OOMing it starts to use system memory, and the speed becomes much slower.
This problem exists in all drivers after version 531, including 545.84.

@w-e-w
Collaborator Author

w-e-w commented Oct 19, 2023

@catboxanon I'm going to reopen this, as it seems there are more reports saying that the issue is not yet fixed.

@w-e-w w-e-w reopened this Oct 19, 2023
@w-e-w w-e-w changed the title [Maybe Resolved] NVIDIA driver performance issues NVIDIA driver performance issues Oct 19, 2023
@wogam

wogam commented Oct 20, 2023

545.84 tanks the performance on the 3090, down from multiple it/s to 6s per iteration.

@gravitate-dev

gravitate-dev commented Oct 20, 2023

Not yet fixed. Although this isn't an issue with A1111 itself, I respect that it's being kept open.

Waiting for a new driver, as the latest on the New Feature branch as of Oct 20 didn't fix it.

A6000, Linux, 30 steps DPM++ 2M Karras 512x768
Before: 6s
Now: 18s

That's roughly 3x slower image generation.

@zethriller

zethriller commented Oct 22, 2023

545.84 on a 4060 Ti, no issue during generation.
However, hires has a major issue: a 2x upscale on a standard 3:2 XL format (832x1216) results in extremely high memory usage (16 GB VRAM full + about 10-12 GB shared RAM), puts the whole system on its knees, and can last up to 10 minutes. It seems worse than with the previous drivers (537, maybe?).
SD 1.5 has no perceptible performance issue.
Occurrence is random, but usually after a few images in a batch, and it always happens after the hires steps.
Sometimes it seems it will last forever and you can't even Ctrl+C the console. You can close it with the X button, but once I got a blue screen doing that, the first I've ever seen on Win 11.
On the previous drivers it could freeze like that for a bit, then show a trace dump in the console about a websocket disconnect (once per session), then continue as normal.

@huntermilo1

I have an NVIDIA GTX 1650 Ti with 4GB VRAM (I know it's low spec) on driver 532.03. My current generation time is the lowest I have got, i.e. 2-3 minutes for a 1024 x 1024 image. I'm skeptical about upgrading the driver to the latest version. I would appreciate answers to my queries below.

  1. Should I upgrade to the latest version, given that most people raising the issue here have an RTX card?
  2. In case I face issues in image generation, does rolling back the driver work? Has anyone tried it?
  3. Has anybody with the same or a similar graphics card upgraded to the latest version? Are you facing any issues with Automatic1111?

PS: I'm a noob!

@KrisadaFantasy

I have an NVIDIA GTX 1650 Ti with 4GB VRAM (I know it's low spec) on driver 532.03. My current generation time is the lowest I have got, i.e. 2-3 minutes for a 1024 x 1024 image. I'm skeptical about upgrading the driver to the latest version. I would appreciate answers to my queries below.

  1. Should I upgrade to the latest version, given that most people raising the issue here have an RTX card?
  2. In case I face issues in image generation, does rolling back the driver work? Has anyone tried it?
  3. Has anybody with the same or a similar graphics card upgraded to the latest version? Are you facing any issues with Automatic1111?

PS: I'm a noob!

I am on a 2060 with 6GB VRAM. I previously updated from the 531.68 Studio version to 537.42 and got the problem.

  1. I am now on the 537.58 Studio version. No speed problem, but I might be the lucky one.
  2. Yes, rolling back worked for me. I reverted from 537.42 (with the issue) to 531.68 with the clean-install option and everything worked fine. I kept it that way until 537.58 was released. If it worked better for you before, you should consider using the older driver for the time being; also, I remember 532 is within the range of versions with the issue.

I also noticed an increase in speed with the latest driver. Maybe it is as NVIDIA advertised, or something I cleared out because I reset my whole PC, but my --medvram speed has increased from about 1-3 s/it to 1-2 it/s. Splendid.

@FilipeF12

I have an NVIDIA GTX 1650 Ti with 4GB VRAM (I know it's low spec) on driver 532.03. My current generation time is the lowest I have got, i.e. 2-3 minutes for a 1024 x 1024 image. I'm skeptical about upgrading the driver to the latest version. I would appreciate answers to my queries below.

1. Should I upgrade to the latest version, given that most people raising the issue here have an RTX card?

2. In case I face issues in image generation, does rolling back the driver work? Has anyone tried it?

3. Has anybody with the same or a similar graphics card upgraded to the latest version? Are you facing any issues with Automatic1111?

PS: I'm a noob!

NVIDIA recommends uninstalling any new driver and then re-installing the old driver - as opposed to using Windows to roll back the driver. Several people mentioned using a program named DDU to completely uninstall the new driver.

https://www.guru3d.com/download/display-driver-uninstaller-download/

Create a restore point first.

@DmytroSokhach

DmytroSokhach commented Oct 24, 2023

NVCleanstall sounds like a good automated option for installing a specific version.
See here: #11050 (comment)

@PuckStar

PuckStar commented Oct 26, 2023

Anyone know if this got fixed in the latest version, 545.92?

EDIT: Started testing myself and so far so good! Before I could only generate a few SDXL images and then it would choke completely and generating time increased to like 20min or so. Needed to restart SD fully to get normal speed again.

With these new drivers I've already generated a dozen images and the speed stays the same!

@Proryanator

I tried reverting to 528 myself from whatever the latest is now, and I'm still barely getting 1 it/s on my 3080 Ti (I was getting 22 it/s before). I'm not sure it's the driver, to be honest.

@zethriller

zethriller commented Oct 29, 2023

It's getting close to impossible to re-download 531 or below now; the oldest I could find is 532.
And my 4060 isn't supported until 536-something.

Started testing myself and so far so good! Before I could only generate a few SDXL images and then it would choke completely and generating time increased to like 20min or so

Oof, what were you trying to do to get such a drop?

@DmytroSokhach

Guys

  1. To get older NVIDIA drivers installed, try:
    a. https://www.guru3d.com/download/display-driver-uninstaller-download/
    b. https://www.techpowerup.com/download/techpowerup-nvcleanstall/
    c. Direct link to drivers: https://www.nvidia.com/download/driverResults.aspx/199990/en-us/
  2. Also make sure you have optimization set to "sdp" or anything other than "None". (For a week I was struggling with the [Bug]: OutOfMemoryError: CUDA out of memory. #13745 exception, and now I'm back to normal.)

@PuckStar

It's getting close to impossible to re-download 531 or below now; the oldest I could find is 532. And my 4060 isn't supported until 536-something.

You can still download them: https://www.nvidia.com/download/driverResults.aspx/204245/en-us/

Started testing myself and so far so good! Before I could only generate a few SDXL images and then it would choke completely and generating time increased to like 20min or so

Oof, what were you trying to do to get such a drop?
Nothing. It was part of the bug mentioned in this thread. My version was 536-something, and whenever I started SD and generated SDXL images the first few went fine, but after that it was sluggish as hell.
Only a complete restart of SD would fix that.

Now with the newest drivers (545.92) I don't have that issue anymore! Each image generation takes the same amount of time.

@zethriller

zethriller commented Oct 29, 2023

Will give it a shot sometime. Thanks for the info.

Edit: 545.92 seems better for generations, but in some heavy cases like those described above (SDXL + hires 2x), it's now the whole system that gets really sluggish once the current batch is complete, despite close to 0% CPU use, no disk activity, and close to 0% GPU use. It persists like that even after closing the browser and the SD console; actually, until a restart.
It's hard to click the restart button when you're getting one frame every 10 seconds...

Guess I'll refrain from using it, but I hope it won't affect something else, or I'm in for another rollback.

@zero01101

zero01101 commented Oct 31, 2023

Confirming (2023-10-31) that the new CUDA memory fallback option in v546.01 works as described; the same behavior previously required v531.x 🎉

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.25 GiB (GPU 0; 24.00 GiB total capacity; 18.12 GiB already allocated; 0 bytes free; 22.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
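
With the fallback disabled you get a hard OOM like the one above instead of a silent slowdown. A minimal sketch of handling it (the resolutions and the retry are only for illustration, not what webui actually does):

    import torch

    def generate(width, height):
        # stand-in for the actual sampling call
        return torch.randn(1, 4, height // 8, width // 8, device="cuda")

    try:
        latent = generate(3840, 2160)
    except torch.cuda.OutOfMemoryError:
        # with sysmem fallback off, the allocation fails fast instead of spilling to shared memory
        torch.cuda.empty_cache()
        latent = generate(1920, 1080)  # retry at a lower resolution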

@sinand99

So, with the new driver, we don't need the --medvram switch for 8GB GPUs?

@zethriller

That's good to hear!
Does that mean memory-consuming jobs will actually fail now?

@catboxanon
Collaborator

As mentioned in #11063 (comment), NVIDIA has published a help article on how to disable the system memory fallback behavior. Please upgrade to the latest driver and follow the guide on their website: https://nvidia.custhelp.com/app/answers/detail/a_id/5490
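
A quick way to confirm you're on a driver new enough for that setting (546.01 or later), assuming nvidia-smi is on your PATH:

    import subprocess

    # nvidia-smi ships with the driver; ask it only for the driver version string
    version = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        text=True,
    ).strip().splitlines()[0]

    major, minor = (int(p) for p in version.split(".")[:2])
    print(f"driver {version} -> sysmem fallback toggle available: {(major, minor) >= (546, 1)}")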

@catboxanon catboxanon changed the title NVIDIA driver performance issues [Resolved] NVIDIA driver performance issues Oct 31, 2023
@light-and-ray
Contributor

Are you planning to add this option inside the UI?

@younyokel

Just updated to 546.17, they still haven't fixed the black screen issue...

@AndresM412

Just updated to 546.17, they still haven't fixed the black screen issue...
Any fix? I'm having the same problem.
