
[Resolved] NVIDIA driver performance issues #11063

Closed
w-e-w opened this issue Jun 6, 2023 Discussed in #11062 · 92 comments

Comments

@w-e-w
Collaborator

w-e-w commented Jun 6, 2023

Update (2023-10-31)

This issue should now be entirely resolved. NVIDIA has published a help article on how to disable the system memory fallback behavior. Please upgrade to the latest driver (546.01 or newer) and follow the guide on their website: https://nvidia.custhelp.com/app/answers/detail/a_id/5490

Update (2023-10-19)

This issue has been reopened, as a growing number of reports indicate that it is not yet fixed.

Update (2023-10-17)

There are some reports saying that the issue is still not fixed.

See these comments:

#11063 (comment)
#11063 (comment)

Update (2023-10-14)

This issue has reportedly been fixed by NVIDIA as of 537.58 (537.42 if using Studio release). Please update your drivers to this version or later.

The original issue description follows.


Discussed in #11062

Originally posted by w-e-w June 7, 2023
Some users have reported issues related to the latest NVIDIA drivers:
nVidia drivers change in memory management vladmandic#1285
#11050 (comment)
If you have been experiencing generation slowdowns or generations getting stuck, consider downgrading to driver version 531 or below:
NVIDIA Driver Downloads

This issue will be closed when NVIDIA resolves it. NVIDIA's tracking number is [4172676].

@w-e-w w-e-w pinned this issue Jun 6, 2023
@tusharbhutt

Funny, I am randomly getting the issue where an output is stuck at 50% for an hour, and I am on 531.41 with an NVIDIA 3060 12GB model.

@chaewai

chaewai commented Jun 7, 2023

Strangely mine seems to go at normal speed for the first gen on a checkpoint, or if I change the clip on a checkpoint, but subsequent gens go muuuch slower. Annoyingly Diablo won't run on 531.

@designborg

I can confirm this bug. I was getting results (as expected) before I installed the latest Titan RTX drivers. I will try installing a previous build.

@AIDungeonTester2

Strangely mine seems to go at normal speed for the first gen on a checkpoint, or if I change the clip on a checkpoint, but subsequent gens go muuuch slower. Annoyingly Diablo won't run on 531.

Yeah, that's exactly how it is for me. When I tried inpainting, the first gen runs through just fine, but any subsequent ones have massive hang-ups, necessitating a restart of the commandline window and rerunning webui-user.bat.

@younyokel

younyokel commented Jun 19, 2023

I wasn't sure if there was a problem with the drivers, so I reinstalled WebUI, but the problem didn't go away. Everything generates fine like before, but once High Res Fix starts and finishes there's what looks like a minute-long pause.
Edit: confirming. Downgraded to 531.68 and now everything is as it was.

@hearmeneigh

hearmeneigh commented Jun 25, 2023

If you are stuck with a newer Nvidia driver version, downgrading to Torch 1.13.1 seems to work too.

  1. Add the following to webui-user.bat:
    set TORCH_COMMAND=pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
  2. Remove <webui-root>/venv directory
  3. (Re)start WebUI
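
If you want to double-check that the downgrade actually took effect, here's a quick sanity check you can run from the new venv (just a sketch; the expected version strings come from the TORCH_COMMAND above):

    # run with <webui-root>\venv\Scripts\python.exe
    import torch

    print("torch:", torch.__version__)        # expect 1.13.1+cu117 after the reinstall
    print("CUDA build:", torch.version.cuda)  # expect 11.7
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))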

@Shawdooow

Shawdooow commented Jun 25, 2023

I am having the opposite issue: on the newer drivers my first image generation is slow because of some clogged memory on my GPU, which frees itself as soon as it gets to the second one.

Downgrading Torch didn't seem to help at all.
Downgrading from 536.23 to 531.79 fixes the problem instantly.
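
Not sure whether it's PyTorch or the driver holding on to that memory, but here's a rough sketch of how to check what PyTorch itself keeps between generations (standard torch.cuda calls only, nothing webui-specific):

    import torch

    # what the PyTorch caching allocator is currently holding on the card
    print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 2**30:.2f} GiB")

    # hand cached-but-unused blocks back to the driver between generations
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

If "reserved" stays high between images, it's the allocator cache; if it's low but the driver still shows VRAM in use, the memory is being held outside PyTorch.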

@younyokel

Anyone, is this problem still relevant?

@designborg

I haven't tried with the latest drivers, so I don't know if this issue is still ongoing.

@PsychoGarlic

Extremely slow for me. I downgraded PyTorch and had a whole lot of new problems. What usually took 4 hours is taking 10+.

@invaderxan1

Please tell me there is a fix in the pipeline?

@LabunskyA

LabunskyA commented Jul 5, 2023

For pro graphics (at least for my A4000), 531 is not going to eliminate the issue; you need to downgrade to at least 529 to get rid of the shared memory usage. And 529 / 531 / 535 / 536 on the production branch all work much worse than 531 on the New Feature branch (which uses shared VRAM, but with a much smaller footprint for some reason).

@RobotsHoldingHands

Can confirm this is still an issue. I have an RTX 3080 Ti and downgrading to 531.68 solved it for me.

@Detrian

Detrian commented Jul 9, 2023

I'm using a 3070, torch: 2.0.1+cu118, and can confirm that this is still an issue with the 536.40 driver. Using highres.fix in particular makes everything break once you reach 98% progress on an image.

@PsychoGarlic

PsychoGarlic commented Jul 9, 2023 via email

@dajusha

dajusha commented Jul 18, 2023

Did 536.67 fix this, or not?

@PsychoGarlic

PsychoGarlic commented Jul 19, 2023 via email

@WhiteX

WhiteX commented Jul 20, 2023

536.67 fixed it for me.

@prescience-data

prescience-data commented Jul 24, 2023

536.67 also worked for me somewhat, meaning it still seems to drop to shared memory, but not as aggressively (the latest versions seem to start using shared memory at 10GB rather than first maxing out all available 12GB, which matters).

The 536.67 driver release notes still reference shared memory, and I started getting the "hanging at 50%" bug again today after updating some plugins, which prompted me to dig a bit deeper for solutions.

I often use 2 or 3 ControlNet 1.1 models + Hires fix upscaling on a 12GB card, which is what triggers it; watching my Performance tab, I can see the GPU begin to use shared system memory.

The ideal fix would be some way to add a --never-use-migraine-inducing-shared-memory flag, but after some light research I assume this would rely on a driver or operating system API becoming available, as there doesn't seem to be a way to "block" a specific process from using shared memory.

However, the good news is that I was able to massively reduce this >12GB memory usage without resorting to --medvram with the following steps:

Initial environment baseline

  1. Check your CLI to make sure you don't have any "using old xformers" WARN messages (not sure if this is actually related, but it was part of the process, so it makes sense to include it).
  2. Add set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512 to webui-user.bat (see the sketch after this list).
  3. I assume 12GB users here are already running the flags --xformers and --opt-split-attention.
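
For anyone who wants to see how close they are to the spill-over point, here's a rough sketch; the alloc-conf string just mirrors step 2 above (normally you'd set it in webui-user.bat rather than in Python):

    import os

    # must be set before torch initialises its CUDA context
    os.environ.setdefault(
        "PYTORCH_CUDA_ALLOC_CONF",
        "garbage_collection_threshold:0.9,max_split_size_mb:512",
    )

    import torch

    free, total = torch.cuda.mem_get_info()  # free / total dedicated VRAM, in bytes
    print(f"free VRAM: {free / 2**30:.2f} GiB of {total / 2**30:.2f} GiB")
    # once free VRAM approaches zero, the post-531 drivers start spilling
    # into shared system memory instead of raising an OOM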

Biggest improvement

Assuming your environment already looks similar to the above, by far the biggest VRAM drop I found was switching from the 1.4GB unpruned .pth ControlNet 1.1 models to these 750MB pruned .safetensors versions https://civitai.com/models/38784

Hope this helps anyone in a similar frustrating position 😁

@catboxanon
Collaborator

From my understanding ComfyUI might've done something with CUDA's malloc to fix this. comfyanonymous/ComfyUI@1679abd

Looks like a lot of cards also don't support this though: https://github.com/search?q=repo%3Acomfyanonymous%2FComfyUI+malloc&type=commits&s=author-date&o=desc
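
For reference, the PyTorch-side knob for this is the allocator backend in PYTORCH_CUDA_ALLOC_CONF. A rough sketch of opting into it (this is not exactly what ComfyUI does, and as noted above some card/driver combinations don't support it):

    import os

    # select CUDA's async allocator before torch creates its CUDA context;
    # unsupported GPU/driver combinations will error out or fall back here
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "backend:cudaMallocAsync"

    import torch

    x = torch.randn(1024, 1024, device="cuda")  # allocations now go through cudaMallocAsync
    print(torch.cuda.memory_allocated(), "bytes allocated")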

@catboxanon
Collaborator

536.67 also did not fix this, according to the release notes.

https://us.download.nvidia.com/Windows/536.67/536.67-win11-win10-release-notes.pdf

3.2 Open Issues in Version 536.67 WHQL

This driver implements a fix for creative application stability issues seen during heavy
memory usage. We’ve observed some situations where this fix has resulted in
performance degradation when running Stable Diffusion and DaVinci Resolve. This will
be addressed in an upcoming driver release. [4172676]

@david-trigo

david-trigo commented Aug 7, 2023

I updated the drivers without thinking this might happen and now I can't go back. I have tried removing the drivers with "Display Driver Uninstaller" and then installing v531.68 and v528.49, but it still doesn't go as fast as before. RTX 4080 (Laptop) 12GB. I seem to be missing something.

Edit: finally my problem seems to be with the laptop itself. Yesterday I was testing 536.67 and 536.99 on my desktop using RTX 3080 with no problems.

@sandros94

sandros94 commented Oct 18, 2023

The latest NVIDIA driver 537.58 has a serious bug. After I installed it and restarted, my screen went black and there was no response. I can't operate my computer, for Christ's sake.

Same; only the cursor was visible. I deleted the driver through safe mode.

Damn NVIDIA, releasing the driver without even testing it. Does anyone know how to report this serious bug?

That's actually the difference between Game Ready Drivers and Studio Drivers.

The former only pass much simpler tests and get pushed out almost constantly, while the Studio drivers are actually tested to some extent (though if only a few tests fail, a release can still go out anyway, even if it doesn't fix an issue, as with 531 through 537.58).

Or at least this is what I've experienced since the first Studio driver release.

@kanbol

kanbol commented Oct 19, 2023

With the 531 driver, my 4090 can directly output 4K (3840x2160) through i2i; the video memory is almost full, but there is no OOM. At a higher resolution it OOMs directly.
After changing to the latest driver, the video memory fills up at the start of the same 4K i2i operation, but instead of OOMing it starts to use system memory, and the speed becomes much slower.
This problem exists in all drivers after version 531, including 545.84.

@w-e-w
Collaborator Author

w-e-w commented Oct 19, 2023

@catboxanon I'm going to reopen this, as it seems there are more reports saying that the issue is not yet fixed.

@w-e-w w-e-w reopened this Oct 19, 2023
@w-e-w w-e-w changed the title [Maybe Resolved] NVIDIA driver performance issues NVIDIA driver performance issues Oct 19, 2023
@wogam

wogam commented Oct 20, 2023

545.84 tanks the performance on the 3090, down from multiple it/s to 6s per iteration.

@gravitate-dev

gravitate-dev commented Oct 20, 2023

Not yet fixed. Although this isn't an issue with A1111 itself, I respect that it's being kept open.

Waiting for a new driver, as the latest on the New Feature branch as of Oct 20 didn't fix it.

A6000, Linux, 30 steps DPM++ 2M Karras 512x768
Before: 6s
Now: 18s

That's roughly 3x slower image generation.

@zethriller

zethriller commented Oct 22, 2023

545.84 on a 4060 Ti, no issue during generation.
However, hires has a major issue: a 2x upscale on a standard 3:2 XL format (832x1216) results in extremely high memory usage (16 GB VRAM full + about 10-12 GB shared RAM), puts the whole system on its knees, and can last up to 10 minutes. It seems worse than with the previous drivers (537, maybe?).
SD 1.5 has no perceptible performance issue.
Occurrence is random, but usually after a few images in a batch, and it always happens after the hires steps.
Sometimes it seems it will last forever and you can't even Ctrl+C the console. You can close it with the X button, but once I got a blue screen doing that, the first I've ever seen on Win 11.
On the previous drivers it could freeze like that for a bit, then show a trace dump in the console about a websocket disconnect (once per session), then continue as normal.

@huntermilo1

I have an NVIDIA GTX 1650 Ti with 4GB VRAM (I know it's low spec) on driver 532.03. My current generation time is the lowest I have got, i.e. 2-3 minutes for a 1024 x 1024 image. I'm skeptical about upgrading the driver to the latest version. I would appreciate answers to my queries below.

  1. Should I upgrade to the latest version, given that most people raising the issue here have an RTX card?
  2. In case I face issues in image generation, does rolling back the driver work? Has anyone tried it?
  3. Has anybody with the same or a similar graphics card upgraded to the latest version? Are you facing any issues with Automatic1111?

PS: I'm a noob!

@KrisadaFantasy

I have an NVIDIA GTX 1650 Ti with 4GB VRAM (I know it's low spec) on driver 532.03. My current generation time is the lowest I have got, i.e. 2-3 minutes for a 1024 x 1024 image. I'm skeptical about upgrading the driver to the latest version. I would appreciate answers to my queries below.

  1. Should I upgrade to the latest version, given that most people raising the issue here have an RTX card?
  2. In case I face issues in image generation, does rolling back the driver work? Has anyone tried it?
  3. Has anybody with the same or a similar graphics card upgraded to the latest version? Are you facing any issues with Automatic1111?

PS: I'm a noob!

I am on a 2060 with 6GB VRAM. I previously updated from the 531.68 Studio version to 537.42 and got the problem.

  1. I am now on the 537.58 Studio version. No speed problem, but I might be the lucky one.
  2. Yes, rolling back worked for me. I reverted from 537.42 (with the issue) to 531.68 with the clean-install option and everything worked fine. I kept it that way until 537.58 was released. If it worked better for you before, you should consider using the older driver for the time being; also, I remember 532 is within the range of versions with the issue.

I also noticed an increase in speed with the latest driver. Maybe it is as NVIDIA advertised, or something I cleared out because I reset my whole PC, but my --medvram speed has increased from about 1-3 s/it to 1-2 it/s. Splendid.

@FilipeF12

I have an NVIDIA GTX 1650 Ti with 4GB VRAM (I know it's low spec) on driver 532.03. My current generation time is the lowest I have got, i.e. 2-3 minutes for a 1024 x 1024 image. I'm skeptical about upgrading the driver to the latest version. I would appreciate answers to my queries below.

1. Should I upgrade to the latest version, given that most people raising the issue here have an RTX card?

2. In case I face issues in image generation, does rolling back the driver work? Has anyone tried it?

3. Has anybody with the same or a similar graphics card upgraded to the latest version? Are you facing any issues with Automatic1111?

PS: I'm a noob!

NVIDIA recommends uninstalling any new driver and then re-installing the old driver - as opposed to using Windows to roll back the driver. Several people mentioned using a program named DDU to completely uninstall the new driver.

https://www.guru3d.com/download/display-driver-uninstaller-download/

Create a restore point first.

@DmytroSokhach

DmytroSokhach commented Oct 24, 2023

NVCleanstall sounds like a good automated option for installing a specific version.
See here: #11050 (comment)

@PuckStar

PuckStar commented Oct 26, 2023

Anyone know if this got fixed in the latest version, 545.92?

EDIT: Started testing myself and so far so good! Before I could only generate a few SDXL images and then it would choke completely and generating time increased to like 20min or so. Needed to restart SD fully to get normal speed again.

With these new drivers I've already generated a dozen images and the speed stays the same!

@Proryanator

I tried reverting to 528 myself from whatever the latest is now, and I'm still barely getting 1 it/s on my 3080 Ti (I was getting 22 it/s before). I'm not sure it's the driver, to be honest.

@zethriller

zethriller commented Oct 29, 2023

It's getting close to impossible to re-download 531 or below now; the oldest I could find is 532.
And my 4060 isn't supported until 536-something.

Started testing myself and so far so good! Before I could only generate a few SDXL images and then it would choke completely and generating time increased to like 20min or so

Oof, what were you trying to do to get such a drop?

@DmytroSokhach

Guys

  1. To get older NVIDIA drivers installed, try:
    a. https://www.guru3d.com/download/display-driver-uninstaller-download/
    b. https://www.techpowerup.com/download/techpowerup-nvcleanstall/
    c. Direct link to drivers: https://www.nvidia.com/download/driverResults.aspx/199990/en-us/
  2. Also make sure you have optimization set to "sdp" or anything other than "None". (For a week I was struggling with the [Bug]: OutOfMemoryError: CUDA out of memory. #13745 exception, and now I'm back to normal.)

@PuckStar

It's getting close to impossible to re-download 531 or below now; the oldest I could find is 532. And my 4060 isn't supported until 536-something.

You can still download them: https://www.nvidia.com/download/driverResults.aspx/204245/en-us/

Started testing myself and so far so good! Before I could only generate a few SDXL images and then it would choke completely and generating time increased to like 20min or so

Oof, what were you trying to do to get such a drop?
Nothing. It was part of the bug mentioned in this thread. My version was 536-something, and whenever I started SD and generated SDXL images the first few went fine, but after that it was sluggish as hell.
Only a complete restart of SD would fix that.

Now with the newest drivers (545.92) I don't have that issue anymore! Each image generation takes the same amount of time.

@zethriller

zethriller commented Oct 29, 2023

Will give it a shot sometime. Thanks for the info.

Edit: 545.92 seems better for generations, but in some heavy cases like those described above (SDXL + hires 2x), it's now the whole system that gets really sluggish once the current batch is complete, despite close to 0% CPU use, no disk activity, and close to 0% GPU use. It persists like that even after closing the browser and the SD console; actually, until a restart.
It's hard to click the restart button when you're getting one frame every 10 seconds...

Guess I'll refrain from using it, but I hope it won't affect something else, or I'm in for another rollback.

@zero01101

zero01101 commented Oct 31, 2023

Confirming (2023-10-31) that the new CUDA memory fallback option in v546.01 works as described; the same behavior previously required v531.x 🎉

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.25 GiB (GPU 0; 24.00 GiB total capacity; 18.12 GiB already allocated; 0 bytes free; 22.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
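
With the fallback disabled you get a hard OOM like the one above instead of a silent slowdown. A minimal sketch of handling it (the resolutions and the retry are only for illustration, not what webui actually does):

    import torch

    def generate(width, height):
        # stand-in for the actual sampling call
        return torch.randn(1, 4, height // 8, width // 8, device="cuda")

    try:
        latent = generate(3840, 2160)
    except torch.cuda.OutOfMemoryError:
        # with sysmem fallback off, the allocation fails fast instead of spilling to shared memory
        torch.cuda.empty_cache()
        latent = generate(1920, 1080)  # retry at a lower resolution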

@sinand99

So, with the new driver, we don't need the --medvram switch for 8GB GPUs?

@zethriller

That's good to hear!
Does that mean memory-consuming jobs will actually fail now?

@catboxanon
Collaborator

As mentioned in #11063 (comment), NVIDIA has published a help article on how to disable the system memory fallback behavior. Please upgrade to the latest driver and follow the guide on their website: https://nvidia.custhelp.com/app/answers/detail/a_id/5490
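
A quick way to confirm you're on a driver new enough for that setting (546.01 or later), assuming nvidia-smi is on your PATH:

    import subprocess

    # nvidia-smi ships with the driver; ask it only for the driver version string
    version = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        text=True,
    ).strip().splitlines()[0]

    major, minor = (int(p) for p in version.split(".")[:2])
    print(f"driver {version} -> sysmem fallback toggle available: {(major, minor) >= (546, 1)}")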

@catboxanon catboxanon changed the title NVIDIA driver performance issues [Resolved] NVIDIA driver performance issues Oct 31, 2023
@light-and-ray
Contributor

Are you planning to add this option inside the UI?

@younyokel

Just updated to 546.17, they still haven't fixed the black screen issue...

@AndresM412

Just updated to 546.17, they still haven't fixed the black screen issue...
Any fix? I'm having the same problem.
