
Implementation of CUDA device id selection (--device-id 0/1/2) #3377

Merged
merged 3 commits into AUTOMATIC1111:master from cuda-device-id-selection
Oct 22, 2022

Conversation

Extraltodeus
Contributor

@Extraltodeus Extraltodeus commented Oct 21, 2022

Hello,

I added the possibility to select which GPU CUDA uses via a command-line argument, "--device-id".

Screenshot after starting on the GPU 1:
[screenshot]

Screenshot while loading on GPU 0 (then my session crashed because of a lack of RAM) :
[screenshot]

On top of letting you select which GPU to use, it allows two sessions to run in parallel if the system has enough RAM (not my case tho 😅)
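For illustration, a flag like this can be wired up roughly as follows (a minimal sketch, not the actual webui code; the helper name is made up):

```python
import argparse

# Minimal sketch of how a --device-id flag can map to a torch device string.
parser = argparse.ArgumentParser()
parser.add_argument("--device-id", type=str, default=None,
                    help="select which CUDA device to use, e.g. 0, 1 or 2")

def get_device_string(args):
    # With torch installed, this string would be passed to torch.device(...).
    if args.device_id is not None:
        return f"cuda:{args.device_id}"
    return "cuda"

args = parser.parse_args(["--device-id", "1"])
print(get_device_string(args))  # prints: cuda:1
```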

@dfaker
Collaborator

dfaker commented Oct 22, 2022

Does this have some advantage over the CUDA_VISIBLE_DEVICES environment variable?

@Extraltodeus
Contributor Author

Extraltodeus commented Oct 22, 2022

@dfaker
Even if both GPUs are visible to torch, only the first GPU is used. The CUDA_VISIBLE_DEVICES environment variable has no effect on running inferences.

If you mean that modifying that variable allows selecting the GPU, then that only works for single-instance use (or perhaps by redeclaring it before starting a second session, but this would prevent switching models and could cause conflicts if the instance queries the device again for any reason).

Therefore, making it possible to select a second GPU allows running two instances at once, on top of simply being a cleaner way to select the device.
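A minimal sketch of the ordering constraint being discussed (comments are illustrative; torch itself is left out so the snippet runs anywhere):

```python
import os

# CUDA_VISIBLE_DEVICES is read when the CUDA runtime initializes, so it must
# be set before torch first touches the GPU; changing it later in the same
# process has no effect. A per-process --device-id flag avoids that ordering
# issue when launching several instances.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # must run before CUDA init
# import torch  # torch would now only see GPU 1, exposed to it as cuda:0
```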

@dfaker
Collaborator

dfaker commented Oct 22, 2022

> The CUDA_VISIBLE_DEVICES environment variable has no effect for running inferences.

This is news to me! But easily running multiple instances on separate devices makes sense; it's a little more convenient than chaining an export.

@Extraltodeus
Contributor Author

@dfaker Exactly!

@AUTOMATIC1111 AUTOMATIC1111 merged commit e80bdca into AUTOMATIC1111:master Oct 22, 2022
@Extraltodeus Extraltodeus deleted the cuda-device-id-selection branch October 22, 2022 14:43
@Lukium

Lukium commented Nov 7, 2022

@Extraltodeus This has been really awesome. I'm currently running a server with 11 3090s thanks to this! I did find some small issues though. Some portions of the code, namely Face Restoration (all samplers) and Highres Fix (DDIM and possibly PLMS) seem to ignore --device-id and just use CUDA:0, which throws an error. I added the bug report here #3713. Is this something that you might be able to look at?

@Extraltodeus
Contributor Author

> @Extraltodeus This has been really awesome. I'm currently running a server with 11 3090s thanks to this! I did find some small issues though. Some portions of the code, namely Face Restoration (all samplers) and Highres Fix (DDIM and possibly PLMS) seem to ignore --device-id and just use CUDA:0, which throws an error. I added the bug report here #3713. Is this something that you might be able to look at?

Unfortunately I don't have access to a capable enough multi-GPU environment (my only access has 2 GPUs, but not enough RAM to run two instances).

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:2! (when checking argument for argument weight in method wrapper__cudnn_convolution)

Looks like either a check should be skipped, or a device id should be specified explicitly in the function that reaches that point.

Right now I could only test blindly however.
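The error above is the usual symptom of one code path hard-coding cuda:0 while the rest of the pipeline uses the selected device. A minimal sketch of the pattern and its fix, with plain-Python stand-ins for tensors so it runs without a GPU (FakeTensor and convolve are made up for illustration, not webui code):

```python
class FakeTensor:
    """Stand-in for a torch tensor; only tracks which device it lives on."""
    def __init__(self, device):
        self.device = device
    def to(self, device):
        return FakeTensor(device)

def convolve(weight, image):
    # Mirrors the torch behaviour: ops require all operands on one device.
    if weight.device != image.device:
        raise RuntimeError(
            f"Expected all tensors to be on the same device, but found at "
            f"least two devices, {weight.device} and {image.device}!")
    return FakeTensor(weight.device)

device = "cuda:2"              # what --device-id 2 would select
weight = FakeTensor("cuda:0")  # buggy path: hard-coded first GPU
image = FakeTensor(device)

weight = weight.to(device)     # the fix: route through the shared device
result = convolve(weight, image)
print(result.device)  # prints: cuda:2
```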

@Lukium

Lukium commented Nov 16, 2022

> @Extraltodeus This has been really awesome. I'm currently running a server with 11 3090s thanks to this! I did find some small issues though. Some portions of the code, namely Face Restoration (all samplers) and Highres Fix (DDIM and possibly PLMS) seem to ignore --device-id and just use CUDA:0, which throws an error. I added the bug report here #3713. Is this something that you might be able to look at?
>
> Unfortunately I don't have access to a capable enough multi-GPU environment (my only access has 2 GPUs, but not enough RAM to run two instances).
>
> RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:2! (when checking argument for argument weight in method wrapper__cudnn_convolution)
>
> Looks like either a check should be skipped, or a device id should be specified explicitly in the function that reaches that point.
>
> Right now I could only test blindly however.

@Extraltodeus sorry, I just saw this. I would welcome you to our Discord and would happily test any adjustments to the code in real time in my environment until we get it figured out. I'm not sure if it's cool to post Discord links here, but it's in my profile. There are also plenty of other people there (around 400) who can help test.

Thanks again for the awesome addition to the UI.

EDIT: If you do join the Discord, please DM me (@Lukium#0001) and I'll set you up with a @Developer role so you can access all the instances as well.

@zeigerpuppy

zeigerpuppy commented Jan 17, 2023

Hi @Lukium, I guess you're distributing requests across the multiple web-ui instances using a web proxy.
Do you have any further details of the implementation? I guess HAProxy would work with multiple back-ends in load-balancing mode, but I'm curious what solution you chose!
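For reference, a sketch of the kind of HAProxy setup described above, assuming three instances each pinned to its own GPU via --device-id and listening on its own port (all names, ports, and counts here are illustrative assumptions, not Lukium's actual configuration):

```
# Fragment only: a working config also needs global/defaults sections.
frontend webui_front
    bind *:80
    default_backend webui_pool

backend webui_pool
    balance roundrobin
    server gpu0 127.0.0.1:7860 check
    server gpu1 127.0.0.1:7861 check
    server gpu2 127.0.0.1:7862 check
```

Since the webui keeps per-session state, sticky sessions (e.g. `balance source`) may work better than plain round-robin.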

@Kotori05

Perhaps the idea of tiled diffusion could be used for multi-GPU support?

6 participants