
Implementation of CUDA device id selection (--device-id 0/1/2) #3377

Merged
merged 3 commits into AUTOMATIC1111:master from cuda-device-id-selection
Oct 22, 2022

Conversation

Extraltodeus
Contributor

@Extraltodeus Extraltodeus commented Oct 21, 2022

Hello,

I added the possibility to select which GPU CUDA uses via a command-line argument, "--device-id".

Screenshot after starting on the GPU 1:
[screenshot]

Screenshot while loading on GPU 0 (then my session crashed because of a lack of RAM) :
[screenshot]

On top of letting you select which GPU to use, it allows two sessions to run in parallel if the system has enough RAM (not my case tho 😅)
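For illustration, a flag like this can be wired up roughly as follows (a minimal sketch, not the actual webui code; the helper name is made up):

```python
import argparse

# Minimal sketch of how a --device-id flag can map to a torch device string.
parser = argparse.ArgumentParser()
parser.add_argument("--device-id", type=str, default=None,
                    help="select which CUDA device to use, e.g. 0, 1 or 2")

def get_device_string(args):
    # With torch installed, this string would be passed to torch.device(...).
    if args.device_id is not None:
        return f"cuda:{args.device_id}"
    return "cuda"

args = parser.parse_args(["--device-id", "1"])
print(get_device_string(args))  # prints: cuda:1
```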

@dfaker
Collaborator

dfaker commented Oct 22, 2022

Does this have some advantage over the CUDA_VISIBLE_DEVICES environment variable?

@Extraltodeus
Contributor Author

Extraltodeus commented Oct 22, 2022

@dfaker
Even if both GPUs are visible to torch, only the first GPU is used. The CUDA_VISIBLE_DEVICES environment variable has no effect on running inferences.

If you mean that modifying that variable allows selecting the GPU, then that only works for single-instance use (or perhaps by redeclaring it before starting a second session, but this would prevent switching models and could cause conflicts if the instance queries the device again for any reason).

Therefore, making it possible to select a second GPU allows running two instances at once, on top of simply being a cleaner way to select the device.
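A minimal sketch of the ordering constraint being discussed (comments are illustrative; torch itself is left out so the snippet runs anywhere):

```python
import os

# CUDA_VISIBLE_DEVICES is read when the CUDA runtime initializes, so it must
# be set before torch first touches the GPU; changing it later in the same
# process has no effect. A per-process --device-id flag avoids that ordering
# issue when launching several instances.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # must run before CUDA init
# import torch  # torch would now only see GPU 1, exposed to it as cuda:0
```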

@dfaker
Collaborator

dfaker commented Oct 22, 2022

> The CUDA_VISIBLE_DEVICES environment variable has no effect for running inferences.

This is news to me! But easily running multiple instances on separate devices makes sense; it's a little more convenient than chaining an export.

@Extraltodeus
Contributor Author

@dfaker Exactly!

@AUTOMATIC1111 AUTOMATIC1111 merged commit e80bdca into AUTOMATIC1111:master Oct 22, 2022
@Extraltodeus Extraltodeus deleted the cuda-device-id-selection branch October 22, 2022 14:43
@Lukium

Lukium commented Nov 7, 2022

@Extraltodeus This has been really awesome. I'm currently running a server with 11 3090s thanks to this! I did find some small issues though. Some portions of the code, namely Face Restoration (all samplers) and Highres Fix (DDIM and possibly PLMS) seem to ignore --device-id and just use CUDA:0, which throws an error. I added the bug report here #3713. Is this something that you might be able to look at?

@Extraltodeus
Contributor Author

> @Extraltodeus This has been really awesome. I'm currently running a server with 11 3090s thanks to this! I did find some small issues though. Some portions of the code, namely Face Restoration (all samplers) and Highres Fix (DDIM and possibly PLMS) seem to ignore --device-id and just use CUDA:0, which throws an error. I added the bug report here #3713. Is this something that you might be able to look at?

Unfortunately I don't have access to a capable enough multi-GPU environment (my only access has 2 GPUs, but not enough RAM to run two instances).

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:2! (when checking argument for argument weight in method wrapper__cudnn_convolution)

Looks like either a check should be skipped, or a device id should be specified explicitly in the function that reaches that point.

Right now I could only test blindly however.
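The error above is the usual symptom of one code path hard-coding cuda:0 while the rest of the pipeline uses the selected device. A minimal sketch of the pattern and its fix, with plain-Python stand-ins for tensors so it runs without a GPU (FakeTensor and convolve are made up for illustration, not webui code):

```python
class FakeTensor:
    """Stand-in for a torch tensor; only tracks which device it lives on."""
    def __init__(self, device):
        self.device = device
    def to(self, device):
        return FakeTensor(device)

def convolve(weight, image):
    # Mirrors the torch behaviour: ops require all operands on one device.
    if weight.device != image.device:
        raise RuntimeError(
            f"Expected all tensors to be on the same device, but found at "
            f"least two devices, {weight.device} and {image.device}!")
    return FakeTensor(weight.device)

device = "cuda:2"              # what --device-id 2 would select
weight = FakeTensor("cuda:0")  # buggy path: hard-coded first GPU
image = FakeTensor(device)

weight = weight.to(device)     # the fix: route through the shared device
result = convolve(weight, image)
print(result.device)  # prints: cuda:2
```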

@Lukium

Lukium commented Nov 16, 2022

> @Extraltodeus This has been really awesome. I'm currently running a server with 11 3090s thanks to this! I did find some small issues though. Some portions of the code, namely Face Restoration (all samplers) and Highres Fix (DDIM and possibly PLMS) seem to ignore --device-id and just use CUDA:0, which throws an error. I added the bug report here #3713. Is this something that you might be able to look at?
>
> Unfortunately I don't have access to a capable enough multi-GPU environment (my only access has 2 GPUs, but not enough RAM to run two instances).
>
> RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:2! (when checking argument for argument weight in method wrapper__cudnn_convolution)
>
> Looks like either a check should be skipped, or a device id should be specified explicitly in the function that reaches that point.
>
> Right now I could only test blindly however.

@Extraltodeus sorry, I just saw this. I would welcome you to our Discord and would happily test any adjustments to the code in real time in my environment until we get it figured out. I'm not sure if it's cool to post Discord links here, but it's in my profile. There are also plenty of other people there (around 400) who can help test.

Thanks again for the awesome addition to the UI.

EDIT: If you do join the Discord, please DM me (@Lukium#0001) and I'll set you up with a @Developer role so you can access all the instances as well.

@zeigerpuppy

zeigerpuppy commented Jan 17, 2023

Hi @Lukium, I guess you're distributing requests across the multiple web-ui instances using a web proxy.
Do you have any further details of the implementation? I guess HAProxy would work with multiple back-ends in load-balancing mode, but I'm curious what solution you chose!
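For reference, a sketch of the kind of HAProxy setup described above, assuming three instances each pinned to its own GPU via --device-id and listening on its own port (all names, ports, and counts here are illustrative assumptions, not Lukium's actual configuration):

```
# Fragment only: a working config also needs global/defaults sections.
frontend webui_front
    bind *:80
    default_backend webui_pool

backend webui_pool
    balance roundrobin
    server gpu0 127.0.0.1:7860 check
    server gpu1 127.0.0.1:7861 check
    server gpu2 127.0.0.1:7862 check
```

Since the webui keeps per-session state, sticky sessions (e.g. `balance source`) may work better than plain round-robin.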

@Kotori05

Perhaps the idea of tiled diffusion could be used for multi-GPU support?

6 participants