RocM: filter VRAM fetch by HIP_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES #1106

jeroen-mostert · 2024-08-30T15:38:39Z

Or at least expresses the idea I had. Tested with a grand total of one machine, namely mine, which has one discrete GPU and an iGPU.

…fixes ggerganov#1104) Signed-off-by: Jeroen Mostert <jeroen.mostert@cm.com>

LostRuins · 2024-09-07T06:48:12Z

I don't think this is the optimal way to do it. The problem is that it requires the user to correctly set HIP_VISIBLE_DEVICES manually. Instead, it should be determining the GPU type of the device from rocminfo

jeroen-mostert · 2024-09-07T18:41:46Z

This change is not about making things work different if the user has not set HIP_VISIBLE_DEVICES. Any patches to that effect are completely orthogonal to this one. There are two concerns that should be kept firmly separated:

What GPUs do we want to run this on?
Given these GPUs, how much VRAM do they have and how should we distribute layers?

This change is only about fixing 2. Any changes you want to make to 1, whether using rocminfo or anything else, are a separate thing, but I would recommend that if the user has set HIP_VISIBLE_DEVICES/CUDA_VISIBLE_DEVICES, that should always be respected and override any heuristics applied in koboldcpp itself. If nothing else, this is needed if I have multiple discrete GPUs and want to choose which ones to use (the tensor split parameter is not sufficient for this, because even with no layers offloaded some resources are still allocated to cards).

I would also like to reiterate that we cannot use rocminfo to see if we're dealing with an iGPU, short of maintaining a list of gfx arches or other heuristics like the number of compute units (if it's "too low", we're probably dealing with an underpowered iGPU). The latter has some potential merit, but, again, it's separate from the VRAM calculation. That would be a patch of the form "if HIP_VISIBLE_DEVICES is not set, apply some logic to find the most appropriate discrete GPU and use that, or fall back to the iGPU if we have nothing better" (since it will almost certainly still be better than the CPU).

LostRuins · 2024-09-08T08:49:09Z

The GPU to run on is up to the user (the names are displayed), but the VRAM estimation should only come from dGPUs, thats why I would like to exclude iGPUs from the calculation. I would not like to make them completely unselectable, the user should make that choice.

jeroen-mostert · 2024-09-08T11:52:34Z

So if I understand correctly, when autolayering is used and we end up running on an iGPU (exclusively), the number of layers offloaded should always be 0? I could see that working.

Even if you want that, though, that's still orthogonal to what this patch does, since you also need this for a selection of dGPUs (which will become relevant for my own system around oh, next week, I hope). The behavior I myself would like to see for autolayering is:

If any number of dGPUs are selected, distribute as many layers as possible to them in decreasing order of speed, by setting the tensor split accordingly. The proper order is not really auto-detectable, I'm afraid, but if all else fails we could just do in order -- worst case this is wrong and the user needs to override it manually, but that's still better than nothing. We should try very hard to stuff as many layers as possible into any GPUs available, since the very first layer that's left to the CPU instantly tanks performance.
If an iGPU is included in the mix, ignore it for purposes of automatic layer assignment (you can still set it manually, of course, if you really want to). If 1 is achieved this would happen if we consider the VRAM available to the iGPU to be 0.

I still don't know of any reliable way to detect an iGPU, though. Various heuristics suggest themselves, but I couldn't tell you which if any are reliable. (Do iGPUs always have "AMD Radeon Graphics" for the marketing name? Is their uuid always "GPU-XX"? Is this never true for dGPUs? Maybe, I couldn't tell you.)

jeroen-mostert · 2024-09-08T12:23:11Z

For APU detection, there is some good news: rocminfo will include this from ROCm 6.2.0 onwards. There will be a new line "Memory Properties: APU" (for dGPUs, this just says "Memory Properties: "). Unfortunately, it'll be a good while before every setup has ROCm 6.2.0+. There are library functions that can be called in the meantime (from HSA, or amd_ags), but all of these require interop with a C lib.

RocM: filter VRAM fetch by HIP_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES (…

2282fb5

…fixes ggerganov#1104) Signed-off-by: Jeroen Mostert <jeroen.mostert@cm.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RocM: filter VRAM fetch by HIP_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES #1106

RocM: filter VRAM fetch by HIP_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES #1106

jeroen-mostert commented Aug 30, 2024

LostRuins commented Sep 7, 2024

jeroen-mostert commented Sep 7, 2024

LostRuins commented Sep 8, 2024

jeroen-mostert commented Sep 8, 2024 •

edited

Loading

jeroen-mostert commented Sep 8, 2024 •

edited

Loading

RocM: filter VRAM fetch by HIP_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES #1106

Are you sure you want to change the base?

RocM: filter VRAM fetch by HIP_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES #1106

Conversation

jeroen-mostert commented Aug 30, 2024

LostRuins commented Sep 7, 2024

jeroen-mostert commented Sep 7, 2024

LostRuins commented Sep 8, 2024

jeroen-mostert commented Sep 8, 2024 • edited Loading

jeroen-mostert commented Sep 8, 2024 • edited Loading

jeroen-mostert commented Sep 8, 2024 •

edited

Loading

jeroen-mostert commented Sep 8, 2024 •

edited

Loading