
ROCm: filter VRAM fetch by HIP_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES #1106

Open
wants to merge 1 commit into base: concedo

Conversation

jeroen-mostert

Fixes #1104.

Or at least expresses the idea I had. Tested with a grand total of one machine, namely mine, which has one discrete GPU and an iGPU.

…fixes ggerganov#1104)

Signed-off-by: Jeroen Mostert <jeroen.mostert@cm.com>
@LostRuins
Owner

I don't think this is the optimal way to do it. The problem is that it requires the user to correctly set HIP_VISIBLE_DEVICES manually. Instead, it should determine the GPU type of the device from rocminfo.

@jeroen-mostert
Author

This change is not about making things work differently if the user has not set HIP_VISIBLE_DEVICES. Any patches to that effect are completely orthogonal to this one. There are two concerns that should be kept firmly separated:

  1. What GPUs do we want to run this on?
  2. Given these GPUs, how much VRAM do they have and how should we distribute layers?

This change is only about fixing 2. Any changes you want to make to 1, whether using rocminfo or anything else, are a separate matter, but I would recommend that if the user has set HIP_VISIBLE_DEVICES/CUDA_VISIBLE_DEVICES, that should always be respected and override any heuristics applied in koboldcpp itself. If nothing else, this is needed if I have multiple discrete GPUs and want to choose which ones to use (the tensor split parameter is not sufficient for this, because even with no layers offloaded, some resources are still allocated to the cards).
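
For illustration only, here's a minimal Python sketch of the filtering idea; the helper names and the list-of-readings input are hypothetical, not the actual code in this commit:

```python
import os

def visible_device_indices():
    # Honor HIP_VISIBLE_DEVICES first, then CUDA_VISIBLE_DEVICES; return
    # None when neither is set, meaning all devices remain visible.
    # Assumes numeric indices; both variables can also hold device UUIDs,
    # which this sketch does not handle.
    for var in ("HIP_VISIBLE_DEVICES", "CUDA_VISIBLE_DEVICES"):
        value = os.environ.get(var)
        if value is not None:
            # An empty string means "no devices visible".
            return [int(i) for i in value.split(",") if i.strip()]
    return None

def filter_vram_readings(vram_per_gpu):
    # vram_per_gpu: VRAM sizes indexed by physical device order,
    # however they were fetched (e.g. parsed from rocm-smi output).
    indices = visible_device_indices()
    if indices is None:
        return vram_per_gpu
    return [vram_per_gpu[i] for i in indices if 0 <= i < len(vram_per_gpu)]
```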

I would also like to reiterate that we cannot use rocminfo to see if we're dealing with an iGPU, short of maintaining a list of gfx arches or applying other heuristics like the number of compute units (if it's "too low", we're probably dealing with an underpowered iGPU). The latter has some potential merit, but, again, it's separate from the VRAM calculation. That would be a patch of the form "if HIP_VISIBLE_DEVICES is not set, apply some logic to find the most appropriate discrete GPU and use that, or fall back to the iGPU if we have nothing better" (since it will almost certainly still be better than the CPU).

@LostRuins
Owner

The GPU to run on is up to the user (the names are displayed), but the VRAM estimation should only come from dGPUs; that's why I would like to exclude iGPUs from the calculation. I would not like to make them completely unselectable, though; the user should make that choice.

@jeroen-mostert
Author

jeroen-mostert commented Sep 8, 2024

So if I understand correctly, when autolayering is used and we end up running on an iGPU (exclusively), the number of layers offloaded should always be 0? I could see that working.

Even if you want that, though, that's still orthogonal to what this patch does, since you also need this for a selection of dGPUs (which will become relevant for my own system around oh, next week, I hope). The behavior I myself would like to see for autolayering is:

  1. If any number of dGPUs is selected, distribute as many layers as possible to them in decreasing order of speed, by setting the tensor split accordingly. The proper order is not really auto-detectable, I'm afraid, but if all else fails we could just assign them in device order -- worst case this is wrong and the user has to override it manually, but that's still better than nothing. We should try very hard to stuff as many layers as possible into whatever GPUs are available, since the very first layer left to the CPU instantly tanks performance.
  2. If an iGPU is included in the mix, ignore it for purposes of automatic layer assignment (you can still set it manually, of course, if you really want to). If 1 is implemented, this happens naturally if we consider the VRAM available to the iGPU to be 0 (see the sketch after this list).
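
A rough sketch of that policy, assuming we already have a per-GPU VRAM figure and an iGPU flag (how to obtain that flag reliably is exactly the open question below):

```python
def auto_tensor_split(gpus):
    # gpus: list of dicts like {"vram": bytes_free, "igpu": bool}, in the
    # order the backend enumerates the devices. Returns proportions for
    # the tensor split, counting any iGPU's VRAM as 0.
    weights = [0 if g["igpu"] else g["vram"] for g in gpus]
    total = sum(weights)
    if total == 0:
        # Only iGPUs (or nothing) selected: offload no layers automatically.
        return None
    return [w / total for w in weights]

# Example: a 24 GiB dGPU, an 8 GiB dGPU and an iGPU -> [0.75, 0.25, 0.0]
```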

I still don't know of any reliable way to detect an iGPU, though. Various heuristics suggest themselves, but I couldn't tell you which, if any, are reliable. (Do iGPUs always have "AMD Radeon Graphics" as the marketing name? Is their uuid always "GPU-XX"? Is this never true for dGPUs? Maybe, but I couldn't tell you.)

@jeroen-mostert
Author

jeroen-mostert commented Sep 8, 2024

For APU detection, there is some good news: rocminfo will report this from ROCm 6.2.0 onwards. There will be a new line "Memory Properties: APU" (for dGPUs, this just says "Memory Properties: "). Unfortunately, it'll be a good while before every setup has ROCm 6.2.0+. There are library functions that can be called in the meantime (from HSA or amd_ags), but all of these require interop with a C lib.
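
Once ROCm 6.2.0 can be assumed, detection could be a simple scan of rocminfo output. A sketch, assuming the "Memory Properties" line appears inside each GPU agent block after its "Device Type" line (worth verifying against real output):

```python
import subprocess

def detect_apu_flags():
    # Returns one bool per GPU agent reported by rocminfo (ROCm >= 6.2.0):
    # True if its "Memory Properties:" line contains APU.
    out = subprocess.run(["rocminfo"], capture_output=True, text=True).stdout
    flags = []
    in_gpu_agent = False
    for line in out.splitlines():
        stripped = line.strip()
        if stripped.startswith("Device Type:"):
            in_gpu_agent = stripped.split(":", 1)[1].strip() == "GPU"
        elif in_gpu_agent and stripped.startswith("Memory Properties:"):
            flags.append("APU" in stripped)
            in_gpu_agent = False  # record at most one flag per agent
    return flags
```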

Successfully merging this pull request may close these issues.

ROCm: Memory calculation fails to take HIP_VISIBLE_DEVICES into account