I fixed up the PR. I should also clarify: the PR only enables CUDA acceleration for f16 models. I previously misunderstood how ggml LoRAs are applied. What actually needs to happen is modifying the weights with the LoRA delta, which is complicated by the fact that by that point the weights are already in VRAM, where regular ggml operations can't reach them.
Are there any plans to support this?
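For context, the merge itself is conceptually simple: add a low-rank update to the base weight, `W' = W + scale * (B @ A)`. Here is a minimal CPU-side sketch in plain C (illustrative only, not the ggml API; `lora_merge_f32` is a hypothetical name):

```c
#include <stddef.h>

// Conceptual sketch of a LoRA merge on the CPU:
//   w (n_out x n_in) += scale * B (n_out x r) * A (r x n_in)
// where scale is typically lora_alpha / r.
// The hurdle described above: with CUDA offload, `w` already lives in VRAM,
// so this would have to run as a CUDA kernel (or the delta computed
// host-side and uploaded) rather than a loop over host memory.
static void lora_merge_f32(float *w, const float *a, const float *b,
                           size_t n_out, size_t n_in, size_t r, float scale) {
    for (size_t i = 0; i < n_out; i++) {
        for (size_t j = 0; j < n_in; j++) {
            float delta = 0.0f;
            for (size_t k = 0; k < r; k++) {
                delta += b[i*r + k] * a[k*n_in + j];
            }
            w[i*n_in + j] += scale * delta;
        }
    }
}
```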
Reading some of the past issues, it seems the main blocker is that the CUDA backend uses f32 tensors while LoRA uses f16.
Is that still the case?
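If that's still the blocker, one path would be to up-convert the f16 LoRA tensors to f32 before computing the delta. A self-contained bit-level half-to-float conversion, just to show what that step involves (ggml ships its own fp16 helpers; this standalone version and the name `fp16_to_fp32` are purely illustrative):

```c
#include <stdint.h>
#include <string.h>

// Standalone IEEE-754 half -> float conversion (illustrative).
static float fp16_to_fp32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1F) {              // Inf/NaN: keep the payload
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {          // normal: rebias exponent (127 - 15 = 112)
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant != 0) {         // subnormal: renormalize
        uint32_t e = 113;           // exponent once the hidden bit is found
        while (!(mant & 0x400u)) { mant <<= 1; e--; }
        bits = sign | (e << 23) | ((mant & 0x3FFu) << 13);
    } else {                        // signed zero
        bits = sign;
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Applying the f16 LoRA delta to f32 weights would then just be a per-element conversion like this before the accumulation in the merge loop above.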
I can take a shot at implementing this if someone can give me a rough rundown of the hurdles.