
Are there any plans to support GPU offloading with LoRAs? #1984

Closed
l3utterfly opened this issue Jun 24, 2023 · 4 comments

@l3utterfly
Contributor

Are there any plans to support this?

Reading some of the past issues, it seems the main blocker is that CUDA uses f32 whilst LoRA uses f16 tensors.

Is that still the case?

I can take a shot at implementing this if someone can give me a rough rundown of the hurdles.
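To illustrate the mismatch: the f16 LoRA tensors would need to be promoted to f32 before they can be combined with the f32 CUDA path. A rough, purely illustrative sketch (not actual llama.cpp code), assuming the `ggml_fp16_to_fp32` helper from ggml.h:

```c
#include "ggml.h"

// Purely illustrative: promote an f16 LoRA tensor to f32 so it can be
// combined with f32 weights on the CUDA side.
static void lora_f16_to_f32(const ggml_fp16_t * src, float * dst, int64_t n) {
    for (int64_t i = 0; i < n; ++i) {
        dst[i] = ggml_fp16_to_fp32(src[i]);
    }
}
```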

@abc-nix

abc-nix commented Jun 24, 2023

Hi.

I think @JohannesGaessler has already started working on it. At least they have published pull request #1970 with a rough way to make it work.

@JohannesGaessler
Collaborator

Yes, I have already started working on it due to the advent of SuperHOT and will try to finalize it soon.

@JohannesGaessler
Collaborator

I fixed up the PR. I should also clarify: the PR only enables CUDA acceleration for f16 models. I previously misunderstood how ggml LoRAs are applied. What needs to be done is to modify the weights with the LoRA, which is complicated by the fact that this happens after the weights are already in VRAM, where regular ggml operations can't reach them.
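For reference, applying a LoRA means adding a low-rank update to each affected weight matrix, roughly W' = W + scale * (B × A). A minimal CPU-side sketch of that merge (illustrative only, not the actual llama.cpp code, and it sidesteps the VRAM problem described above):

```c
// Illustrative only: merge a LoRA delta into a base weight matrix on the CPU.
// W is n_out x n_in, lora_b is n_out x r, lora_a is r x n_in (row-major);
// scale is typically lora_alpha / r.
static void merge_lora(float * W, const float * lora_a, const float * lora_b,
                       int n_out, int n_in, int r, float scale) {
    for (int i = 0; i < n_out; ++i) {
        for (int j = 0; j < n_in; ++j) {
            float delta = 0.0f;
            for (int k = 0; k < r; ++k) {
                delta += lora_b[i * r + k] * lora_a[k * n_in + j];
            }
            W[i * n_in + j] += scale * delta;
        }
    }
}
```

With offloading enabled, this addition would have to happen either before the weights are uploaded or inside a CUDA kernel operating on the tensors already resident in VRAM.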

@github-actions github-actions bot added the stale label Mar 25, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
