Merged lora model forgets lora when converted to ggml. (with llama-cpp-python, DOES NOT repro with ./main) #1631
Comments
I'd guess it's not actually merging the LoRA into the existing tensors but instead saving them under a different name, or just saving the LoRA tensors in the model to be merged when it gets loaded. I don't think there's another way the GGML conversion could have the result you're describing. The conversion scripts here would probably need to be adapted to look for the correct tensor names in the model (and/or merge if necessary). Or you'd need to merge it in a different way so that it actually ends up like a normal non-LoRA model as far as things like the tensor names go.
That's my best guess as well, but I have no idea why this would happen or how to understand what I'm looking at when I inspect the model's tensors. (I think others have managed to successfully merge LoRAs into base models and convert to ggml.) I'm hoping someone already has that knowledge and can chime in as to what I need to do, or what I need to look for.
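One way to check the guess above is to list the tensor names in the merged checkpoint and see whether any still look like separate adapter weights. The snippet below is only a minimal sketch: the marker substrings (`lora_A`, `lora_B`, `lora_embedding`) are an assumption about how peft names unmerged adapter tensors, and the example names are hypothetical, not taken from the model in this issue.

```python
from typing import Iterable, List

# Substrings assumed to mark unmerged adapter tensors (an assumption about
# peft's naming, not something confirmed in this thread).
LORA_MARKERS = ("lora_A", "lora_B", "lora_embedding")


def find_lora_tensors(names: Iterable[str]) -> List[str]:
    """Return the tensor names that still look like separate LoRA weights."""
    return [n for n in names if any(marker in n for marker in LORA_MARKERS)]


# Hypothetical tensor names, for illustration only.
example_names = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.q_proj.lora_A.weight",
    "model.layers.0.self_attn.q_proj.lora_B.weight",
]

leftover = find_lora_tensors(example_names)
print(leftover)  # non-empty means the adapter was probably not folded in
```

With a real checkpoint, the names would come from the keys of the loaded state dict (for example `torch.load(path, map_location="cpu").keys()`). An empty result would suggest the merge produced ordinary tensor names, so the conversion script should see a normal model.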
Maybe try #1531, and also see that peft PR. Also, for QLoRA the LoRA targets bnb.nn.linear4bit. Try changing your Lora.py the way the PR mentions for embedding, or otherwise make it work. By the way, if it actually works, I suppose you need to load in 4-bit to merge it.
That peft PR did not fix the issue here, unfortunately.
This will not work; peft will error out in merge and unload (see this line).
😅 Then you have to check inside that, like the very early alpaca script. But I am afraid that still may not work for QLoRA "bnb.nn.linear4bit". And I guess ggml's LoRA function does not work for your QLoRA either.
Strangely enough that does seem to work, which led me to my next test... both the merged ggml and the LoRA DO work with ./main, but not with llama-cpp-python. So either something is wrong with my Python test, my llama-cpp-python install, the shared object, or something else. I should've looked into this earlier. Thanks for all the help and suggestions, everyone!
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
After merging a LoRA with a HF model, I can convert it to ggml with convert-pth-to-ggml and observe that the converted model behaves similarly to the original merged model.
Context:
Current Behavior
For some reason, the converted model behaves similarly to the base model without the merge.
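To tell whether a checkpoint really contains the merge, one option is to compare a weight matrix against the same matrix in the base model. The sketch below shows the standard LoRA merge arithmetic, W' = W + (alpha / r) · B · A, on tiny hand-written matrices, plus a difference check. It is an illustration under assumptions: the values are made up, and a quantized (QLoRA) base would need dequantizing first, which is not shown.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float, r: int) -> Matrix:
    """Fold a LoRA update into w: w + (alpha / r) * (lora_b @ lora_a)."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def max_abs_diff(a: Matrix, b: Matrix) -> float:
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Made-up values: a 2x2 base weight and a rank-1 adapter.
base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # shape r x in  = 1 x 2
lora_b = [[0.5], [0.0]]  # shape out x r = 2 x 1

merged = merge_lora(base, lora_a, lora_b, alpha=2.0, r=1)
print(merged)                      # [[2.0, 2.0], [0.0, 1.0]]
print(max_abs_diff(merged, base))  # 2.0
```

If the same check on the real tensors gives a difference of zero for every adapted layer, the saved model is effectively the base model, which would match the behavior described here.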
Environment and Context
Using latest llama-cpp-python for inference, latest llama.cpp for conversion.
llama.cpp: master-3b126f6
llama-cpp-python: 1.55.0
The rest of the environment is using the dev commit for huggingface libraries, see the qlora blog post
Physical (or virtual) hardware you are using, e.g. for Linux:
AMD Ryzen Threadripper 3970X 32-Core Processor
WSL 2
Linux 5.10.16.3-microsoft-standard-WSL2 #1 SMP Fri Apr 2 22:23:49 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Failure Information (for bugs)
See above - the converted model is missing some state.
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
Failure Logs
Unfortunately, this failure does not produce any failure logs; it shows up only in model behavior.