
[Question] How to merge an intermediate checkpoint file with LoRA #922

Open
terminator123 opened this issue Dec 15, 2023 · 6 comments
@terminator123

Question

I want to test checkpoint-5000 from my LoRA run. When I ran
python scripts/merge_lora_weights.py --model-path ./checkpoints/llava-v1.5-13b-lora --model-base lmsys/vicuna-13b-v1.5 --save-model-path ./checkpoints/merge
it failed.

@Isaachhh

You need to copy config.json and non_lora_trainables.bin into your checkpoint-5000 folder.
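
For example, a minimal sketch of that workaround (the paths are illustrative and assume the intermediate checkpoint folder sits inside the usual LLaVA LoRA output directory):

    # Sketch: copy the two files from the finished LoRA run into the
    # intermediate checkpoint, then merge that checkpoint instead.
    import shutil

    final_dir = "./checkpoints/llava-v1.5-13b-lora"                 # contains config.json + non_lora_trainables.bin
    ckpt_dir = "./checkpoints/llava-v1.5-13b-lora/checkpoint-5000"  # intermediate checkpoint to merge

    for name in ("config.json", "non_lora_trainables.bin"):
        shutil.copy(f"{final_dir}/{name}", ckpt_dir)

    # Then point the merge script at the checkpoint folder, e.g.:
    #   python scripts/merge_lora_weights.py \
    #       --model-path ./checkpoints/llava-v1.5-13b-lora/checkpoint-5000 \
    #       --model-base lmsys/vicuna-13b-v1.5 \
    #       --save-model-path ./checkpoints/merge-5000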

@charismaticchiu

I have the same problem (#1194). Did you solve it?

@wuwu-C

wuwu-C commented Apr 20, 2024

You need to copy config.json and non_lora_trainables.bin into your checkpoint-5000 folder.
Are config.json and non_lora_trainables.bin saved only at the end of the entire training? I trained for 10 epochs; can I copy these two files from epoch 10 directly into the first nine checkpoints?

@Isaachhh

Are config.json and non_lora_trainables.bin saved only at the end of the entire training?

I think so.

I trained for 10 epochs; can I copy these two files from epoch 10 directly into the first nine checkpoints?

The projector weights are saved in non_lora_trainables.bin, and the projector is unfrozen during the SFT stage.

@wuwu-C

wuwu-C commented Apr 21, 2024

Thank you for your reply! But I still have some questions.

The projector weights are saved in non_lora_trainables.bin, and the projector is unfrozen during the SFT stage.

  1. Doesn't non_lora_trainables.bin store the weights that are not part of the LoRA adaptation? Shouldn't those weights be frozen? Why does it store the projector weights?
  2. In your previous answer you said to copy the two files into the corresponding checkpoint folder. If the projector is unfrozen during the SFT stage, that approach is incorrect. How can I merge an intermediate checkpoint with LoRA?
    Could you give a more detailed explanation? Thank you!

@Isaachhh

  1. non_lora_trainables means non-LoRA and trainable, so it stores the projector because the projector is trained directly rather than through LoRA. Check here.
    Try:
    import torch
    a = torch.load('.../non_lora_trainables.bin')
    print(a.keys())

  2. Yes, you are right. You may need to edit the training source code so that the projector weights are also saved at intermediate checkpoints.
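
One possible way to do that (a sketch only, not code from the LLaVA repo; the callback and the "mm_projector" key filter are assumptions) is to hook the Hugging Face Trainer's checkpoint saving with a callback and write the projector weights into every checkpoint-XXXX folder:

    # Sketch: save non_lora_trainables.bin alongside every intermediate checkpoint.
    # Assumes a plain Hugging Face Trainer; under DeepSpeed ZeRO-3 the parameters
    # would first need to be gathered (LLaVA's train.py does something similar
    # with get_mm_adapter_state_maybe_zero_3).
    import os
    import torch
    from transformers import TrainerCallback

    class SaveProjectorCallback(TrainerCallback):
        def on_save(self, args, state, control, **kwargs):
            model = kwargs["model"]
            ckpt_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
            # Keep only the non-LoRA trainable weights, i.e. the mm projector.
            projector_state = {
                k: v.detach().cpu()
                for k, v in model.named_parameters()
                if "mm_projector" in k
            }
            torch.save(projector_state, os.path.join(ckpt_dir, "non_lora_trainables.bin"))
            return control

    # Usage (hypothetical): trainer.add_callback(SaveProjectorCallback())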
