
[Question] How to merge an intermediate checkpoint file with LoRA #922

Open
terminator123 opened this issue Dec 15, 2023 · 6 comments
@terminator123

Question

I want to test checkpoint-5000 from my LoRA run. When I ran
python scripts/merge_lora_weights.py --model-path ./checkpoints/llava-v1.5-13b-lora --model-base lmsys/vicuna-13b-v1.5 --save-model-path ./checkpoints/merge
it failed.

@Isaachhh

You need to copy config.json and non_lora_trainables.bin into your checkpoint-5000 folder.
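
For example, a minimal sketch of that workaround (the paths are illustrative and assume the intermediate checkpoint folder sits inside the usual LLaVA LoRA output directory):

    # Sketch: copy the two files from the finished LoRA run into the
    # intermediate checkpoint, then merge that checkpoint instead.
    import shutil

    final_dir = "./checkpoints/llava-v1.5-13b-lora"                 # contains config.json + non_lora_trainables.bin
    ckpt_dir = "./checkpoints/llava-v1.5-13b-lora/checkpoint-5000"  # intermediate checkpoint to merge

    for name in ("config.json", "non_lora_trainables.bin"):
        shutil.copy(f"{final_dir}/{name}", ckpt_dir)

    # Then point the merge script at the checkpoint folder, e.g.:
    #   python scripts/merge_lora_weights.py \
    #       --model-path ./checkpoints/llava-v1.5-13b-lora/checkpoint-5000 \
    #       --model-base lmsys/vicuna-13b-v1.5 \
    #       --save-model-path ./checkpoints/merge-5000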

@charismaticchiu

I have the same problem (#1194). Did you solve it?

@wuwu-C

wuwu-C commented Apr 20, 2024

You need to copy config.json and non_lora_trainables.bin into your checkpoint-5000 folder.
Are config.json and non_lora_trainables.bin saved only at the end of the entire training? I trained for 10 epochs; can I copy these two files from epoch 10 directly into the first nine checkpoints?

@Isaachhh

Are config.json and non_lora_trainables.bin saved only at the end of the entire training?

I think so.

I trained for 10 epochs; can I copy these two files from epoch 10 directly into the first nine checkpoints?

The projector weights are saved in non_lora_trainables.bin, and the projector is unfrozen during the SFT stage.

@wuwu-C

wuwu-C commented Apr 21, 2024

Thank you for your reply! But I still have some questions.

The projector weights are saved in non_lora_trainables.bin, and the projector is unfrozen during the SFT stage.

  1. Doesn't non_lora_trainables.bin store the weights that are not part of the LoRA adaptation? Shouldn't those weights be frozen? Why does it store the projector weights?
  2. In your previous answer you said to copy the two files into the corresponding checkpoint folder. If the projector is unfrozen during the SFT stage, that approach is incorrect. How can I merge an intermediate checkpoint with LoRA?
    Could you give a more detailed explanation? Thank you!

@Isaachhh

  1. non_lora_trainables means non-LoRA and trainable, so it stores the projector because the projector is trained directly rather than through LoRA. Check here.
    Try:
    import torch
    a = torch.load('.../non_lora_trainables.bin')
    print(a.keys())

  2. Yes, you are right. You may need to edit the training source code so that the projector weights are also saved at intermediate checkpoints.
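
One possible way to do that (a sketch only, not code from the LLaVA repo; the callback and the "mm_projector" key filter are assumptions) is to hook the Hugging Face Trainer's checkpoint saving with a callback and write the projector weights into every checkpoint-XXXX folder:

    # Sketch: save non_lora_trainables.bin alongside every intermediate checkpoint.
    # Assumes a plain Hugging Face Trainer; under DeepSpeed ZeRO-3 the parameters
    # would first need to be gathered (LLaVA's train.py does something similar
    # with get_mm_adapter_state_maybe_zero_3).
    import os
    import torch
    from transformers import TrainerCallback

    class SaveProjectorCallback(TrainerCallback):
        def on_save(self, args, state, control, **kwargs):
            model = kwargs["model"]
            ckpt_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
            # Keep only the non-LoRA trainable weights, i.e. the mm projector.
            projector_state = {
                k: v.detach().cpu()
                for k, v in model.named_parameters()
                if "mm_projector" in k
            }
            torch.save(projector_state, os.path.join(ckpt_dir, "non_lora_trainables.bin"))
            return control

    # Usage (hypothetical): trainer.add_callback(SaveProjectorCallback())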
