Reproducing the stage1 and stage2 Model problem on L40s #27
Do you mean the checkpoint of Stage 2? We do not report Stage 2 results in the paper. Do you want to reproduce the results of Table 3 or Table 7?
Exactly. I carefully read your paper and found the relevant experimental result in Table 10 of the supplementary materials.
Additionally, I think it is necessary to clarify which dataset we should use for the reproduction.
I think you misunderstood our paper. The LLaVA-Phi in Table 7 is not obtained by training with Stage 2 data; please refer to variant (c) of Table 5. We did not originally validate Stage 2 with Stage 2 data, but to make sure your results are consistent, we did just now: the result we got on TextVQA was 31.7, which aligns with yours. By the way, if you want better results, you can take the LLaVA-1.5 data (which is the Stage 3 data in MoE-LLaVA) and train a non-MoE version. That would effectively be LLaVA-Phi and have no connection to MoE-LLaVA.
Thanks a lot. This project is solid and open to the community. I will keep in touch with you to further explore the potential of the method.
@LinB203 |
We have actually finished training MoE-LLaVA-MiniCPM. We provide all three stages.
Thank you. My initial loss is the same as yours. I will open a new issue if I run into more questions.
By the way, we are training at 384×384 resolution, so the final loss may differ a little. As the JSON shows, the loss rises dramatically in the last few steps, making the last saved checkpoint unusable. So I suggest saving more checkpoints during training; e.g. if you train 5198 steps in total, the checkpoint at step 5000 may be much better than the last one. This seems to be a problem specific to MiniCPM; I haven't encountered it with other models.
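The checkpoint-retention suggestion above can be sketched as follows. This is a minimal, self-contained illustration (the function name and policy are mine, not from the MoE-LLaVA codebase): save every N steps plus the final step, and keep only the most recent few, so a late-training loss spike does not leave you with a single corrupted final save.

```python
from collections import deque

def checkpoint_steps(total_steps, save_every, keep_last=3):
    """Return the steps whose checkpoints would remain on disk at the end,
    saving every `save_every` steps (plus the final step) and keeping only
    the `keep_last` most recent saves."""
    kept = deque(maxlen=keep_last)  # old checkpoints are evicted automatically
    for step in range(1, total_steps + 1):
        if step % save_every == 0 or step == total_steps:
            kept.append(step)
    return list(kept)

# For a 5198-step run, saving every 500 steps leaves a usable checkpoint
# (e.g. step 5000) even if the final one at step 5198 is degenerate.
print(checkpoint_steps(5198, 500))  # [4500, 5000, 5198]
```

With the HuggingFace `Trainer`, the same policy corresponds to setting `save_steps` and `save_total_limit` in `TrainingArguments`.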
I am currently training at 336×336 resolution. I did not observe the loss increase at the end, but the intermediate model outputs during training are degenerate:

```
Canon ODADADADADADADA
0%| | 1/5000 [00:06<8:34:01, 6.17s/it]
OCRupupupupupupupupupupD D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D F ACRupupup" The small small small small small small small small small small small small small small small a C RUP
0%| | 2/5000 [00:10<7:10:31, 5.17s/it]
ThisESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTESTEST
0%| | 3/5000 [00:11<4:43:20, 3.40s/it]
No Single Single Single Single OCR
0%| | 4/5000 [00:16<5:18:11, 3.82s/it]
The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The
0%|▏ | 5/5000 [00:20<5:37:25, 4.05s/it]
Number Number Number 2,,,,,,,,,,,,,,,2, O,,,,2, O,, a player from the baseball baseball baseball player from the "
2,, O,, a 2, a player from the "2,
0%|▏ | 6/5000 [00:25<5:48:18, 4.18s/it]
The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The
0%|▏ | 7/5000 [00:29<5:55:32, 4.27s/it]
RoleEPEPEP OCR
0%|▏ | 8/5000 [00:34<5:59:31, 4.32s/it]
AITITOCR
0%|▏ | 9/5000 [00:38<6:09:30, 4.44s/it]
The Phot Phot Phot Phot Phot Phot Phot Phot Phot Phot Phot Phot L L L L L L L L ACR
0%|▎ | 10/5000 [00:43<6:09:02, 4.44s/it]
OffCR
0%|▎ | 11/5000 [00:47<6:10:14, 4.45s/it]
OCR Honey Honey Honey Honey
0%|▎ | 12/5000 [00:52<6:11:34, 4.47s/it]
The OCRCR
0%|▎ | 13/5000 [00:56<6:12:15, 4.48s/it]
Sk Sk Sk Sk Sk Sk Sk Sk Sk Sk Sk Sk Sk Sk Sk Sk Sk Sk Sk
0%|▎ | 14/5000 [01:01<6:12:51, 4.49s/it]
0%|▎ | 14/5000 [01:01<6:07:58, 4.43s/it]
```

The implementation of my MiniCPM template is as follows.
Here is my conv template:

```python
conv_minicpm = Conversation(
    system="A chat between a curious user and an artificial intelligence assistant. "
           "The assistant gives helpful, detailed, and polite answers to the user's questions.",
    roles=("USER", "ASSISTANT"),
    version="minicpm",
    messages=(),
    offset=0,
    sep_style=SeparatorStyle.TWO,
    sep=" ",
    sep2="</s>",
)
```
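For readers unfamiliar with how a `SeparatorStyle.TWO` template turns into a prompt string, here is a minimal, self-contained sketch of the assembly logic. It mirrors the behavior of LLaVA-style `Conversation.get_prompt()` but is not the actual library code; `build_prompt` and the example messages are illustrative.

```python
def build_prompt(system, messages, sep=" ", sep2="</s>"):
    """Assemble a two-separator conversation prompt: user turns end with
    `sep`, assistant turns end with `sep2`; a turn with no message yet
    becomes the generation prompt ("ASSISTANT:")."""
    seps = [sep, sep2]
    prompt = system + seps[0]
    for i, (role, message) in enumerate(messages):
        if message:
            prompt += role + ": " + message + seps[i % 2]
        else:
            prompt += role + ":"  # model continues from here
    return prompt

prompt = build_prompt(
    system="A chat between a curious user and an artificial intelligence assistant. "
           "The assistant gives helpful, detailed, and polite answers to the user's questions.",
    messages=[("USER", "What is in the image?"), ("ASSISTANT", None)],
)
print(prompt)
```

Note how `sep2="</s>"` appends the EOS token after each completed assistant turn, which is what lets the trained model learn to stop.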
@LinB203 Hi Lin, thanks for your great work and thoughtful interactions. The final MoE-LLaVA is finetuned from a Stage 2 finetuned checkpoint. I finetuned one Stage 2 checkpoint with the script you shared in https://github.com/PKU-YuanGroup/MoE-LLaVA/blob/main/scripts/v1/phi2/finetune.sh. The result is:
I'm not sure if this result is reasonable. Would you please share some evaluation metrics of this checkpoint on the VQAv2 dataset? It would be much appreciated if you could share these checkpoints. Thanks.
You can find an accuracy score in the results; you can check them there.
Hi @cydiachen, many thanks for your kind reply and the information you shared. Greatly appreciated!
Thank you for your excellent work.
I followed your work and downloaded the released dataset from your link.
Since you kindly provided an end-to-end script and a processed dataset file, I thought we could quickly reproduce your excellent work. After two days of training, we got our LLaVA-Phi2 model, and it can run inference with your code.
However, it cannot reproduce the accuracy reported in your paper. Would you mind sharing any training logs or detailed information with us, so that we can debug the training process and find out what happened?