Tensor Size Mismatch in language decoder on evaluation of BLIP with COCO caption task #241
Comments
You may want to downgrade your transformers version to >=4.25.0,<4.27.
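If it helps, a quick way to check whether an installed version falls inside that suggested range can be sketched as follows (`transformers_version_ok` is a hypothetical helper, not part of LAVIS or transformers; it only inspects the major.minor components, so pre/dev suffixes are ignored):

```python
def transformers_version_ok(version: str) -> bool:
    """Return True if `version` is in the suggested >=4.25.0,<4.27 range.

    Minimal check on the major.minor components only; e.g. "4.28.0.dev0"
    is judged by "4.28" and rejected.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) in {(4, 25), (4, 26)}

print(transformers_version_ok("4.26.1"))       # True
print(transformers_version_ok("4.28.0.dev0"))  # False
```

For an exact pin you would normally just install with a requirement specifier such as `transformers>=4.25.0,<4.27` instead.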
Thanks! Is there a known cause for this error? If so, I might be able to fix it on a newer transformers version; I need 4.28.dev0 for my own project.
@Luodian we haven't looked into the issue yet. If you can help investigate and possibly open a PR, that would be very helpful.
Hi, I think the cause is that the prompt is not repeated along with the visual embeddings when beam search is used. In `generate`, the `if not use_nucleus_sampling:` branch does:

```python
num_beams = num_beams
visual_embeds = visual_embeds.repeat_interleave(num_beams, dim=0)
```

Please verify whether this is indeed the correct cause. I have submitted two pull requests (PRs) to address this issue; you may choose either one for review and potential merging. The first PR (Luodian:fix-blip_caption/coco_caption_eval) directly repeats the prompt in the captioning model. The second PR (Luodian:fix-med/coco_caption_eval) adds the repeat logic into the language model code instead.
Hi, sorry to bother you; it would be much appreciated if you could take a look at this error in BLIP (v1) on the COCO captioning task.
I was running the command
and it was
The error I encountered is
The error occurred at
The dimensions of the relevant tensors are