
Tensor Size Mismatch in language decoder on evaluation of BLIP with COCO caption task #241

Open
Luodian opened this issue Apr 9, 2023 · 4 comments

Comments

@Luodian

Luodian commented Apr 9, 2023

Hi, sorry to bother you. It would be much appreciated if you could take a look at this error with BLIP (v1) on the COCO caption task.
I was running the command

python -m torch.distributed.run --nproc_per_node=1 LAVIS/evaluate.py --cfg-path=LAVIS/lavis/projects/blip/eval/caption_coco_eval.yaml

and it was:

  • without any modifications
  • just a simple default run
  • (I don't have any errors with other tasks)

The error I encountered is

Exception has occurred: RuntimeError
The size of tensor a (192) must match the size of tensor b (576) at non-singleton dimension 0

The error was raised at:

...
generate_from_encoder (/home/LAVIS/lavis/models/med.py:1360)
generate (/home/LAVIS/lavis/models/blip_models/blip_caption.py:188)
...

The dimensions of the relevant tensors are shown below:

[screenshot of the relevant tensor shapes]

@dxli94 (Contributor)

dxli94 commented Apr 10, 2023

You may want to downgrade your transformers version to >=4.25.0,<4.27.
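
For example, with pip:

    pip install "transformers>=4.25.0,<4.27"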

@Luodian (Author)

Luodian commented Apr 10, 2023

Thanks! Is there a known cause for this error? If so, maybe I could fix it while keeping a higher transformers version; I may need 4.28.dev0 for my own project.

@dxli94 (Contributor)

dxli94 commented Apr 10, 2023

@Luodian, we haven't looked into the issue yet. If you can help investigate and possibly open a PR, that'd be very helpful.

@Luodian (Author)

Luodian commented Apr 12, 2023

Hi, I think the reason is that the prompt is not repeated when num_beams > 1.

In lavis/models/med.py, line 1331, visual_embeds is repeated num_beams times, but tokenized_prompt.input_ids is not:

        if not use_nucleus_sampling:
            num_beams = num_beams
            # visual_embeds grows by a factor of num_beams along dim 0,
            # while tokenized_prompt.input_ids keeps the original batch size
            visual_embeds = visual_embeds.repeat_interleave(num_beams, dim=0)
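
This reproduces the mismatch in isolation (a minimal sketch: the batch size 192 comes from the error message above; num_beams=3 and the embedding shape are my assumptions, since 576 / 192 = 3):

    import torch

    batch, num_beams = 192, 3                              # 576 / 192 = 3 (assumed beam width)
    visual_embeds = torch.zeros(batch, 577, 768)           # hypothetical visual encoder output
    prompt_ids = torch.zeros(batch, 12, dtype=torch.long)  # un-repeated prompt tokens

    visual_embeds = visual_embeds.repeat_interleave(num_beams, dim=0)
    print(visual_embeds.shape[0], prompt_ids.shape[0])     # 576 vs. 192 -> mismatch at dim 0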

Please verify whether this is indeed the root cause.

I have submitted two pull requests (PRs) to address this issue. You may choose either one for review and potential merging.

The first PR (Luodian:fix-blip_caption/coco_caption_eval) directly repeats the tokenized_prompt in blip_caption.py.

The second PR (Luodian:fix-med/coco_caption_eval) adds the repeat logic inside med.py itself and ensures the dimensions stay aligned, which may also cover other call sites.
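
For reference, the first fix amounts to something like the following (a minimal sketch, not the exact PR diff; variable names follow the code quoted above, and I assume tokenized_prompt carries input_ids and attention_mask tensors):

        if not use_nucleus_sampling:
            visual_embeds = visual_embeds.repeat_interleave(num_beams, dim=0)
            # repeat the prompt tensors by the same factor so dim 0 stays aligned
            input_ids = tokenized_prompt.input_ids.repeat_interleave(num_beams, dim=0)
            attention_mask = tokenized_prompt.attention_mask.repeat_interleave(num_beams, dim=0)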
