BLIP: enable generation tests #34174
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I'd like very much to avoid this change — extra logic in all tests to handle a niche corner case. Let's brainstorm alternatives!
@gante I tried to force it. OMG, I found an option while writing this reply: Whisper and the other audio models are encoder-decoder, so we can make it work by getting the main input name in decoder-only models, just before the check happens, in the same indent block :)
AH, actually we might need / want to force `return_dict` to `True`, to avoid all the if/elses.
If it works, sounds good! (Make sure to leave a comment.)
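To illustrate the suggestion above: forcing `return_dict=True` means model outputs are always attribute-accessible objects rather than sometimes tuples, so the test code can drop the tuple-vs-dict branching. This is a minimal self-contained sketch of that idea; `FakeOutput` and `run_model` are hypothetical stand-ins, not the actual transformers classes.

```python
class FakeOutput(dict):
    """Hypothetical stand-in for a ModelOutput-style object with attribute access."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc


def run_model(return_dict=False):
    """Toy model call: returns a tuple by default, an output object if asked."""
    logits = [0.1, 0.9]
    if return_dict:
        return FakeOutput(logits=logits)
    return (logits,)


# Without forcing return_dict, every caller needs an if/else:
out = run_model(return_dict=False)
logits = out.logits if isinstance(out, FakeOutput) else out[0]

# Forcing return_dict=True removes that branching entirely:
out = run_model(return_dict=True)
logits = out.logits
```

The point is purely ergonomic: the outputs carry the same data either way, but the dict-style form lets shared test code access `out.logits` unconditionally.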
@gante requesting re-review.
What does this PR do?
Enables generation tests for BLIP models, except BLIP-1 (which turned out to be a bit harder). I changed the generation tests to use `modelTest.input_name`, as BLIP is the only model that uses pixel values as its main input, so checking the generated text length would always fail. I tried to get rid of the custom `generate` for these models, but that opened a Pandora's box, so I think it's better not to waste time on an old model and to maintain it for a while, until the model gets deprecated. Still, I made some changes so we no longer need to add an extra `bos` at the beginning, and the decoder-based BLIP models now return the full text as output. Encoder-decoder based models return only the generated text, which is consistent with what an LLM should do.
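The length-check problem described above can be sketched as follows. This is an illustrative, hypothetical helper (not the actual transformers test code): a generic generation-length assertion that reads the model's declared main input name instead of assuming `input_ids`, so that image-input models like BLIP (whose main input is `pixel_values`) don't trip a prompt-length check that only makes sense for text prompts.

```python
# Hypothetical sketch: a generation-length check parameterized by the model's
# main input name. Names and shapes are illustrative only.


def generated_length_ok(main_input_name, inputs, output_len, max_new_tokens):
    """Return True if the generated sequence length is within bounds.

    For text models the output includes the prompt tokens, so the bound is
    prompt length + max_new_tokens. For vision-input models there is no text
    prompt to account for, so only max_new_tokens applies.
    """
    if main_input_name == "input_ids":
        prompt_len = len(inputs["input_ids"])
        return output_len <= prompt_len + max_new_tokens
    # e.g. main_input_name == "pixel_values" for BLIP-style models
    return output_len <= max_new_tokens


# Text model: 4-token prompt, up to 3 new tokens, so length 7 passes.
assert generated_length_ok("input_ids", {"input_ids": [1, 2, 3, 4]}, 7, 3)
# BLIP-style model: pixel values as main input, only new tokens count.
assert generated_length_ok("pixel_values", {"pixel_values": object()}, 3, 3)
```

Checking `main_input_name` rather than hard-coding `input_ids` is what lets a single shared test cover both text-prompted and image-prompted models.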