
BLIP: enable generation tests #34174

Open · wants to merge 12 commits into main
Conversation


zucchini-nlp (Member) commented Oct 15, 2024

What does this PR do?

Enables generation tests for BLIP models, except BLIP-1 (that one turned out to be a bit harder). I changed the generation tests to use modelTest.input_name, since BLIP is the only model that uses pixel values as its main input, so checking generated text lengths would otherwise always fail.

I tried to get rid of the custom generate for these models, but that opened a Pandora's box, so I think it's better not to waste time on an old model and to maintain it for a while until it gets deprecated. Still, I made some changes so we don't need to add an extra bos at the beginning, and the decoder-based BLIP models now return the full text as output. Encoder-decoder based models return only the generated text, which is consistent with what an LLM should do.
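For context, the failing check can be illustrated with a small self-contained sketch (hypothetical helper, not the actual test code): the tests compare the generated length against the main input's second dimension, which is a sequence length for `input_ids` but a channel count for `pixel_values`.

```python
def check_generated_length(main_input_shape, output_len, max_new_tokens):
    # Assumes dim 1 of the main input is a sequence length. That holds for
    # input_ids with shape (batch, seq_len), but not for pixel_values with
    # shape (batch, channels, height, width).
    prompt_len = main_input_shape[1]
    return output_len == prompt_len + max_new_tokens

# Text model: a 5-token prompt plus 3 new tokens -> length 8 passes.
print(check_generated_length((1, 5), 8, 3))            # True
# BLIP-style model: the "prompt length" read from pixel_values is 3 (the
# channel count), so a correct output of length 8 fails the check.
print(check_generated_length((1, 3, 224, 224), 8, 3))  # False
```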

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp requested review from gante and ArthurZucker and removed request for gante October 15, 2024 12:50

gante (Member) commented Oct 16, 2024

> I changed the generation tests to use modelTest.input_name, since BLIP is the only model that uses pixel values as its main input, so checking generated text lengths would otherwise always fail.

I'd very much like to avoid this change -- extra logic in all tests to handle a niche corner case. Let's brainstorm alternatives! main_input in the generative tests is only used to check shapes. Perhaps we should always look for input_ids or inputs_embeds in the input dictionary? 🤔


zucchini-nlp (Member, Author) commented Oct 16, 2024

@gante I tried to force input_ids everywhere and found another corner case: Whisper expects input_features to be the main input for shape checking. I would very much love to make BLIP standard, and maybe I'll do so in v5, because doing it now would break a whole lot of things.

OMG, I found an option while writing this reply: Whisper and the other audio models are encoder-decoder, so we can make it work by handling the main input specially only for decoder-only models, just before the check happens, in the same indent block :)
If you agree, I'll make the change hehe
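The idea above can be sketched in a few lines (hypothetical helper name, not the actual test code): encoder-decoder models keep their own main input for the shape check, while decoder-only models fall back to `input_ids`.

```python
def length_reference_name(main_input_name, is_encoder_decoder):
    # Encoder-decoder models (e.g. Whisper with input_features) keep their
    # main input for shape checks; decoder-only models fall back to
    # input_ids, so an image main input like pixel_values never breaks the
    # generated-length comparison.
    if is_encoder_decoder:
        return main_input_name
    return "input_ids"

print(length_reference_name("input_features", True))   # input_features
print(length_reference_name("pixel_values", False))    # input_ids
```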

ArthurZucker (Collaborator) left a comment

Ah, actually we might need/want to force return_dict to True, to avoid all the if/elses.
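A minimal illustration of that suggestion (dummy model function, not transformers code): forcing `return_dict=True` at the call site means downstream code only ever sees one output type, so the tuple-vs-dict branches disappear.

```python
def call_with_dict(model_fn, **kwargs):
    # Force dict-style outputs so callers never branch on tuple vs. dict.
    kwargs["return_dict"] = True
    return model_fn(**kwargs)

# Dummy stand-in for a model forward that supports both output styles.
def dummy_forward(logits=None, return_dict=False):
    return {"logits": logits} if return_dict else (logits,)

out = call_with_dict(dummy_forward, logits=[0.1, 0.9])
print(out["logits"])  # [0.1, 0.9]
```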


gante (Member) commented Oct 17, 2024

> OMG, I found an option while writing this reply: Whisper and the other audio models are encoder-decoder, so we can make it work by handling the main input specially only for decoder-only models, just before the check happens, in the same indent block :)
> If you agree, I'll make the change hehe

If it works, sounds good! (make sure to leave a comment)

zucchini-nlp (Member, Author) commented

@gante requesting re-review, since the input-name change was merged as a separate PR. I rebased onto main and ran the tests again.
