
Fix CI for VLMs #35690

Merged
merged 5 commits into from
Jan 20, 2025

Conversation

zucchini-nlp (Member) commented on Jan 14, 2025:

What does this PR do?

As per title, main changes are:

  • Removed some tests that are no longer needed, as they tested the old merging behavior in VLMs
  • Removed the logit-level checks, as those break a lot because of precision errors when we update torch, for example. Testing token-level generation should be enough (see the sketch at the end of this description)
  • Other tests actually needed their expected_text updated, so I updated them
  • Since it was a tiny change, propagated the Emu3 checkpoint name update to the docs (can move it to a new PR to make it cleaner)
  • Mllama tests are failing because of the gated repo (meta-llama/Llama-3.2-11B-Vision), cc @ydshieh
  • Emu3 also needs a bit more GPU memory to run (around 16GB); that is the minimum possible value even when we get a tiny image as input. So I don't know if we should just remove the batched slow tests, cc @ydshieh

Note: the Emu3/Aria flex-attn tests will keep failing and need a separate PR to fix the way attention is dispatched in some VLMs. Apparently it never worked for Emu3-like models.
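To illustrate the token-level generation check kept in place of the logit-level asserts, here is a minimal sketch; the checkpoint name, prompt format, dummy image, and EXPECTED_TEXT are placeholders rather than values from the actual tests:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

# Hypothetical checkpoint; the real slow tests use model-specific checkpoints.
model_id = "org/some-vlm-checkpoint"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

image = Image.new("RGB", (224, 224))             # dummy image standing in for the test fixture
prompt = "<image>\nWhat is shown in the image?"  # placeholder prompt format
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

# Token-level check: greedy-decode and compare the text, instead of asserting on raw
# logits with a tight atol, which breaks whenever torch/CUDA numerics shift slightly.
generate_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
decoded = processor.batch_decode(generate_ids, skip_special_tokens=True)[0]

EXPECTED_TEXT = "..."  # placeholder for the expected_text kept in the real test
assert decoded == EXPECTED_TEXT
```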

HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

zucchini-nlp requested a review from ydshieh on January 14, 2025 14:16
ydshieh (Collaborator) commented on Jan 14, 2025:

> Mllama tests are failing because of the gated repo (meta-llama/Llama-3.2-11B-Vision)

Asked the infra team. It can't be downloaded in the EU region ...

> Emu3 also needs a bit more GPU memory to run (around 16GB); that is the minimum possible value even when we get a tiny image as input. So I don't know if we should just remove the batched slow tests

Which (full) test name? @Cyrilvallez told me we might need to introduce a @need_a10_gpu decorator if we don't have another way.

Comment on lines +892 to +896:

            [[0.9148, -1.4148, 3.8040], [3.3443, 1.9478, 0.2080], [1.6604, 2.8184, -0.3618]]
        ).to(torch_device)

        self.assertTrue(
-           torch.allclose(outputs.vision_model_output.last_hidden_state[0, :3, :3], expected_slice, atol=1e-4)
+           torch.allclose(outputs.vision_model_output.last_hidden_state[0, :3, :3], expected_slice, atol=1e-1)
Collaborator:
Somehow I feel these changes are quite large, and I don't really know what happened in the history of commits ...
But first question: are you checking this on the T4 runner?

Member Author:

Yep, running everything on T4. I believe the user who added the test didn't trigger the slow tests with our runners, hence the difference.

The test fails even when I go back to the commit where it was added.

Collaborator:

Hmmm, I am the one working on Kosmos2 and I ran the test like hundreds of times 😆.
Maybe torch version changes matter here. Anyway, I can take a look at this specific model and make sure.

Member Author:

Oh, this particular test was added by a community user after the model release. But yeah, double-checking would not hurt.

zucchini-nlp (Member Author) commented:

> Asked the infra team. It can't be downloaded in the EU region ...

😢

> Which (full) test name? @Cyrilvallez told me we might need to introduce a @need_a10_gpu decorator if we don't have another way.

Cool, the decorator would definitely help a lot! The tests are test_model_generation_batched and test_model_generation_multi_image.

ArthurZucker (Collaborator) left a comment:

LGTM apart from the tests that are a bit of an edge case, even if fixed!

    @@ -481,49 +480,6 @@ def test_batched_generation(self):
            outputs = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
            self.assertEqual(outputs, EXPECTED_OUTPUT)

    -   @slow
    -   @require_bitsandbytes
    -   def test_llava_index_error_bug(self):
Collaborator:

not sure why we are removing this one?

Member Author:

The original bug was related to extending the attention mask after merging the inputs the old way, and the bug cannot occur anymore since we removed the old-style merging.

Cyrilvallez (Member) commented:

See the decorator @require_torch_large_gpu if needed! Just merged it
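As a minimal sketch of how the new decorator could be applied to the memory-hungry Emu3 tests, assuming it is exposed from transformers.testing_utils like the other require_* helpers (the class name and test body below are placeholders, not the actual test):

```python
from transformers.testing_utils import require_torch_large_gpu, slow


class Emu3IntegrationTest:  # placeholder class standing in for the real integration test class
    @slow
    @require_torch_large_gpu  # skip on runners whose GPU cannot fit the ~16GB these tests need
    def test_model_generation_batched(self):
        # Placeholder body: the real slow test builds batched image+text inputs,
        # calls model.generate(...) and compares the decoded text against expected_text.
        ...
```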

zucchini-nlp merged commit 8571bb1 into huggingface:main on Jan 20, 2025
16 checks passed
bursteratom pushed a commit to bursteratom/transformers that referenced this pull request Jan 31, 2025
* fix some easy test

* more tests

* remove logit check here also

* add require_torch_large_gpu in Emu3
elvircrn pushed a commit to elvircrn/transformers that referenced this pull request Feb 13, 2025
* fix some easy test

* more tests

* remove logit check here also

* add require_torch_large_gpu in Emu3