ModernBERT bug fixes #35404

warner-benjamin · 2024-12-23T20:20:44Z

This PR fixes a few issues with the ModernBERT implementation and some typos in the docs.

First, on the flash attention 2 path, ModernBERT always returned repadded logits without gradients (see #35386). With this PR, ModernBertForMaskedLM returns repadded logits with gradients when no labels are passed or when repad_logits_with_grad is set to True. By default, the gradient is not kept when repadding after the internal loss function has been used, which saves memory.
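The rule in the paragraph above can be sketched in plain Python. This is only an illustration, not the code in modular_modernbert.py: `keeps_grad_on_repad` and `repad` are hypothetical helper names, and plain lists stand in for tensors. In the real model, the "no gradient" case would presumably correspond to repadding under `torch.no_grad()`.

```python
from typing import List, Optional, Sequence


def keeps_grad_on_repad(labels: Optional[Sequence[int]], repad_logits_with_grad: bool) -> bool:
    """Hypothetical helper: should the repadded logits keep their gradient?"""
    # Rule from the PR description: keep gradients when no labels are passed
    # or when repad_logits_with_grad is True.
    return repad_logits_with_grad or labels is None


def repad(unpadded: Sequence[float], indices: Sequence[int], batch: int, seqlen: int) -> List[List[float]]:
    """Hypothetical helper: put unpadded values back into a (batch, seqlen) layout."""
    # Scatter each value to its flat position; padding slots stay 0.0.
    flat = [0.0] * (batch * seqlen)
    for value, index in zip(unpadded, indices):
        flat[index] = value
    # Cut the flat list back into one row per sequence.
    return [flat[row * seqlen:(row + 1) * seqlen] for row in range(batch)]
```

With labels present and the flag left at False, the helper returns False, which matches the memory-saving default described above.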

Second, this PR stops torch.compile from being enabled automatically when running on CPU (#35388).
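A minimal sketch of the intended behavior, under stated assumptions: `resolve_reference_compile` is a hypothetical helper, `None` is assumed to mean "decide automatically", and the real auto-detection likely checks more conditions than the device type alone. The warning on an explicit request follows the "wrap cpu warning in reference_compile" commit in this PR.

```python
import warnings
from typing import Optional


def resolve_reference_compile(reference_compile: Optional[bool], device_type: str) -> bool:
    """Hypothetical helper: decide whether torch.compile should be used."""
    if device_type == "cpu":
        # Only warn when compilation was explicitly requested (assumption).
        if reference_compile:
            warnings.warn("torch.compile was requested on CPU; falling back to non-compiled mode.")
        return False
    # Assumption: None means "auto", which enables compilation off CPU.
    return True if reference_compile is None else reference_compile
```

The point of the design is that the automatic default never turns compilation on for a CPU device, while an explicit False is always respected.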

Third, I added some details to the model doc strings.

Fourth, documentation is updated to capitalize BERT in ModernBERT following the other BERT models.

cc @ArthurZucker @tomaarsen

tomaarsen · 2024-12-23T20:49:34Z

docs/source/en/_toctree.yml

@@ -503,7 +503,7 @@
      - local: model_doc/mobilebert
        title: MobileBERT
      - local: model_doc/modernbert
-        title: ModernBert
+        title: ModernBERT


Good call - I got carried away with the Python class naming

tomaarsen · 2024-12-23T20:53:57Z

src/transformers/models/modernbert/modular_modernbert.py

        if config._attn_implementation_internal is None:
            config._attn_implementation_internal = "flash_attention_2"
            try:
                return cls._check_and_enable_flash_attn_2(
                    config,
-                    torch_dtype=torch_dtype,
+                    torch_dtype=torch.float16,


I think this is a cleaner solution to avoid the unnecessary FP32 warning than I figured was possible, nice.

HuggingFaceDocBuilderDev · 2024-12-23T21:29:37Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

src/transformers/models/modernbert/modular_modernbert.py

ArthurZucker

I'll let @tomaarsen update. I think we can just check for training and labels without adding a new arg.


ArthurZucker

LGTM, thanks both!

warner-benjamin added 2 commits December 23, 2024 14:01

bug fixes

747b3e5

organize imports

46065d4

warner-benjamin mentioned this pull request Dec 23, 2024

modernbert logits do not have gradient #35386

Closed

tomaarsen reviewed Dec 23, 2024

src/transformers/models/modernbert/modular_modernbert.py Outdated

tomaarsen reviewed Dec 23, 2024

wrap cpu warning in reference_compile

ab11657

tomaarsen approved these changes Dec 30, 2024

NohTow mentioned this pull request Jan 9, 2025

update modular_modernbert -- add inputs_embeds param to ModernBertModel #35373

Merged

tomaarsen reviewed Jan 9, 2025

src/transformers/models/modernbert/modular_modernbert.py Outdated

ArthurZucker reviewed Jan 9, 2025

Avoid needing repad_logits_with_grad, always repad with grads when training

cedcb4e

I'm not 100% sure that the conditional with "or labels is None" makes sense though - not sure what the intention is there. Perhaps we can remove that?

tomaarsen requested review from stevhliu and Rocketknight1 as code owners January 9, 2025 13:42

tomaarsen reviewed Jan 9, 2025

src/transformers/models/modernbert/modular_modernbert.py Outdated

Merge branch 'main' of https://github.com/huggingface/transformers into pr-35404

8250c89

ArthurZucker approved these changes Jan 9, 2025

tomaarsen and others added 3 commits January 9, 2025 17:39

Revert "Avoid needing repad_logits_with_grad, always repad with grads when training"

656dd1f

This reverts commit cedcb4e.

Fix grammar: keep -> keeps

9aa38ef

Propagate grammar fix with modular_model_converter

41606c1

ArthurZucker approved these changes Jan 9, 2025

tomaarsen merged commit 1e3ddcb into huggingface:main Jan 9, 2025
10 checks passed
