fix mask-tying to sequence length #660
Conversation
This is mostly a quick smoke test (I haven't tested it locally yet; I'm running into issues with my local dev setup). I think this needs a unit test and probably an inspection of benchmark performance to ensure the new branching doesn't impact speed.
a888d65 to c29e380
1243c74 to 735b617
I think this is ready for review. It's a pretty small change, benchmarks look OK, and tests pass locally. cc @danthe3rd @blefaudeux
Hi @erip and thanks for your contribution :)
LGTM, thanks!
As discussed in the previous comments, we really need to do some cleanup and refactoring on those APIs. But this is already a good improvement, thanks!
What does this PR do?
Fixes #655
This PR allows a class-level causal attention mask for fixed sequence-length tasks; otherwise, it creates a new mask on the fly in the forward pass, disentangling the mask from the sequence length, which can vary between batches in tasks like machine translation (MT).
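The two code paths described above could be sketched as follows. This is a minimal illustration, not the actual xformers implementation; the class and method names are hypothetical. A causal mask is precomputed and cached when a fixed sequence length is known, and built on the fly otherwise:

```python
from typing import Optional

import torch


class CausalMaskProvider(torch.nn.Module):
    """Hypothetical sketch of the masking strategy in this PR.

    If a fixed sequence length is known up front, the causal mask is
    built once and registered as a buffer; otherwise a fresh mask is
    created on the fly in forward() to match the incoming batch's
    sequence length.
    """

    def __init__(self, seq_len: Optional[int] = None):
        super().__init__()
        if seq_len is not None:
            # Fixed-length task: cache the mask at the class level.
            self.register_buffer("mask", self._make_causal_mask(seq_len))
        else:
            # Variable-length task (e.g. MT): no cached mask.
            self.mask = None

    @staticmethod
    def _make_causal_mask(seq_len: int) -> torch.Tensor:
        # Lower-triangular boolean mask: position i may attend to j <= i.
        return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumes x has shape (batch, seq, dim).
        seq_len = x.shape[1]
        if self.mask is not None:
            return self.mask[:seq_len, :seq_len]
        return self._make_causal_mask(seq_len).to(x.device)
```

The key point is the branch in `forward`: only the variable-length path pays the cost of rebuilding the mask each call, which is why the PR discussion above checks that the new branching doesn't regress the benchmarks.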
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.