sequence parallel for uneven heads #6392

Open
inkcherry wants to merge 25 commits into master

Conversation

@inkcherry (Contributor) commented Aug 21, 2024

In sequence parallelism (Ulysses), the number of attention heads must be divisible by the sequence parallel size, which prevents some models/workloads from using a specific sequence parallel size. This PR implements uneven all-to-all head splitting to lift that constraint (see the sketch after the list below).

  • Supports both batch-first (b, s, ...) and sequence-first (s, b, ...) layouts.
  • Added unit tests with numerical checks; also tested locally with 7 heads at sp=4 and 20 heads at sp=8, and both passed.
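
For illustration only (a rough sketch, not the PR's actual code): one way to partition heads unevenly across sequence-parallel ranks is to give the first `num_heads % sp_size` ranks one extra head, so per-rank loads differ by at most one. The helper name `heads_per_rank` below is hypothetical; the resulting split sizes are the kind of per-rank counts an uneven all-to-all (e.g. `torch.distributed.all_to_all_single` with `input_split_sizes`/`output_split_sizes`) would consume.

```python
# Sketch only: as-even-as-possible head partition when num_heads % sp_size != 0.
def heads_per_rank(num_heads: int, sp_size: int) -> list[int]:
    base, rem = divmod(num_heads, sp_size)
    # The first `rem` ranks receive one extra head.
    return [base + 1 if r < rem else base for r in range(sp_size)]

# Split sizes like these could feed an uneven all-to-all, e.g.
# torch.distributed.all_to_all_single(output, input,
#                                     output_split_sizes=recv_splits,
#                                     input_split_sizes=send_splits)
# where send/recv splits would be derived from heads_per_rank (hypothetical usage).

print(heads_per_rank(7, 4))   # [2, 2, 2, 1]
print(heads_per_rank(20, 8))  # [3, 3, 3, 3, 2, 2, 2, 2]
```

The two example splits match the configurations the PR reports testing locally (7 heads with sp=4 and 20 heads with sp=8).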

deepspeed/sequence/layer.py — review comments (outdated, resolved)

@delock (Collaborator) commented Aug 30, 2024

@inkcherry thanks, I have no further questions. Hi @tjruwase @loadams, this PR enables sequence parallelism for models whose number of heads is not a power of two, which was requested by a customer. Can this PR be reviewed? Thanks!

@loadams (Contributor) commented Sep 3, 2024

> @inkcherry thanks, I have no further questions. Hi @tjruwase @loadams, this PR enables sequence parallelism for models whose number of heads is not a power of two, which was requested by a customer. Can this PR be reviewed? Thanks!

Thanks @delock, things look good now, but we will just need to get a review.

@tjruwase tjruwase requested a review from tohtana as a code owner October 8, 2024 19:34