[cluster] add process group mesh #4038

Closed
ver217 opened this issue Jun 19, 2023 · 1 comment · Fixed by #4039
Labels
enhancement New feature or request

Comments

@ver217
Member

ver217 commented Jun 19, 2023

Motivation

We have three main components that are related to process group initialization:

  • Global parallel context
  • Device mesh
  • Process group manager

Global parallel context supports all the well-known kinds of parallelism, but it has the following drawbacks:

  • It's global, which means it's not flexible enough
  • It's deeply coupled with parallel methods, which makes it hard to extend
  • Some names are confusing, e.g. local_rank

Device mesh describes how a tensor is stored. It's great for tensor parallelism, but not for other kinds of parallelism.

Process group manager, which is just a dict of process groups, is too simple to handle complex ND-parallelism scenarios.

In conclusion, we need a component which is:

  • Totally decoupled from parallel methods
  • Not global
  • Able to handle complex ND-parallelism easily

Process group mesh

Process group mesh describes how process groups are organized. It's not coupled with any parallel method, yet it makes initializing process groups in ND-parallelism scenarios easy.

It's a helper/utility class: it just initializes process groups and caches them, while the concrete parallel method manages them.
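A minimal sketch of the "initialize once, then cache" behaviour, assuming groups are keyed by their exact rank list. GroupCache and get_group are hypothetical names, not the API added in #4039. In a real run the factory would be torch.distributed.new_group, which every process in the job must call, in the same order, even when it is not a member of the group; so each process should request groups in the same sequence.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


class GroupCache:
    """Hypothetical helper: build each process group once, then reuse it."""

    def __init__(self, factory: Callable[[List[int]], Any]) -> None:
        # `factory` builds a group from a list of ranks; in real use this could be
        # torch.distributed.new_group, here it is left pluggable for illustration.
        self._factory = factory
        self._groups: Dict[Tuple[int, ...], Any] = {}

    def get_group(self, ranks: Sequence[int]) -> Any:
        # The cache key is the rank tuple exactly as given (no reordering).
        key = tuple(ranks)
        if key not in self._groups:
            self._groups[key] = self._factory(list(key))
        return self._groups[key]
```

Keeping the factory pluggable keeps the cache itself independent of any parallel method, which mirrors the "helper only, parallel method manages the groups" split described above.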

We can use an ND-tuple to describe a process group mesh, e.g. ProcessGroupMesh(2, 2, 2) describes a 3D cube process group mesh of 8 processes. Each process can in turn be described by an ND-coordinate, e.g. (0, 1, 0) is the process whose rank is 2 in the mesh above.

In the classic 3D-parallelism scenario, each parallel method takes one axis, e.g. data parallelism takes axis 0, pipeline parallelism takes axis 1 and tensor parallelism takes axis 2. Process group mesh will provide a method to create groups along an axis, which makes 3D-parallelism easy to handle.
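The rank/coordinate mapping and "create groups along an axis" can be sketched in pure Python as below. This assumes ranks are laid out in row-major order (last axis varies fastest), which agrees with the (0, 1, 0) → rank 2 example; ProcessGroupMeshSketch, ravel, unravel and groups_along_axis are hypothetical names and may not match the class added in #4039.

```python
from itertools import product
from math import prod
from typing import List, Tuple


class ProcessGroupMeshSketch:
    """Hypothetical rank/coordinate arithmetic for an ND process group mesh (row-major layout)."""

    def __init__(self, *shape: int) -> None:
        self.shape = tuple(shape)
        self.world_size = prod(self.shape)

    def ravel(self, coord: Tuple[int, ...]) -> int:
        """Map an ND coordinate to a global rank; the last axis varies fastest."""
        if len(coord) != len(self.shape):
            raise ValueError("coordinate must have one entry per mesh axis")
        rank = 0
        for c, dim in zip(coord, self.shape):
            rank = rank * dim + c
        return rank

    def unravel(self, rank: int) -> Tuple[int, ...]:
        """Map a global rank back to its ND coordinate (inverse of ravel)."""
        coord = []
        for dim in reversed(self.shape):
            coord.append(rank % dim)
            rank //= dim
        return tuple(reversed(coord))

    def groups_along_axis(self, axis: int) -> List[List[int]]:
        """Rank lists of all groups spanning `axis` while every other coordinate is fixed."""
        other_axes = [range(d) for i, d in enumerate(self.shape) if i != axis]
        groups = []
        for fixed in product(*other_axes):
            ranks = []
            for k in range(self.shape[axis]):
                coord = list(fixed)
                coord.insert(axis, k)  # put the varying index back at position `axis`
                ranks.append(self.ravel(tuple(coord)))
            groups.append(ranks)
        return groups


if __name__ == "__main__":
    mesh = ProcessGroupMeshSketch(2, 2, 2)
    print(mesh.ravel((0, 1, 0)))      # 2
    print(mesh.groups_along_axis(0))  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Each rank list returned by groups_along_axis could then be passed to torch.distributed.new_group on every process; a process finds its own data/pipeline/tensor group as the list along the corresponding axis that contains its rank.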

@ver217 ver217 self-assigned this Jun 19, 2023
@ver217 ver217 added the enhancement New feature or request label Jun 19, 2023
@ver217
Member Author

ver217 commented Jun 28, 2023

Completed in #4039

@ver217 ver217 closed this as completed Jun 28, 2023
Projects
Status: Done

1 participant