Currently, neither Megatron SP nor DeepSpeed SP appears to be correctly implemented in Megatron-DeepSpeed. This may have worked at some point, but as new features were added, conflicts arose between the two: for example, flags that were originally meant to check for Megatron SP now actually check for DeepSpeed SP, and some code paths consequently gather along the wrong dimension (the one DeepSpeed SP shards). Importantly, SP also needs to compose with TP and PP to be useful for large-scale training.
A port of Megatron-LM from 10/23 implements SP successfully, but it lacks some features, such as ones related to MoE, as mentioned in issue #44.
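To make the dimension-confusion failure mode concrete, here is a minimal sketch. It is not taken from the repository; the helper names and the simulated shard layouts are hypothetical. The point is only that Megatron-style SP shards activations of shape `[seq, batch, hidden]` along the sequence dimension (dim 0), while a DeepSpeed-Ulysses-style exchange leaves each rank holding the full sequence but a slice of the hidden/head dimension, so a gather written for one layout concatenates the wrong dimension for the other:

```python
import torch

def megatron_sp_shard(x: torch.Tensor, rank: int, world: int) -> torch.Tensor:
    """Megatron-style SP: split the sequence dimension (dim 0) of [s, b, h]."""
    return x.chunk(world, dim=0)[rank]

def ulysses_sp_shard(x: torch.Tensor, rank: int, world: int) -> torch.Tensor:
    """DeepSpeed-Ulysses-style layout after its all-to-all (simulated here):
    each rank holds the full sequence but only h/world of the hidden dim."""
    return x.chunk(world, dim=-1)[rank]

s, b, h, world = 8, 2, 16, 4
x = torch.randn(s, b, h)

meg_shards = [megatron_sp_shard(x, r, world) for r in range(world)]
uly_shards = [ulysses_sp_shard(x, r, world) for r in range(world)]

# Correct gathers: concatenate along the dimension that was actually split.
assert torch.equal(torch.cat(meg_shards, dim=0), x)
assert torch.equal(torch.cat(uly_shards, dim=-1), x)

# The bug pattern described above: a code path guarded by the wrong SP flag
# gathers Ulysses-style shards as if they were Megatron shards, producing
# [s*world, b, h/world] instead of [s, b, h].
wrong = torch.cat(uly_shards, dim=0)
print(wrong.shape, "!=", x.shape)
```

In a collective setting the same mistake would show up as a mismatched `all_gather` dimension rather than a bad `torch.cat`, which is consistent with the wrong-dimension behavior reported above.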
Hi, the source of the SP hang seems to be related to this commit. With everything else held constant, the commits before it work, but the ones after it hang.