Enable compatibility of sparse MOE training under DeepSpeed ZeRO3 #312

Jacky-hate · 2024-04-16T04:12:08Z

According to deepspeedai/DeepSpeed#4966, DeepSpeed ZeRO3 does not work with MoE models because the order in which modules execute can change on every forward/backward pass. That PR adds a new API that stops ZeRO3 from breaking a module down for parameter fetching. A similar issue occurs when fine-tuning Qwen1.5-MoE-A2.7B with ZeRO3 optimization (#275).

This PR uses the API above to make the sparse MoE layer compatible with ZeRO3.
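For readers unfamiliar with that API, here is a minimal sketch of how it might be applied; it is an illustration, not the diff in this PR. It assumes the API in question is `deepspeed.utils.set_z3_leaf_modules`, and that the sparse MoE layers can be recognised by a class name ending in `SparseMoeBlock` (as with `Qwen2MoeSparseMoeBlock` in Hugging Face Transformers). The helper names `find_moe_block_classes` and `mark_moe_blocks_as_zero3_leaves` are hypothetical.

```python
from typing import List, Type


def find_moe_block_classes(model, suffix: str = "SparseMoeBlock") -> List[Type]:
    """Collect the distinct classes of submodules whose class name ends with `suffix`.

    `model` only needs a `.modules()` iterator, as torch.nn.Module provides.
    The suffix is an assumption based on Hugging Face naming
    (e.g. Qwen2MoeSparseMoeBlock); adjust it for other architectures.
    """
    found: List[Type] = []
    for module in model.modules():
        cls = type(module)
        if cls.__name__.endswith(suffix) and cls not in found:
            found.append(cls)
    return found


def mark_moe_blocks_as_zero3_leaves(model) -> List[Type]:
    """Ask DeepSpeed to treat each sparse MoE block as one unit under ZeRO3."""
    # Imported lazily so find_moe_block_classes works without DeepSpeed installed.
    from deepspeed.utils import set_z3_leaf_modules

    classes = find_moe_block_classes(model)
    if classes:
        # Parameters of these modules are then fetched together, so the
        # data-dependent order in which experts run no longer matters.
        set_z3_leaf_modules(model, classes)
    return classes
```

Under these assumptions, `mark_moe_blocks_as_zero3_leaves(model)` would be called once after the model is loaded and before it is handed to DeepSpeed or the trainer.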

jklj077 · 2024-08-19T06:58:49Z

Hi, thanks for your contribution! It is a very nice addition.

Unfortunately, we are in the process of deprecating the examples/sft/finetune.py script and related files due to reported compatibility issues. As our code is adapted from https://github.com/lm-sys/FastChat/tree/main/fastchat/train, I suppose the same issue also exists there, and they may still be open to PRs on the training code.

Thanks again for the support.

enable compatibility of sparse MOE training under DeepSpeed Zero3

ab33b7a

JustinLin610 self-requested a review April 26, 2024 09:40

jklj077 closed this Aug 19, 2024
