
Enable compatibility of sparse MOE training under DeepSpeed ZeRO3 #312

Closed
Jacky-hate wants to merge 1 commit

Conversation

Jacky-hate

According to deepspeedai/DeepSpeed#4966, ZeRO3 in DeepSpeed does not work with MoE models because the order in which modules are executed can change at every forward/backward pass, and a new API was added to stop DeepSpeed from breaking a module down for parameter fetching. A similar issue occurs when fine-tuning Qwen1.5-MoE-A2.7B with ZeRO3 optimization (#275).

This PR uses that API to make the sparse MoE layer compatible with ZeRO3.
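A minimal sketch of the idea (not the exact patch in this PR), assuming the set_z3_leaf_modules API added in deepspeedai/DeepSpeed#4966 and the Qwen2MoeSparseMoeBlock class from the transformers Qwen2-MoE modeling code:

```python
# Minimal sketch: mark the sparse MoE block as a ZeRO-3 "leaf" module so its
# parameters are fetched as a whole and the changing expert execution order
# no longer breaks ZeRO-3 parameter prefetching.
from transformers import AutoModelForCausalLM
from transformers.models.qwen2_moe.modeling_qwen2_moe import Qwen2MoeSparseMoeBlock  # assumed class path
from deepspeed.utils import set_z3_leaf_modules  # API introduced in deepspeedai/DeepSpeed#4966

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-MoE-A2.7B")

# Treat every Qwen2MoeSparseMoeBlock as an indivisible unit under ZeRO-3.
set_z3_leaf_modules(model, [Qwen2MoeSparseMoeBlock])
```

In a training script such as finetune.py, this call would go right after the model is loaded and before the DeepSpeed engine / Trainer is set up.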

JustinLin610 self-requested a review on April 26, 2024, 09:40
jklj077 (Collaborator) commented Aug 19, 2024

Hi, thanks for your contribution! It is a very nice addition.

Unfortunately, we are in the process of deprecating the examples/sft/finetune.py script and the related code due to reported compatibility issues. As our code is adapted from https://github.com/lm-sys/FastChat/tree/main/fastchat/train, I suppose the same issue also exists there, and they may still be open to PRs on the training code.

Thanks again for the support.

jklj077 closed this on Aug 19, 2024