Add fp16 support of Qwen1.5MoE models (A2.7B) to DeepSpeed-FastGen #5403
Conversation
```python
shared_expert_output = self.shared_expert_mlp_2(shared_expert_output, cur_params.shared_moe_mlp_2, b=None)
shared_expert_gate_output = self.shared_expert_gate(hidden_states, cur_params.shared_moe_gate, b=None)[..., :1]
# shared_expert_gate_output shape[-1] is 1
shared_expert_output.mul_(torch.sigmoid(shared_expert_gate_output))
```
I am not sure whether calling torch.sigmoid directly here will affect performance.
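One way to answer this is a quick micro-benchmark of the gating pattern. Below is a minimal sketch (not from the PR; the tensor shapes, dtype, and iteration counts are placeholder assumptions) that times the `mul_(torch.sigmoid(...))` call with CUDA events:

```python
# Hypothetical micro-benchmark of the shared-expert gating op.
# Shapes are assumptions, not the model's actual dimensions.
import torch

hidden = torch.randn(32, 4096, device="cuda", dtype=torch.float16)
gate = torch.randn(32, 1, device="cuda", dtype=torch.float16)

def gated(out, g):
    # Same pattern as the PR: out *= sigmoid(gate), broadcast over the last dim.
    return out.mul_(torch.sigmoid(g))

# Warm up so kernel launch/compile overhead is excluded.
for _ in range(10):
    gated(hidden.clone(), gate)

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(100):
    gated(hidden.clone(), gate)
end.record()
torch.cuda.synchronize()
print(f"avg per call: {start.elapsed_time(end) / 100:.4f} ms")
```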
When I build DeepSpeed from your source code and run the "for mii pipeline" test code, the process hangs with no error output. How should I identify the problem? I am using a 4090 GPU, with transformers 4.41.0.dev0, torch 2.2.1, and CUDA 11.8.
Hi @heiseon, I just created a new conda environment, built DeepSpeed from my branch and DeepSpeed-MII from the official source, and it runs without any issues. Maybe you can delete … my path is …
Hi @heiseon, it does not support quantized Qwen models currently. Supporting this would likely require a big effort, so it might not be considered in the short term.
This PR adds fp16 support for Qwen1.5-MoE-A2.7B models to DeepSpeed-FastGen, as requested in deepspeedai/DeepSpeed-MII#457.
Test Code
for mii pipeline:
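The original snippet is not preserved in this capture; below is a minimal sketch of what the MII pipeline test likely looks like, assuming DeepSpeed is built from this PR's branch. The model id and generation arguments are assumptions:

```python
# Hedged sketch: run the Qwen1.5-MoE model through the MII pipeline.
import mii

pipe = mii.pipeline("Qwen/Qwen1.5-MoE-A2.7B")
response = pipe(["DeepSpeed is"], max_new_tokens=128)
print(response)
```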
for huggingface:
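Again the original snippet is missing; a minimal sketch of the HuggingFace baseline used for comparison, with the dtype and device settings as assumptions:

```python
# Hedged sketch: generate the same prompt with plain transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-MoE-A2.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```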
Qwen1.5-MoE-A2.7B
Huggingface output with prompt "DeepSpeed is":
DeepSpeed-FastGen output with prompt "DeepSpeed is":
DeepSpeed-FastGen output with prompt "DeepSpeed is" with 8-way sharding:
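The captured page does not preserve how the 8-way sharded run was launched. One plausible setup, sketched here under the assumption that MII's persistent deployment is used (the model id and generation arguments remain placeholders):

```python
# Hedged sketch: serve the model with 8-way tensor parallelism via MII.
import mii

client = mii.serve("Qwen/Qwen1.5-MoE-A2.7B", tensor_parallel=8)
response = client.generate(["DeepSpeed is"], max_new_tokens=128)
print(response)
client.terminate_server()
```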