bzantium changed the title from "[ENHANCEMENT] use sum instead of max for device_limited_topk" to "[ENHANCEMENT] add options how to choose topk devices for device_limited_topk" on Feb 6, 2025.
Is your feature request related to a problem? Please describe.
Based on the original implementation for DeepSeek-V3, each group is scored by the `sum` of its `top-2` expert scores instead of the `max` when choosing the top-k groups, but Megatron-LM only uses `max`.
However, the DeepSeek-V2 technical report and implementation suggest using `max`, so giving an option to choose between the two would be the solution.
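To make the difference concrete, here is a minimal pure-Python sketch of group-limited (device-limited) top-k routing for a single token. It assumes the experts are split into equal, contiguous groups (one group per device). The function names `group_scores` and `device_limited_topk` and the `method` switch are hypothetical illustrations of the proposed option, not Megatron-LM's or DeepSeek's actual API: `"max"` scores a group by its strongest expert (the DeepSeek-V2 rule) and `"top2_sum"` by the sum of its two strongest experts (the DeepSeek-V3 behavior described above).

```python
from typing import List, Literal

# Hypothetical option values for the proposed switch (not a real Megatron-LM flag).
GroupScoreMethod = Literal["max", "top2_sum"]


def group_scores(scores: List[float], num_groups: int,
                 method: GroupScoreMethod = "max") -> List[float]:
    """Score each contiguous group (device) of experts for one token."""
    if num_groups <= 0 or len(scores) % num_groups != 0:
        raise ValueError("number of experts must be divisible by num_groups")
    size = len(scores) // num_groups
    result = []
    for g in range(num_groups):
        group = scores[g * size:(g + 1) * size]
        if method == "max":
            # DeepSeek-V2 style: the group's best single expert.
            result.append(max(group))
        elif method == "top2_sum":
            # DeepSeek-V3 style: sum of the group's two best experts.
            if size < 2:
                raise ValueError("top2_sum needs at least 2 experts per group")
            result.append(sum(sorted(group, reverse=True)[:2]))
        else:
            raise ValueError(f"unknown method: {method}")
    return result


def device_limited_topk(scores: List[float], num_groups: int, topk_groups: int,
                        topk: int, method: GroupScoreMethod = "max") -> List[int]:
    """Return sorted indices of the top-k experts, restricted to the best groups."""
    if not 0 < topk_groups <= num_groups:
        raise ValueError("topk_groups must be in [1, num_groups]")
    size = len(scores) // num_groups
    gscores = group_scores(scores, num_groups, method)
    # Keep the highest-scoring groups; ties go to the lower group index.
    chosen = sorted(range(num_groups), key=lambda g: (-gscores[g], g))[:topk_groups]
    allowed = [i for g in chosen for i in range(g * size, (g + 1) * size)]
    if topk > len(allowed):
        raise ValueError("topk exceeds the number of experts in the chosen groups")
    # Top-k experts among the allowed ones; ties go to the lower expert index.
    best = sorted(allowed, key=lambda i: (-scores[i], i))[:topk]
    return sorted(best)


if __name__ == "__main__":
    # 8 experts in 4 groups of 2; route each token to 2 experts on 1 device.
    scores = [0.75, 0.0, 0.5, 0.375, 0.125, 0.125, 0.25, 0.25]
    print(device_limited_topk(scores, 4, 1, 2, method="max"))       # [0, 1]
    print(device_limited_topk(scores, 4, 1, 2, method="top2_sum"))  # [2, 3]
```

In this example, `max` picks group 0 because it holds the single strongest expert (0.75), while `top2_sum` picks group 1 because its two experts together (0.875) outweigh group 0, so the token is routed to different experts depending on the option.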