Fixed save_imatrix to match old behaviour for MoE #7099
Merged
Commits (6):
- 8ac51ca: Fixed save_imatrix to match old behaviour for MoE (jukofyork)
- 5e98f44: Fixed missing idx variable (jukofyork)
- c6d2bbb: Unconditionally increment ncall (jukofyork)
- 6674e79: Fixed 2 bugs in save_imatrix() (jukofyork)
- cb3d9ba: ncall needs summing too (jukofyork)
- 6e05492: Trailing whitespace (jukofyork)
In the past only one expert was evaluated per `mul_mat_id`, and `op_params` was used to store the expert being evaluated, but that's no longer the case. `op_params` is not used anymore in `mul_mat_id`, so this condition doesn't really do anything: `op_params` will always be zero, so it's always true.
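To illustrate the point, a tiny standalone sketch (simplified stand-in types, not ggml's actual structs): once nothing writes to `op_params`, a check like the old one reduces to comparing a zero-initialized value against zero, so it passes on every callback:

```cpp
#include <cstdint>
#include <cstdio>

// Simplified stand-in for a tensor whose op_params scratch area is
// zero-initialized and, since the mul_mat_id change, never written.
struct fake_tensor {
    int32_t op_params[16] = {0};
};

int main() {
    fake_tensor t;
    // Old-style check: op_params[0] used to hold the index of the
    // expert being evaluated, so idx == 0 meant "first expert".
    const int idx = t.op_params[0];
    if (idx == 0) {
        // op_params is never set any more, so idx is always 0 and
        // this branch runs on every single callback.
        printf("condition is vacuously true: idx = %d\n", idx);
    }
    return 0;
}
```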
Ah, I didn't test it, but the old `if (idx == 0)` did work. What test can be done to detect that the callback is the last one for the MoE?
So in effect, this change does nothing: `e.ncall` is increased unconditionally, as it was before. I think that increasing `ncall` unconditionally here is the correct thing to do, since the count is later corrected in `save_imatrix` with your change.
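For intuition, here's a minimal standalone sketch of that bookkeeping (hypothetical names, loosely modelled on the imatrix `Stats` accumulator; not the actual llama.cpp code): `ncall` is bumped once per callback no matter what, while the per-element `counts` record how many activations actually landed in each slot, so the true mean is always recoverable as `values[i] / counts[i]`:

```cpp
#include <cstdio>
#include <vector>

// Hypothetical accumulator: values[] sums squared activations,
// counts[] tracks how many activations hit each element, and ncall
// is bumped once per callback, unconditionally.
struct stats {
    std::vector<float> values;
    std::vector<int>   counts;
    int ncall = 0;
};

// One callback's worth of accumulation. For a MoE tensor each expert
// sees only the rows routed to it, so counts[] and ncall can drift
// apart; the per-element counts are what keep the mean correct.
void accumulate(stats & e, const float * x, int n_rows, int n_cols) {
    if (e.values.empty()) {
        e.values.assign(n_cols, 0.0f);
        e.counts.assign(n_cols, 0);
    }
    e.ncall++; // unconditional, as discussed above
    for (int r = 0; r < n_rows; ++r) {
        for (int c = 0; c < n_cols; ++c) {
            e.values[c] += x[r*n_cols + c] * x[r*n_cols + c];
            e.counts[c] += 1;
        }
    }
}

int main() {
    stats e;
    const float x1[] = {1, 2, 3, 4, 5, 6}; // 2 rows x 3 cols
    const float x2[] = {1, 1, 1};          // 1 row  x 3 cols
    accumulate(e, x1, 2, 3);
    accumulate(e, x2, 1, 3);
    // Save-time correction: values[c]/counts[c] is the true mean even
    // though ncall (2) differs from the per-element counts (3).
    for (int c = 0; c < 3; ++c) {
        printf("mean[%d] = %f\n", c, e.values[c] / e.counts[c]);
    }
    printf("ncall = %d\n", e.ncall);
    return 0;
}
```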
Currently there is only one call to `mul_mat_id` regardless of the number of experts being used. This was changed in #6505.
Yeah, it will still work:

```cpp
(p.second.values[i] / static_cast<float>(p.second.counts[i])) * static_cast<float>(p.second.ncall)
```

Basically we divide down to get the actual mean, based on how many values were actually added to an element `values[i]`, and then multiply back up to get a value that can be used both for the weighted combination with other imatrix files and for `quantize` to get back the original mean when it divides by the `ncall` stored in the file. So having the weighted combination scaled up by `num-top-k` won't affect either of these.

But this will still cause it to save the 10 chunks too often:

[…]

vs

[…]

and with the debug output on in `quantize`, it will print `num-top-k` times more for each `ncall` for the experts:

[…]

vs

[…]
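To make the divide-down/multiply-up argument concrete, a small self-contained check (plain C++ with made-up numbers, not the actual `save_imatrix`/`quantize` code) that the mean survives the round trip through the file, even if `ncall` is inflated by a constant factor like `num-top-k`:

```cpp
#include <cassert>
#include <cstdio>

int main() {
    // Made-up accumulator state for one element: a running sum and
    // how many values actually went into it.
    const float sum   = 42.0f;
    const int   count = 7;
    const int   ncall = 24;  // callbacks seen; may be inflated
    const float mean  = sum / count;

    // save_imatrix-style rescaling: divide down to the true mean,
    // then multiply back up by ncall before writing to the file.
    const float stored = (sum / static_cast<float>(count))
                       * static_cast<float>(ncall);

    // quantize-style read: dividing the stored value by the ncall
    // saved alongside it recovers the original mean.
    const float recovered = stored / static_cast<float>(ncall);
    printf("mean = %.3f, recovered = %.3f\n", mean, recovered);
    assert(recovered == mean);

    // If ncall were scaled up by num-top-k, that factor would multiply
    // in here and divide straight back out in quantize; a weighted
    // combination of imatrix files (weighted by ncall) would scale all
    // weights equally, leaving the combination unchanged.
    return 0;
}
```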
So if I understand correctly, you based these changes on a build from before #6505 was merged, and the results that show a higher number of `ncall` for the MoE tensors are from a build without #6505, correct?
I tested this PR, with `ncall` increased unconditionally, on Mixtral, and it seems to produce the expected results:

[…]
Yeah, I used the build right after the `dbrx` PR was pushed, as I originally went on this search after having lots of trouble quantizing it.
Yeah, it looks like that can just be left unconditional then.

It's probably worth trying to re-quantize Mixtral with and without these fixes too, just in case something else has changed since then.
I will give it a try with a low number of chunks, but I don't have enough VRAM to create an imatrix for Mixtral with the full `wiki.train.raw` in a reasonable amount of time.