-
Notifications
You must be signed in to change notification settings - Fork 10.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unbreak persimmon after #3837 #4010
Conversation
I honestly have no idea, my guess would be it was a mistake in the original implementation (#3410) that was somehow cancelled by another bug. The idea behind this permutation is that you first change dimensions because it's needed for concat operation. Then you want to "unpermute" to get the original dimensions "order" back. In any case persimmon works after this modification, but it's crashing without it ;) I'll look into it more tomorrow, thanks for catching this. |
There is indeed another permute that is cancelling this one: Line 3460 in a75fa57
However, to restore the original behavior, shouldn't the permute now be |
This is how I've viewed this:
We want the second permute before The snippet you linked seems to be the exact problem as to why it used to work, but doesn't anymore. This is an extra permutation that wasn't present before. struct ggml_tensor * Q = ggml_cont(ctx0, ggml_permute(ctx0, Qcur, 1, 2, 0, 3)); Q tensor, old implementation
Q tensor, new implementation (without fix)
Q tensor, new implementation (with fix)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for the explanation.
This comment was marked as outdated.
This comment was marked as outdated.
Full set of the according quantized persimmon base and chat models is now available at However, these persimmon models do not work at all with Cuda acceleration. They do work with |
Permutation was incorrect ;)
Fixes issue described in #3837 (comment)