Unbreak persimmon after #3837 (#4010)

Merged: 1 commit merged into ggerganov:master on Nov 10, 2023

Conversation

@Galunid (Collaborator) commented Nov 9, 2023

Permutation was incorrect ;)

Fixes the issue described in #3837 (comment)

Galunid marked this pull request as ready for review on November 9, 2023 15:38
@slaren (Collaborator) commented Nov 9, 2023

How did #3837 break this? From what I can tell, the permute was already like this before that commit:

struct ggml_tensor * Q = ggml_cont(ctx0, ggml_permute(ctx0, Qcur, 1, 2, 0, 3));
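(For reference, ggml_permute's axis list scatters the source dimensions: ggml.c sets result->ne[axis0] = a->ne[0], result->ne[axis1] = a->ne[1], and so on. Tracing symbolic dims through that call:)

start   -> d0 d1 d2 d3
permute -> 1  2  0  3
res     -> d2 d0 d1 d3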

@Galunid (Collaborator, Author) commented Nov 9, 2023

I honestly have no idea; my guess is that it was a mistake in the original implementation (#3410) that was somehow cancelled out by another bug.

The idea behind this permutation is that you first change the dimensions because the concat operation needs them that way, and then you "unpermute" to get the original dimension order back (sketched below).

In any case, persimmon works after this modification, but it crashes without it ;)

I'll look into it more tomorrow, thanks for catching this.
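For concreteness, a hedged sketch of that permute → concat → unpermute pattern (variable names and exact calls approximate the persimmon build code rather than quote it; ggml_concat at the time joined tensors along dim 2):

/* Sketch only: qrot/qpass are the rotary and pass-through halves of Q.
   Permute so the axis to be joined lands on dim 2, concat, then undo. */
qrot  = ggml_cont(ctx0, ggml_permute(ctx0, qrot,  2, 1, 0, 3));
qpass = ggml_cont(ctx0, ggml_permute(ctx0, qpass, 2, 1, 0, 3));
struct ggml_tensor * Qcur = ggml_concat(ctx0, qrot, qpass); /* joins on dim 2 */
/* 2,1,0,3 swaps dims 0 and 2, so applying it again restores the
   original order -- which is what this PR changes the second permute to */
Qcur = ggml_cont(ctx0, ggml_permute(ctx0, Qcur, 2, 1, 0, 3));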

@slaren (Collaborator) commented Nov 9, 2023

There is indeed another permute that is cancelling this one:

struct ggml_tensor * q = ggml_permute(ctx, q_cur, 0, 2, 1, 3);

However, to restore the original behavior, shouldn't the permute now be 1,0,2,3, so that after the permute 0,2,1,3 in llm_build_kqv it becomes 1,2,0,3 like before?
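(Whether two chained calls collapse into a single axis list like that depends on the convention. Under ggml's scatter reading, where source dim i lands in slot p[i], the composite of p1 followed by p2 is net[i] = p2[p1[i]]. A hypothetical helper, shown only to make the arithmetic explicit:)

/* Hypothetical helper, not a ggml API: composes two ggml-style axis
   lists, where each list scatters source dim i to destination slot p[i]. */
static void compose_permutes(const int p1[4], const int p2[4], int net[4]) {
    for (int i = 0; i < 4; i++) {
        net[i] = p2[p1[i]]; /* dim i -> slot p1[i] -> slot p2[p1[i]] */
    }
}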

@Galunid (Collaborator, Author) commented Nov 10, 2023

This is how I see it:

First permute:
start   -> a b c d
permute -> 0 2 1 3
res     -> a c b d <- c, b swapped

Second permute:
start   -> a c b d <- note different order of c, b
permute -> 0 2 1 3
res     -> a b c d <- original order

We want the second permute before llm_build_kqv, since it expects the "original" dimensions; llama, for example, does no permutations and passes Qcur directly to llm_build_kqv.

The snippet you linked seems to be exactly why it used to work but doesn't anymore: it's an extra permutation that wasn't present before.

 struct ggml_tensor * Q = ggml_cont(ctx0, ggml_permute(ctx0, Qcur, 1, 2, 0, 3));

Q tensor, old implementation

start   -> a b c d
permute -> 2 1 0 3
res     -> c b a d

start   -> c b a d
permute -> 1 2 0 3
res     -> a c b d

mul_mat(k, q), k tensor is a b c d

Q tensor, new implementation (without fix)

start   -> a b c d
permute -> 2 1 0 3
res     -> c b a d

start   -> c b a d
permute -> 1 2 0 3
res     -> a c b d  <- these are the dims we expect for mul_mat

start   -> c b a d  <- this is the llm_build_kqv permutation
permute -> 0 2 1 3
res     -> c a b d

mul_mat(k, q), k tensor is a b c d <- can't mul_mat, wrong dimensions

Q tensor, new implementation (with fix)

start   -> a b c d
permute -> 2 1 0 3
res     -> c b a d

start   -> c b a d
permute -> 2 1 0 3
res     -> a b c d

start   -> a b c d  <- llm_build_kqv permutation, now working as expected
permute -> 0 2 1 3
res     -> a c b d  <- same as old implementation

mul_mat(k, q), k tensor is a b c d
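To check the traces above mechanically, here is a small standalone C program (illustrative only, not part of the PR) that models ggml_permute's scatter convention -- result->ne[axis_i] = src->ne[i] -- and replays the old and fixed paths:

#include <stdio.h>

/* Models ggml_permute's axis convention: source dim i is scattered to
   destination slot ax[i] (ggml.c: ne[axis0] = a->ne[0], ...). The dims
   "abcd" are symbolic, as in the traces above. */
static void permute(const char *src, const int ax[4], char *dst) {
    for (int i = 0; i < 4; i++) {
        dst[ax[i]] = src[i]; /* dim i of src lands in slot ax[i] */
    }
    dst[4] = '\0';
}

int main(void) {
    const int p2103[4] = {2, 1, 0, 3};
    const int p1203[4] = {1, 2, 0, 3};
    const int p0213[4] = {0, 2, 1, 3};
    char t1[5], t2[5], t3[5];

    /* old implementation: 2,1,0,3 then 1,2,0,3, fed to mul_mat directly */
    permute("abcd", p2103, t1); /* -> cbad */
    permute(t1, p1203, t2);     /* -> acbd */
    printf("old: %s\n", t2);

    /* with fix: 2,1,0,3 applied twice restores abcd, and llm_build_kqv's
       0,2,1,3 then yields the same acbd as the old implementation */
    permute("abcd", p2103, t1); /* -> cbad */
    permute(t1, p2103, t2);     /* -> abcd */
    permute(t2, p0213, t3);     /* -> acbd */
    printf("fix: %s -> %s\n", t2, t3);

    return 0;
}

Compiled and run, it prints old: acbd and fix: abcd -> acbd, matching the traces.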

@slaren (Collaborator) left a review:

Looks good, thanks for the explanation.

Galunid merged commit df9d129 into ggerganov:master on Nov 10, 2023
Galunid deleted the persimmon-fix branch on November 10, 2023 13:24

@maddes8cht (Contributor) commented:

A full set of the corresponding quantized persimmon base and chat models is now available at
https://huggingface.co/maddes8cht/adept-persimmon-8b-chat-gguf
and
https://huggingface.co/maddes8cht/adept-persimmon-8b-base-gguf

However, these persimmon models do not work at all with CUDA acceleration. They do work with --n-gpu-layers 0 in the cuBLAS-compiled builds.
