Consistency of input embedding vectors when extracted with different methods #9015
Tensors in the computation graph may be overwritten by later operations. To avoid this, you can use ggml_set_output().
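For illustration, here is a minimal self-contained ggml sketch (a toy graph with made-up sizes, not llama.cpp code): the intermediate ggml_get_rows result is flagged with ggml_set_output() so the graph allocator will not reuse its memory for a later node, and it can then be read back safely after the compute.

```cpp
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"

#include <cstdio>
#include <vector>

int main() {
    const int n_embd = 4, n_vocab = 4, n_tok = 2;

    // metadata-only context; the actual tensor data is allocated by the graph allocator
    struct ggml_init_params ip = {
        /*.mem_size   =*/ ggml_tensor_overhead()*16 + ggml_graph_overhead(),
        /*.mem_buffer =*/ nullptr,
        /*.no_alloc   =*/ true,
    };
    struct ggml_context * ctx = ggml_init(ip);

    struct ggml_tensor * table  = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_embd, n_vocab);
    struct ggml_tensor * tokens = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, n_tok);
    ggml_set_input(table);
    ggml_set_input(tokens);

    struct ggml_tensor * rows = ggml_get_rows(ctx, table, tokens); // the "input embeddings"
    ggml_set_name(rows, "inp_embd");
    ggml_set_output(rows);                  // <-- without this, a later node may reuse this memory

    struct ggml_tensor * out = ggml_scale(ctx, rows, 2.0f);        // stand-in for the rest of the graph
    ggml_set_output(out);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, out);

    ggml_backend_t backend = ggml_backend_cpu_init();
    ggml_gallocr_t galloc  = ggml_gallocr_new(ggml_backend_get_default_buffer_type(backend));
    ggml_gallocr_alloc_graph(galloc, gf);

    std::vector<float> tdata(n_embd*n_vocab);
    for (int i = 0; i < n_embd*n_vocab; ++i) tdata[i] = (float) i;  // row r holds {4r, 4r+1, 4r+2, 4r+3}
    const int32_t ids[n_tok] = { 2, 0 };
    ggml_backend_tensor_set(table,  tdata.data(), 0, ggml_nbytes(table));
    ggml_backend_tensor_set(tokens, ids,          0, ggml_nbytes(tokens));

    ggml_backend_graph_compute(backend, gf);

    // safe to read back because the tensor was flagged as an output
    std::vector<float> emb(n_embd*n_tok);
    ggml_backend_tensor_get(rows, emb.data(), 0, ggml_nbytes(rows));
    printf("embedding of token 2: %.1f %.1f %.1f %.1f\n", emb[0], emb[1], emb[2], emb[3]);

    ggml_gallocr_free(galloc);
    ggml_backend_free(backend);
    ggml_free(ctx);
    return 0;
}
```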
@slaren @ggerganov I’ve tried adding ggml_set_output(embeddings), but it doesn’t change the output of the embedding vector.
I will try @ggerganov's suggestion of the eval-callback example tonight. I think that should give me the ground truth I am looking for, and then I will be able to inspect the code to see how it's done and incorporate the correct method into my own code.
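For anyone following along, my understanding of the eval-callback approach is roughly the sketch below: you register a ggml_backend_sched_eval_callback through llama_context_params.cb_eval, and it is invoked for every graph node, so you can dump any tensor by name. The name "inp_embd" for the ggml_get_rows result is my assumption based on recent llama.cpp builds; the real examples/eval-callback code simply prints every tensor, so you can discover the names that way.

```cpp
// Sketch of an eval callback in the spirit of examples/eval-callback
// (wiring and tensor name are assumptions, not copied from the example).
#include "llama.h"
#include "ggml.h"
#include "ggml-backend.h"

#include <cstdio>
#include <cstring>
#include <vector>

// Called by the backend scheduler for each node: first with ask=true
// ("do you want to observe this tensor?"), then with ask=false once the
// tensor's data is available.
static bool debug_cb(struct ggml_tensor * t, bool ask, void * /*user_data*/) {
    const char * name = ggml_get_name(t);
    const bool want = strcmp(name, "inp_embd") == 0;   // tensor name assumed

    if (ask) {
        return want;                                    // request data only for this tensor
    }
    if (want && t->type == GGML_TYPE_F32) {
        std::vector<float> data(ggml_nelements(t));
        ggml_backend_tensor_get(t, data.data(), 0, ggml_nbytes(t));
        printf("%s [%lld x %lld]: %f %f %f %f ...\n",
               name, (long long) t->ne[0], (long long) t->ne[1],
               data[0], data[1], data[2], data[3]);
    }
    return true;                                        // continue graph execution
}

// Wiring (done once, when creating the context):
//   llama_context_params cparams = llama_context_default_params();
//   cparams.cb_eval           = debug_cb;
//   cparams.cb_eval_user_data = nullptr;
//   llama_context * ctx = llama_new_context_with_model(model, cparams);
```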
Thank you for making llama.cpp available; it is amazing software.
I have been trying to extract internal representations (specifically, the normalized embedding vectors in the second layer) to use for training a new neural network. But when I ran inference, I got inconsistent results. I eventually traced the problem to getting different normalized second-layer embedding vectors for the same input tokens, depending on whether I ran the inference in batch mode or streaming mode. I am using the llama.cpp server, and the model is llama-2-7b-chat, Q6_K.
I figured that I must be extracting the second-layer embedding vectors incorrectly, so I tried extracting the embedding vectors in the first layer (not normalized), and I found the same inconsistency. Since the first-layer embedding vectors come from calling ggml_get_rows() on the token-embedding table with the input token IDs, I tried extracting the embedding vectors that way (thinking they should be the ground truth), and they did not agree with any of the other results!
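For context, the lookup referred to above is essentially one line in llama.cpp's graph build; paraphrased with approximate variable names (this is the llama.cpp code, not my extraction code):

```cpp
// Paraphrase of llama.cpp's input-embedding construction (names approximate):
// the first-layer input is just a row lookup into the token-embedding matrix.
inpL = ggml_get_rows(ctx0, model.tok_embd, inp_tokens);   // [n_embd, n_tokens]
cb(inpL, "inp_embd", -1);                                 // names the tensor "inp_embd"
```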
Here is the code I used in llama.cpp to extract the embedding vectors:
I also tried several other methods, shown below, but they all agreed with the method above:
I used this method to directly read the rows of the embedding vector lookup table tok_embd:
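Roughly, the idea was the following sketch (the helper name and details are approximations, not the exact code; it has to live inside llama.cpp, where model.tok_embd is visible, and it uses the ggml type-traits API to dequantize a quantized row):

```cpp
// Sketch: read one row of the (quantized) token-embedding table directly.
// Meant to live inside llama.cpp, where llama_model::tok_embd is visible.
// Uses ggml_internal_get_type_traits() (renamed to ggml_get_type_traits in
// later ggml versions) to dequantize the Q6_K row to f32.
#include <cstdint>
#include <cstring>
#include <vector>

static std::vector<float> read_tok_embd_row(const llama_model & model, int32_t token_id) {
    const struct ggml_tensor * t = model.tok_embd;            // shape [n_embd, n_vocab]
    const int64_t n_embd   = t->ne[0];
    const size_t  row_size = ggml_row_size(t->type, n_embd);  // bytes per (quantized) row

    // copy the raw, still-quantized row out of whatever backend buffer holds it
    std::vector<uint8_t> raw(row_size);
    ggml_backend_tensor_get(t, raw.data(), (size_t) token_id * t->nb[1], row_size);

    // dequantize to f32
    std::vector<float> row(n_embd);
    if (t->type == GGML_TYPE_F32) {
        memcpy(row.data(), raw.data(), n_embd * sizeof(float));
    } else {
        const ggml_type_traits_t traits = ggml_internal_get_type_traits(t->type);
        traits.to_float(raw.data(), row.data(), n_embd);
    }
    return row;
}
```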
Prompt:
The Ministry of Finance has fabricated and published the price index for the sale of the Republic of China on Taiwan and the price doubling table for asset revaluation over the years.
The first three tokens are:
And here are different results for the input embedding vectors:
I also tried extracting the embedding vectors from the PyTorch (unquantized) implementation of llama-2-7b-chat, and got yet another set of results:
So my question is: is there a reason why these input embedding vectors should be different for the same input token IDs?
Is one of them right? Or are all of my attempts wrong? What am I doing wrong?
Thank you in advance for any insights on this problem.
Lloyd