
API for manipulating token-level input embeddings #4537

Closed
ringohoffman opened this issue Dec 19, 2023 · 3 comments
Labels: enhancement (New feature or request), stale

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Feature Description

API to allow manipulation of token-level input embeddings.

Motivation

I work for Protopia AI, a company that offers an LLM privacy solution that works by transforming LLM input embeddings. We have a client that uses llama.cpp, and we are interested in seeing how our PyTorch-based solution can integrate with it. I have been looking at llama-cpp-python as an avenue to understand llama.cpp's APIs.

I've tried to understand the current embedding API with little luck:

llama-cpp-python discussion: Embedding for sentence
llama-cpp-python discussion: Load 70b model only once -- for embedding and for completion

llama.cpp discussion: What exactly does llama_get_embeddings return?
llama.cpp issue: Bug: Invalid Embeddings if GPU offloaded (CUDA)
llama.cpp discussion: Where does the embedding come from?
llama.cpp discussion: How to use embedding ?
llama.cpp discussion: How do I get input embeddings?
llama.cpp discussion: How to get and modify weights between activated neurons?

How much work would it be to expose access to the token-level input embeddings for manipulation at inference? Is this possible using llama.cpp's existing APIs?

ringohoffman added the enhancement label Dec 19, 2023
abb128 commented Dec 20, 2023

This was very confusing to me initially, but I found the following (ugly) method to work.

You can extract the full input embedding matrix like so:

    // Dequantize (or copy) the token embedding matrix into a flat
    // row-major buffer of n_vocab rows by n_embd columns.
    std::vector<float> embeddings;
    embeddings.resize((size_t) llama_n_embd(model) * llama_n_vocab(model));

    struct ggml_tensor * tensor = llama_get_model_tensor(model, "token_embd.weight");
    assert(tensor != nullptr);

    if (tensor->type != GGML_TYPE_F32) {
        // Quantized weights: convert to float via the type's traits.
        ggml_internal_get_type_traits(tensor->type).to_float(tensor->data,
                                                             embeddings.data(),
                                                             embeddings.size());
    } else {
        assert((size_t) (tensor->ne[0] * tensor->ne[1]) == embeddings.size());
        memcpy(embeddings.data(), tensor->data, embeddings.size() * sizeof(float));
    }

You can then convert a token id to its embedding like so:

    const float * embeds = embeddings.data() + (size_t) t.token * n_embd;

Then use the embd field of llama_batch to input this embedding, or a modified embedding derived from it. In my experience you can only decode one token at a time this way; that may have changed since I last updated.
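A minimal sketch of that batch usage, feeding one pre-computed embedding per decode call (a fragment, not compiled here; the assumed context is noted in the comments):

```cpp
// Sketch only. Assumes: `ctx` is an initialized llama_context, `n_embd` is
// the model's embedding dimension, `n_past` is the current sequence length,
// and `modified` is a std::vector<float> holding one (possibly transformed)
// n_embd-sized input embedding.
llama_batch batch = llama_batch_init(/*n_tokens=*/1, /*embd=*/n_embd, /*n_seq_max=*/1);
batch.n_tokens     = 1;
memcpy(batch.embd, modified.data(), n_embd * sizeof(float));
batch.pos[0]       = n_past;  // position of this embedding in the sequence
batch.n_seq_id[0]  = 1;
batch.seq_id[0][0] = 0;
batch.logits[0]    = true;    // request logits at this position

if (llama_decode(ctx, batch) != 0) {
    // decode failed; handle the error
}
llama_batch_free(batch);
```

Passing a nonzero embd argument to llama_batch_init makes it allocate batch.embd instead of batch.token, which is what enables the embedding input path.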

This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Mar 18, 2024
github-actions bot commented Apr 2, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed Apr 2, 2024