Prerequisites
Please answer the following questions for yourself before submitting an issue.
I reviewed the Discussions, and have a new bug or useful enhancement to share.
Feature Description
API to allow manipulation of token-level input embeddings.
Motivation
I work for Protopia AI, a company that offers an LLM privacy solution that works by transforming LLM input embeddings. We have a client that uses llama.cpp, and we are interested in seeing how our PyTorch-based solution can integrate with llama.cpp. I have been looking at llama-cpp-python as an avenue to understand llama.cpp's APIs.
I've tried to understand the current embedding API with little luck:
llama-cpp-python discussion: Embedding for sentence
llama-cpp-python discussion: Load 70b model only once -- for embedding and for completion
llama.cpp discussion: What exactly does llama_get_embeddings return?
llama.cpp issue: Bug: Invalid Embeddings if GPU offloaded (CUDA)
llama.cpp discussion: Where does the embedding come from?
llama.cpp discussion: How to use embedding ?
llama.cpp discussion: How do I get input embeddings?
llama.cpp discussion: How to get and modify weights between activated neurons?
How much work would it be to expose access to the token-level input embeddings for manipulation at inference? Is this possible using llama.cpp's existing APIs?
You can use the embd field of llama_batch to pass in the input embeddings, or modified versions of them. In my experience you can only decode one token at a time this way, though that may have changed since I last updated.
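For concreteness, here is a minimal, untested sketch of that approach based on my reading of llama.h: calling llama_batch_init with a non-zero embd argument allocates the batch's embd buffer (n_tokens * n_embd floats), and llama_decode then consumes those embeddings instead of token ids. The helper name decode_custom_embeddings is mine, and I assume the transformed embeddings (my_embd) are produced elsewhere; how to obtain the original token-level input embeddings to transform is exactly what this issue is asking to expose.

```c
// Sketch (untested): feed externally produced per-token embeddings to
// llama_decode via llama_batch.embd instead of token ids.
//
// Assumptions: the model and context are already loaded, `my_embd` holds
// n_tokens * n_embd floats, and all tokens belong to sequence 0.

#include <stdio.h>
#include <string.h>

#include "llama.h"

static int decode_custom_embeddings(struct llama_context * ctx,
                                     const struct llama_model * model,
                                     const float * my_embd,   // [n_tokens * n_embd]
                                     int32_t n_tokens) {
    const int32_t n_embd = llama_n_embd(model);

    // a non-zero second argument makes llama_batch_init allocate batch.embd
    // (n_tokens * n_embd floats) instead of batch.token
    struct llama_batch batch = llama_batch_init(n_tokens, n_embd, 1);
    batch.n_tokens = n_tokens;

    for (int32_t i = 0; i < n_tokens; ++i) {
        memcpy(batch.embd + (size_t) i * n_embd,
               my_embd    + (size_t) i * n_embd,
               (size_t) n_embd * sizeof(float));
        batch.pos[i]       = i;   // position in the sequence
        batch.n_seq_id[i]  = 1;
        batch.seq_id[i][0] = 0;
        batch.logits[i]    = (i == n_tokens - 1); // only request logits for the last token
    }

    const int ret = llama_decode(ctx, batch);
    if (ret != 0) {
        fprintf(stderr, "llama_decode failed: %d\n", ret);
    }

    llama_batch_free(batch);
    return ret;
}
```

If this works as I expect, sampling can then proceed from the last token's logits as usual (e.g. via llama_get_logits_ith).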