
Improve support for text-completion and text-embedding APIs #239

Open
AsakusaRinne opened this issue Nov 3, 2023 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@AsakusaRinne
Collaborator

As illustrated here, text completion and text embedding are important for many scenarios. Currently LLamaSharp favors chat mode, but completion and embedding APIs are more convenient for integrating with Unity and with libraries such as semantic-kernel. Although StatelessExecutor offers similar support, we should consider whether there is a better design for the text-completion and text-embedding APIs.

I'll try to work on this issue; you are welcome to add your suggestions and proposals here. :)

@AsakusaRinne AsakusaRinne added the enhancement New feature or request label Nov 3, 2023
@AsakusaRinne AsakusaRinne self-assigned this Nov 3, 2023
@Xsanf

Xsanf commented Nov 3, 2023

I tried the Unity demo for version 0.7. I'm delighted; it's immediately clear that it was done by a professional, unlike my own attempt. :)

I'll just repeat myself: in Unity the main mode of use is not chat. There you deal with agents. There is simply no one to chat with, since chat is barely predictable in terms of how a scenario develops. You move along predetermined plot points, and the LLM's task is to fill each point with non-trivial content. Whatever the user interacts with has goals and some kind of scenario, even if the scenario and the points are generated during the interaction.

Each object in the scene has a text description available for context and its own interaction history. The scene space is divided into zones, each of which also has its own context. Each character has its own description and its own interaction history. The whole game has a script and a constantly evolving, refined plan of actions and goals. The characters are not just present in the scene: they have a viewing angle and attention to their surroundings.

A character does not converse with the player or other characters as in a chat. It follows a chain of instructions that require it to analyze the situation, adopt a plan of current action, and react to what it observes.
It does not respond to a remark the way a chat character does; it follows an instruction such as: ### Instruction: How would {Char3} argue|protest|support|demand the need for {Action} to achieve the goal {Goal2}, taking into account the characters' personalities and their current relationship.
It may even keep a log of its internal monologue (thoughts).

This implies multiple requests to form meaningful interactions with the environment. In addition, it requires a complex mechanism for assembling a primary context collected from different sources, possibly reformulating it and extracting theses from it, which will either be used to plan the agent's behavior or be separated into different histories for storage.

I described all this to make clear that it has little to do with chat.
Yet all of it requires only a few functions: inference, for executing instructions with the assembled context; and either a vector database of texts, or a function for computing a semantic vector plus a function for assessing the proximity of vectors. You either use the database, or write a context-assembly algorithm yourself on top of those functions.
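The two building blocks mentioned above, a semantic vector and a proximity measure between vectors, can be sketched independently of any particular embedding backend. This is a minimal illustration in Python, not LLamaSharp's actual API; cosine similarity is one common choice of proximity measure:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Proximity of two semantic vectors: 1.0 means the same direction,
    0.0 means orthogonal (unrelated) directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # degenerate vector: no meaningful direction
    return dot / (norm_a * norm_b)
```

Given such a function and any embedding model that maps text to a vector, the context-assembly algorithm described above reduces to ranking candidate texts by their similarity to the current situation.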

That's all! The rest is convenience, or the use of LoRA to improve the execution of individual instructions (a specialized decision system). It is desirable to be able to switch LoRA adapters without reloading the model, because it is often easier to get a good result for a specific problem by training on a specialized instruction set than on a general one.

The priority is to minimize resource usage and to manage resources conveniently, with a minimum of reloads. Saving state and changing modes must not require the extremely costly reloading of the model.

I often mention that, in addition to a vector database of texts, we need separate functions for computing a semantic vector and for computing the proximity of vectors. This is because a vector database usually cuts the text into overlapping slices, and depending on the nature of the text, both the slice size and the overlap strongly influence efficiency. That is a good compromise for very large associative memories.
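The slicing behavior described above can be sketched as follows. This is an illustration of the general technique, not any particular database's implementation; the size and overlap values are arbitrary examples:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Cut text into overlapping slices, as a vector database typically
    does before embedding. Consecutive slices share `overlap` characters
    so that a sentence split at a boundary still appears whole in one slice."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Tuning `size` and `overlap` per corpus is exactly the sensitivity to "the nature of the text" mentioned above, which is why direct vector functions are attractive when the stored texts are already short.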

For a complex system that builds a limited context from a large number of short sources, it is more convenient to maintain repositories (often temporary lists) of full short texts, not constrained to particular slice sizes, while still allowing associative search.
You can of course put a full text rather than a slice into a vector database, but such use is often less efficient and less transparent than calling the functions directly. Direct functions simplify the algorithms for using and modifying such temporary lists; using the database as a temporary list would create a growing need to clean and compact it to maintain its performance.

That is why I keep mentioning this capability. It has other applications besides, for example choosing which of several generated answers is most relevant to a given goal.

These are the general considerations I would like to see implemented for Unity.

@xbotter
Collaborator

xbotter commented Feb 18, 2024

llama.cpp has been enhanced to support the BERT architecture, allowing the use of BERT embedding models.

ggerganov/llama.cpp#5423

@martindevans
Member

I also recently pushed a PR which properly L2 normalises the embeddings, which should vastly improve quality when long sentences are embedded!
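For context on why this helps: L2 normalisation rescales every embedding to unit length, so longer inputs (which tend to produce larger-magnitude vectors) no longer dominate similarity comparisons, and a plain dot product becomes equivalent to cosine similarity. A minimal sketch of the operation, not the PR's actual code:

```python
import math

def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit Euclidean length. After this, the dot
    product of two normalised vectors equals their cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)  # zero vector: nothing meaningful to normalise
    return [x / norm for x in v]
```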

Projects
Status: 📋 TODO
Development

No branches or pull requests

4 participants