Improve support for text-completion and text-embedding APIs #239
Comments
I tried the Unity demo for version 0.7. I'm delighted; it's immediately clear that it was made by a professional, unlike me. :) I'll just repeat myself here.

In Unity, chat is not the main part. There you deal with agents. There is simply no one to chat with, because chat is hard to predict in terms of how any scenario develops. You move along predetermined points, and the LLM's task is to fill each point with non-trivial content. Whatever the user communicates with has goals and some kind of scenario, even if the script and the points are generated during the interaction.

Each object in the scene has a text description available for context and its own history of interactions. The scene space is divided into zones, each of which also has its own context. Each character has its own description and its own history of interactions. The game as a whole has a script and a constantly evolving, refined plan of actions and goals.

The characters are not just present in the scene. They have a field of view and pay attention to their surroundings. A character does not converse with the player or other characters as in a chat. It follows a chain of instructions that require it to analyze the situation, adopt a plan for its current action, and react to what it observes. This implies multiple requests to form meaningful interactions with the environment. It also requires a complex mechanism for building a primary context from different sources, possibly reformulating it and extracting key points from it, which are then either used to plan the agent's behavior or split into different histories for storage.

I described all this to make it clear that it has little to do with chat. That's all! The rest is convenience, or the use of LoRA to improve the execution of individual instructions (a special decision system). It is advisable to be able to switch LoRA without reloading the model, because it is often easier to get a good result on a specific problem by training on a special set of instructions than on a general one. The priority is to minimize resource usage and to manage resources conveniently, with as few reloads as possible. Saving states and changing modes must not involve the extremely costly reload of the model.

I often mention that, in addition to a vector database for text, we need separate functions for computing a semantic vector and for computing the proximity of vectors. The reason is that a vector database usually cuts the text into overlapping slices, and depending on the nature of the text, both the slice size and the overlap greatly affect efficiency. That is a good compromise for very large associative memories. But with a complex system that builds a limited context from a large number of short sources, it is more convenient to build repositories (often temporary lists) of full short texts that are not limited to specific slice sizes but still allow associative search. That's why I keep mentioning this possibility. There are other applications too, for example choosing, from several generated answers, the one most relevant to a given goal.

These are the general considerations I would like to see implemented for Unity.
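To make the request above concrete, here is a minimal, language-agnostic sketch in Python of the two separate functions (compute a vector, compute vector proximity), a temporary list of full short texts with associative search, and selection of the generated answer closest to a goal. Everything in it is an assumption for illustration: `toy_embed`, `VOCAB`, `ShortTextStore` and `pick_most_relevant` are hypothetical names, not LLamaSharp or llama.cpp APIs, and the word-count "embedding" only stands in for a real embedding model.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical toy vocabulary; a real system would call an embedding model instead.
VOCAB = ["door", "key", "guard", "sword", "forest", "open", "attack", "hide"]


def toy_embed(text: str) -> List[float]:
    """Stand-in embedder (assumption): counts occurrences of VOCAB words."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Proximity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ShortTextStore:
    """Temporary list of full short texts (no slicing) with associative search."""

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self._embed = embed
        self._items: List[Tuple[str, Vector]] = []

    def add(self, text: str) -> None:
        # Each text is embedded once, whole, when it is stored.
        self._items.append((text, self._embed(text)))

    def search(self, query: str, top_k: int = 3) -> List[Tuple[float, str]]:
        # Score every stored text against the query and return the best top_k.
        q = self._embed(query)
        scored = [(cosine_similarity(q, vec), text) for text, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


def pick_most_relevant(goal: str, candidates: Sequence[str],
                       embed: Callable[[str], Vector]) -> str:
    """Choose the candidate answer whose vector is closest to the goal's vector."""
    goal_vec = embed(goal)
    return max(candidates, key=lambda c: cosine_similarity(goal_vec, embed(c)))


store = ShortTextStore(toy_embed)
store.add("the guard holds the key")
store.add("a sword lies in the forest")
store.add("hide behind the door")
best_score, best_text = store.search("where is the key", top_k=1)[0]

answers = ["attack the guard", "open the door with the key", "hide in the forest"]
chosen = pick_most_relevant("open the door", answers, toy_embed)
```

Because embedding and similarity are plain functions here, the same pair can serve the store, the answer selection, or any other temporary list without going through a vector database.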
llama.cpp has been enhanced to support BERT models, which allows BERT embedding models to be used.
I also recently pushed a PR that properly L2-normalises the embeddings, which should vastly improve quality when long sentences are embedded!
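For readers unfamiliar with the term, this is a minimal sketch of what L2 normalisation of an embedding means. It is not the code from the PR, which is not shown in this thread; `l2_normalize` and `dot` are hypothetical helper names. After normalisation every vector has unit length, so a plain dot product between two embeddings equals their cosine similarity.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are L2-normalised."""
    return sum(x * y for x, y in zip(a, b))


short_vec = l2_normalize([3.0, 4.0])    # small magnitude
long_vec = l2_normalize([30.0, 40.0])   # same direction, ten times the magnitude
similarity = dot(short_vec, long_vec)   # magnitude no longer affects the score
```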
As illustrated here, text completion and text embedding are important for many scenarios. Currently LLamaSharp favors chat mode, but text completion and text embedding are more convenient for integrating with Unity and other libraries such as semantic-kernel. Though StatelessExecutor offers similar support, we should consider whether there is a better design for the text-completion and text-embedding APIs. I'll try to work on this issue, and you are welcome to add your suggestions or proposals here. :)