
Improve support for text-completion and text-embedding APIs #239

Open
AsakusaRinne opened this issue Nov 3, 2023 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@AsakusaRinne
Collaborator

As illustrated here, text completion and text embedding are important for many scenarios. Currently LLamaSharp favors chat mode, but completion and embedding APIs are more convenient for integrating with Unity and with libraries such as semantic-kernel. Although StatelessExecutor offers similar support, we should consider whether there is a better design for the text-completion and text-embedding APIs.

I'll try to work on this issue; you are welcome to add your suggestions and proposals here. :)

@AsakusaRinne AsakusaRinne added the enhancement New feature or request label Nov 3, 2023
@AsakusaRinne AsakusaRinne self-assigned this Nov 3, 2023
@Xsanf

Xsanf commented Nov 3, 2023

I tried the Unity demo for version 0.7. I'm delighted; it's immediately clear that it was done by a professional, unlike my own attempt. :)

I'll just repeat myself: in Unity the main mode of use is not chat. There you deal with agents. There is simply no one to chat with, since chat is barely predictable in terms of how a scenario develops. You move along predetermined plot points, and the LLM's task is to fill each point with non-trivial content. Whatever the user interacts with has goals and some kind of scenario, even if the scenario and the points are generated during the interaction.

Each object in the scene has a text description available for context and its own interaction history. The scene space is divided into zones, each of which also has its own context. Each character has its own description and its own interaction history. The whole game has a script and a constantly evolving, refined plan of actions and goals. The characters are not just present in the scene: they have a viewing angle and attention to their surroundings.

A character does not converse with the player or other characters as in a chat. It follows a chain of instructions that require it to analyze the situation, adopt a plan of current action, and react to what it observes.
It does not respond to a remark the way a chat character does; it follows an instruction such as: ### Instruction: How would {Char3} argue|protest|support|demand the need for {Action} to achieve the goal {Goal2}, taking into account the characters' personalities and their current relationship.
It may even keep a log of its internal monologue (thoughts).

This implies multiple requests to form meaningful interactions with the environment. In addition, it requires a complex mechanism for assembling a primary context collected from different sources, possibly reformulating it and extracting theses from it, which will either be used to plan the agent's behavior or be separated into different histories for storage.

I described all this to make clear that it has little to do with chat.
Yet all of it requires only a few functions: inference, for executing instructions with the assembled context; and either a vector database of texts, or a function for computing a semantic vector plus a function for assessing the proximity of vectors. You either use the database, or write a context-assembly algorithm yourself on top of those functions.
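The two building blocks mentioned above, a semantic vector and a proximity measure between vectors, can be sketched independently of any particular embedding backend. This is a minimal illustration in Python, not LLamaSharp's actual API; cosine similarity is one common choice of proximity measure:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Proximity of two semantic vectors: 1.0 means the same direction,
    0.0 means orthogonal (unrelated) directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # degenerate vector: no meaningful direction
    return dot / (norm_a * norm_b)
```

Given such a function and any embedding model that maps text to a vector, the context-assembly algorithm described above reduces to ranking candidate texts by their similarity to the current situation.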

That's all! The rest is convenience, or the use of LoRA to improve the execution of individual instructions (a specialized decision system). It is desirable to be able to switch LoRA adapters without reloading the model, because it is often easier to get a good result for a specific problem by training on a specialized instruction set than on a general one.

The priority is to minimize resource usage and to manage resources conveniently, with a minimum of reloads. Saving state and changing modes must not require the extremely costly reloading of the model.

I often mention that, in addition to a vector database of texts, we need separate functions for computing a semantic vector and for computing the proximity of vectors. This is because a vector database usually cuts the text into overlapping slices, and depending on the nature of the text, both the slice size and the overlap strongly influence efficiency. That is a good compromise for very large associative memories.
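The slicing behavior described above can be sketched as follows. This is an illustration of the general technique, not any particular database's implementation; the size and overlap values are arbitrary examples:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Cut text into overlapping slices, as a vector database typically
    does before embedding. Consecutive slices share `overlap` characters
    so that a sentence split at a boundary still appears whole in one slice."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Tuning `size` and `overlap` per corpus is exactly the sensitivity to "the nature of the text" mentioned above, which is why direct vector functions are attractive when the stored texts are already short.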

For a complex system that builds a limited context from a large number of short sources, it is more convenient to maintain repositories (often temporary lists) of full short texts, not constrained to particular slice sizes, while still allowing associative search.
You can of course put a full text rather than a slice into a vector database, but such use is often less efficient and less transparent than calling the functions directly. Direct functions simplify the algorithms for using and modifying such temporary lists; using the database as a temporary list would create a growing need to clean and compact it to maintain its performance.

That is why I keep mentioning this capability. It has other applications besides, for example choosing which of several generated answers is most relevant to a given goal.

These are the general considerations I would like to see implemented for Unity.

@xbotter
Collaborator

xbotter commented Feb 18, 2024

llama.cpp has been enhanced to support the BERT architecture, allowing the use of BERT embedding models.

ggerganov/llama.cpp#5423

@martindevans
Member

I also recently pushed a PR which properly L2 normalises the embeddings, which should vastly improve quality when long sentences are embedded!
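For context on why this helps: L2 normalisation rescales every embedding to unit length, so longer inputs (which tend to produce larger-magnitude vectors) no longer dominate similarity comparisons, and a plain dot product becomes equivalent to cosine similarity. A minimal sketch of the operation, not the PR's actual code:

```python
import math

def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit Euclidean length. After this, the dot
    product of two normalised vectors equals their cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)  # zero vector: nothing meaningful to normalise
    return [x / norm for x in v]
```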

Projects
Status: 📋 TODO
Development

No branches or pull requests

4 participants