[WIP] refactor: init some experimental refactoring. #362

Closed
wants to merge 1 commit

Conversation

AsakusaRinne
Collaborator

This PR refactors the implementation of LLamaSharp and is targeted at v1.0.0.

I've pushed some changes but haven't completed them yet. For now this PR is more of a proposal; there's no guarantee it compiles or passes the tests.

So far I've made the following changes:

  1. Add IGenerationControl to allow customized control over when generation stops, instead of relying only on anti-prompts (see the interface sketch after this list).
  2. Add ITokenizer to allow a customized tokenizer.
  3. Allow calling the stateless executor with a self-managed context.
  4. Add TextCompletion, the counterpart of ChatSession, to provide further wrapping for text completion tasks. Related issues: Create HTTP API server and provide API like OAI #269, Garbled output from model in Unity #178.
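For reference, here is a minimal sketch of the IGenerationControl shape implied by the ShouldStopGeneration method quoted in the review below; anything beyond that signature is an assumption about this WIP branch, not a final design.

using System.Collections.Generic;
using LLama.Abstractions; // assumed home of IInferenceParams

namespace LLama.Control
{
    /// <summary>
    /// Decides when generation should stop, instead of a hard-coded anti-prompt check.
    /// </summary>
    public interface IGenerationControl
    {
        /// <summary>
        /// Returns true if generation should stop after the latest output tokens.
        /// </summary>
        bool ShouldStopGeneration(LLamaContext context, IInferenceParams inferenceParams, IEnumerable<int> lastOutputIds);
    }
}

An executor could then call ShouldStopGeneration once per generated token and break out of its loop when it returns true, with the antiprompt-based class discussed below acting as the default implementation.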

Some other proposals

  1. Make InstructExecutor obsolete, because instruct mode is only one mode of text completion.
  2. Allow a self-managed KV cache, providing APIs to allocate, update and release the cache (maybe do this on master?).
  3. Apply batch decoding, which will require refactoring the StatelessExecutor again.

For discussion

I'm wondering if we should introduce Microsoft.SemanticKernel.Abstractions into LLamaSharp itself, instead of only using it in the SK integration. My reasons are as follows:

  1. Some features, such as function calling, are complex and are already fully or partially supported in SK. We'd have a hard time supporting them from scratch, given our limited hands and the rapid pace of LLM iteration.
  2. SK defines some useful abstractions. We could reuse them even where our implementations differ, for example prompt templates. That would be convenient for both users and SK-integration developers.
  3. SK is on its way to becoming the standard for LLMs in dotnet.

However, I also have some worries about it:

  1. We'll have to follow SK's changes. If SK makes breaking changes, we may have to rush to keep up.
  2. Our direction of development will be restricted. I'm not sure how SK will evolve in the future. Though SK is open source, it contains a lot of Azure-oriented content because it's developed by Microsoft, whereas LLamaSharp has no such preference.

@martindevans @SignalRT @saddam213 May I ask for your thoughts? This is not a formal PR, so please feel free to discuss anything.

@AsakusaRinne added the help wanted and break change labels on Dec 13, 2023
@martindevans
Member

martindevans commented Dec 13, 2023

Allow a self-managed KV cache, providing APIs to allocate, update and release the cache (maybe do this on master?).
Apply batch decoding, which will require refactoring the StatelessExecutor again.

These two will probably be handled by my ongoing work to redesign the executors. My plan is a new executor, built on a single batch, which can host multiple ongoing "conversations" that are all updated in that one batch.
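Purely to illustrate that direction (every type and method name below is hypothetical; none of this exists yet):

// Hypothetical sketch: one executor owns a single batch, and each
// "conversation" is a sequence scheduled into that batch.
var executor = new BatchedExecutor(weights, parameters);  // hypothetical type
var left = executor.CreateConversation();                 // sequence 0
var right = executor.CreateConversation();                // sequence 1
left.Prompt("Hello");
right.Prompt("Bonjour");
await executor.Infer();  // a single decode call advances both conversations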

I'm wondering if we should introduce Microsoft.SemanticKernel.Abstractions to LLamaSharp, instead of just using it in SK integration

Personally I would be against this; I think we should keep LLamaSharp as close to being just a wrapper around llama.cpp as possible. However, we should definitely make sure the library is flexible enough that something like SK can be integrated with it!

/// <summary>
/// Set a custom generation control to use. <b>If this is set antiprompt will be ignored!</b>
/// </summary>
IGenerationControl GenerationControl { get; set; }
Member


Suggested change
- IGenerationControl GenerationControl { get; set; }
+ IGenerationControl? GenerationControl { get; set; }

/// <summary>
/// Set a custom tokenizer to use.
/// </summary>
ITokenizer Tokenizer { get; set; }
Member


Suggested change
- ITokenizer Tokenizer { get; set; }
+ ITokenizer? Tokenizer { get; set; }

namespace LLama.Control
{
/// <summary>
/// The default generation control in LLamaSharp, using antiprompts. This class should not be inherited.
Member


Suggested change
- /// The default generation control in LLamaSharp, using antiprompts. This class should not be inherited.
+ /// The default generation control in LLamaSharp, using antiprompts.

Member


It's sealed, so it's not possible to extend this class.

{
/// <summary>
/// The default generation control in LLamaSharp, using antiprompts. This class should not be inherited.
/// <b>Note that this class has state. The previous outputs feeded to it will affect its control.</b>
Member


Suggested change
- /// <b>Note that this class has state. The previous outputs feeded to it will affect its control.</b>
+ /// <b>Note that this class has state. The previous outputs fed to it will affect its output.</b>

/// </summary>
public bool ShouldStopGeneration(LLamaContext context, IInferenceParams inferenceParams, IEnumerable<int> lastOutputIds)
{
return false;
Member


Should this be returning false?
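For contrast, the antiprompt-driven behaviour being asked about would presumably look something like the fragment below. This is only an illustration: DeTokenize and the AntiPrompts member are assumptions about the surrounding code, and the buffer is the state that the summary above warns about.

// Fragment of the class body, for illustration only:
private readonly StringBuilder _decoded = new StringBuilder();

public bool ShouldStopGeneration(LLamaContext context, IInferenceParams inferenceParams, IEnumerable<int> lastOutputIds)
{
    // Append the text of the newest tokens to a running buffer
    // (DeTokenize stands in for whichever decoding helper is used).
    _decoded.Append(context.DeTokenize(lastOutputIds.ToList()));

    // Stop as soon as the accumulated output ends with any antiprompt.
    foreach (var antiprompt in inferenceParams.AntiPrompts)
    {
        if (_decoded.ToString().EndsWith(antiprompt, StringComparison.Ordinal))
            return true;
    }
    return false;
}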

namespace LLama.Transform
{
/// <summary>
/// The default tokenizer of LLamaSharp. This class should not be inherited.
Member


Suggested change
- /// The default tokenizer of LLamaSharp. This class should not be inherited.
+ /// The default tokenizer of LLamaSharp.

{
/// <summary>
/// The default tokenizer of LLamaSharp. This class should not be inherited.
/// <b>Note that this class has state. The previous outputs feeded to it will affect its control.</b>
Member


Suggested change
- /// <b>Note that this class has state. The previous outputs feeded to it will affect its control.</b>
+ /// <b>Note that this class has state. The previous outputs fed to it will affect its output.</b>

/// <summary>
/// <inheritdoc/>
/// </summary>
public IEnumerable<int> Tokenize(LLamaContext context, string text, bool addBos = true, bool special = false)
Member


I think it would be better to accept a LLamaWeights in the constructor, instead of a LLamaContext in the Tokenize/Detokenize methods. That simplifies usage and allows you to use the tokenizer without creating an entire context (which is quite memory hungry).

Collaborator Author


I'll change it, thank you!

Collaborator Author


Is there already a way to tokenize without a context? I didn't find such a method.

Member

martindevans commented Dec 15, 2023


LLamaWeights w = your_weights;
w.NativeHandle.Tokenize("a string");

Should do it. There should really be higher level wrappers for tokenization in LLamaWeights, so that you don't have to access the NativeHandle, but I haven't built them yet.

Collaborator Author


I see, I mistook the one in NativeApi that takes a context as a parameter for the lowest-level API. I'll add a wrapper for it with LLamaWeights. :)
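For reference, a rough sketch of what such a weights-backed wrapper could look like; the class name is hypothetical, and the exact NativeHandle.Tokenize signature and return type are assumptions.

using System.Collections.Generic;
using System.Text;

namespace LLama.Transform
{
    /// <summary>
    /// Tokenizer bound to a LLamaWeights instance, so no LLamaContext is required.
    /// (Detokenize omitted for brevity.)
    /// </summary>
    public sealed class WeightsTokenizer
    {
        private readonly LLamaWeights _weights;

        public WeightsTokenizer(LLamaWeights weights)
        {
            _weights = weights;
        }

        public IEnumerable<int> Tokenize(string text, bool addBos = true, bool special = false)
        {
            // Assumed shape of the native-handle call; see the snippet above.
            return _weights.NativeHandle.Tokenize(text, addBos, special, Encoding.UTF8);
        }
    }
}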

{
/// <summary>
/// Decodes a stream of tokens into a stream of characters
/// </summary>
public sealed class StreamingTokenDecoder
{
- private readonly SafeLlamaModelHandle _weights;
+ private readonly SafeLlamaModelHandle? _weights;
Member


I don't think any of the changes in this class are necessary if we go with the suggested change to the DefaultTokenizer.

@martindevans
Member

Overall I really like changes 1, 2 and 3. Can we split them into 3 separate PRs so they can proceed separately?

@AsakusaRinne
Collaborator Author

Overall I really like changes 1, 2 and 3. Can we split them into 3 separate PRs so they can proceed separately?

Sure, I'll separate them into 3 PRs.

Besides, there are some proposals I forgot to mention above:

  1. Support CUDA backends with different AVX levels.
  2. Merge the cuda11 and cuda12 backends into one, relying on feature detection. In addition, allow users to specify the CUDA version in NativeLibraryConfig to deal with some complex setups (see the configuration sketch after this list).
  3. Make withLog in NativeLibraryConfig true by default because, judging from the issues, its output is very important for users when debugging.
  4. Refactor the web service implementation after batch decoding is supported.
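A tiny illustration of how proposals 2 and 3 could surface to users; WithCuda and WithLogs are assumed to follow the existing fluent NativeLibraryConfig style, and the CUDA-version override is purely hypothetical.

// Configure the native backend before any model is loaded.
NativeLibraryConfig.Instance
    .WithCuda()       // merged CUDA backend, relying on feature detection
    .WithLogs(true);  // proposal 3: native-library logging on by default
// Hypothetical future call for proposal 2: NativeLibraryConfig.Instance.WithCudaVersion(12);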

There are also some features that give me a headache. I don't intend to complete them in v1.0.0, but I'd like to include them in the discussion.

  1. Support fine-tuning.
  2. LLaVA support, or more generally, image-to-text support.
  3. Catch errors from llama.cpp. This is useful for automatically configuring how many layers are offloaded to VRAM.
