Added iOS refactored LlmInference, LlmTaskRunner and LlmSessionRunner #5566
Conversation
priankakariatyml commented Aug 12, 2024
- Added `LlmSessionRunner`, which manages a C session created by a C engine. Each Swift LLM session will be managed by its own `LlmSessionRunner`.
- Refactored `LlmTaskRunner` to manage the C LLM engine. All C methods that require the engine instance are invoked via the `LlmTaskRunner`.
- Added a refactored interface for `LlmInference` (see the sketch after this list for the overall ownership chain).
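A rough sketch of the ownership chain described above (illustrative stubs only; the real classes wrap opaque C engine/session handles rather than these placeholder types, and the real methods take configs and throw):

```swift
// Illustrative sketch of the ownership chain in this PR; the stub handle types
// stand in for the opaque C engine/session pointers.

final class CEngineHandle {}   // stand-in for the C engine handle
final class CSessionHandle {}  // stand-in for the C session handle

/// Owns the C engine; every C call that needs the engine instance goes through here.
final class LlmTaskRunner {
  private let cEngine = CEngineHandle()

  /// Creates a new C session from the engine and wraps it in a session runner.
  func createSessionRunner() -> LlmSessionRunner {
    LlmSessionRunner(cSession: CSessionHandle())
  }
}

/// Owns exactly one C session; each Swift session is managed by its own runner.
final class LlmSessionRunner {
  private let cSession: CSessionHandle
  init(cSession: CSessionHandle) { self.cSession = cSession }
}

/// Refactored public interface; engine-level work is delegated to the task runner.
final class LlmInference {
  private let llmTaskRunner = LlmTaskRunner()

  func createSessionRunner() -> LlmSessionRunner {
    llmTaskRunner.createSessionRunner()
  }
}
```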
mediapipe/tasks/ios/genai/core/sources/LlmTaskRunnerRefactored.swift
Is it okay to drop the `Refactored` suffix so we can see the diffs? Thanks!
Done. FYI, until we merge all the code, master will have an incomplete implementation.
mediapipe/tasks/ios/genai/core/sources/LlmTaskRunnerRefactored.swift
/// Creates a new C LLM session from the current C engine and returns an `LlmSessionRunner`
/// that wraps around the newly created C session. The session runner is responsible for managing
/// its underlying C session.
Shall we mention that this method will always return a new instance even with the same `LlmSessionConfig`?
We should. I have added a note in the comments.
}
guard let cResponse = responseContext?.pointee else {
  return
}
Is this expected? Shall we throw an error here?
Tbh, not expected. I have modified the code to throw an error if the `responseContext` guard fails. Also added some extra explanation to document the behaviour properly. The behaviour is similar to the previous version; I am just documenting it for clarity.
We can't throw an error if `context` is nil (first guard): this check is in the body of the C callback. The context has to be non-nil for us to be able to invoke any Swift callback, which is the only place we can relay errors at this point. So if it's nil, we can't do anything.
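A minimal sketch of the two guards being discussed, assuming a typical C-callback bridging pattern (the stub types and the callback shape are illustrative, not the exact PR code):

```swift
// Illustrative sketch: why the first guard can only return, while the second can
// relay an error through the Swift callback.

struct CResponseStub {  // stand-in for the C response context struct
  var done: Bool
}

enum GenAiInferenceErrorStub: Error {
  case invalidResponse
}

/// Boxes the Swift closure so it can travel through the C `void *` context pointer.
final class SwiftCallbackBox {
  let onResult: (Result<CResponseStub, Error>) -> Void
  init(onResult: @escaping (Result<CResponseStub, Error>) -> Void) { self.onResult = onResult }
}

typealias CCallback =
  @convention(c) (UnsafeMutableRawPointer?, UnsafeMutablePointer<CResponseStub>?) -> Void

let callback: CCallback = { context, responseContext in
  // First guard: without the context we cannot recover the Swift callback, so
  // there is no channel to relay an error through; returning is all we can do.
  guard let context = context else { return }
  let box = Unmanaged<SwiftCallbackBox>.fromOpaque(context).takeUnretainedValue()

  // Second guard: the Swift callback is reachable here, so a missing response
  // can be surfaced as an error instead of being silently dropped.
  guard let cResponse = responseContext?.pointee else {
    box.onResult(.failure(GenAiInferenceErrorStub.invalidResponse))
    return
  }
  box.onResult(.success(cResponse))
}
```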
Thanks Prianka, left some comments
/// array being in memory as long as the engine isn't deallocated since
/// `options.supportedLoraRanks` only has function scope.
///
/// TODO: If C++ API mem copies the array the following code can be updated to
The C++ layer copies the `supported_lora_ranks` array elements, and I think we should probably update the C API to take `const size_t* supported_lora_ranks`.
I have updated the code to use `withUnsafeMutableBufferPointer`. Also removed the manual allocation of `modelPath` and `cacheDir`; they can be passed using `withCString` as well.
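For context, the pattern under discussion looks roughly like this (the C function is a stand-in, not the real engine-creation API; the paths and ranks are made up):

```swift
// Sketch of scoped pointer passing instead of manual allocation. The pointers are
// only valid inside the closures, which is why the doc comment above warns that
// `options.supportedLoraRanks` only has function scope unless the C++ layer copies it.

func stubCreateEngine(
  modelPath: UnsafePointer<CChar>,
  cacheDirectory: UnsafePointer<CChar>,
  supportedLoraRanks: UnsafeMutablePointer<Int>?,
  supportedLoraRanksCount: Int
) {
  // A real implementation would forward these to the C engine-creation call.
}

var supportedLoraRanks = [4, 8, 16]  // made-up ranks
let modelPath = "/tmp/model.bin"     // made-up path
let cacheDirectory = "/tmp/cache"    // made-up path

modelPath.withCString { cModelPath in
  cacheDirectory.withCString { cCacheDirectory in
    supportedLoraRanks.withUnsafeMutableBufferPointer { ranks in
      stubCreateEngine(
        modelPath: cModelPath,
        cacheDirectory: cCacheDirectory,
        supportedLoraRanks: ranks.baseAddress,
        supportedLoraRanksCount: ranks.count)
    }
  }
}
```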
Sorry, it seems that you haven't committed the changes?
/// - Returns:
///   - An `LlmSessionRunner` that wraps around a new session.
/// - Throws: An error if the underlying engine could not create a session.
func createSessionRunner(sessionConfig: LlmSessionConfig) throws -> LlmSessionRunner {
Just to confirm, `LlmSessionConfig` is the C struct, right? Can we also create a Swift class for this, similar to the `LlmInference` options?
We do have `Session.Options` in Swift. I have not committed the file for the `Session` to keep the current PR size manageable. The Swift `Options` are also declared in that file.
This method is not public. It has internal visibility and is called by the `init` of the Swift `Session`. `Session.init(llmInference:options:)` creates the C struct `LlmSessionConfig` from the Swift options provided by the user and then calls `llmInference.createSessionRunner(sessionConfig)`.
I'll commit the rest of the code once this PR is merged.
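A sketch of that flow, using stub types for illustration (the option fields and config members shown here are assumptions, not the exact API):

```swift
// Illustrative flow: Session converts its Swift options into the C session config
// and asks LlmInference for a session runner. Field names are placeholders.

struct LlmSessionConfigStub {  // stand-in for the C LlmSessionConfig struct
  var topk: Int
  var temperature: Float
}

final class LlmSessionRunnerStub {}

final class LlmInferenceStub {
  /// Internal (not public): only `Session.init` is expected to call this.
  func createSessionRunner(sessionConfig: LlmSessionConfigStub) throws -> LlmSessionRunnerStub {
    // Would create a new C session from the C engine and wrap it.
    LlmSessionRunnerStub()
  }
}

final class Session {
  struct Options {
    var topk = 40                // placeholder option
    var temperature: Float = 0.8 // placeholder option
  }

  private let llmSessionRunner: LlmSessionRunnerStub
  private let llmInference: LlmInferenceStub

  init(llmInference: LlmInferenceStub, options: Options) throws {
    // The Session decides how its options map onto the C config...
    let sessionConfig = LlmSessionConfigStub(topk: options.topk, temperature: options.temperature)
    // ...and LlmInference only needs the finished config to create the runner.
    self.llmSessionRunner = try llmInference.createSessionRunner(sessionConfig: sessionConfig)
    self.llmInference = llmInference
  }
}
```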
We can also change this method to `createSessionRunner(options: Session.Options)` and let this method handle creation of the C session config from the options passed in by the `Session`. I just thought it would be better for the `Session` to convert its options to the config however it sees fit and not let `LlmInference` worry about it. LMK what you think.
> This method is not public. It has internal visibility and is called by the `init` of the Swift `Session`. `Session.init(llmInference:options:)` creates the C struct `LlmSessionConfig` from the Swift options provided by the user and then calls `llmInference.createSessionRunner(sessionConfig)`.

Thanks for the explanation! If you don't mind, can we make all changes in one PR? We have some internal dependency on this Swift API, so I hope we can make all the necessary changes in one PR. Let me know if you have other opinions.
I have pushed all the code to this PR. Do have a look.
llmTaskRunner = try options.modelPath.withCString { modelPath in
  try cacheDirectory.withCString { cacheDirectory in
    try options.supportedLoraRanks.withUnsafeMutableBufferPointer { supportedLoraRanks in
w.r.t. #5566 (comment): @yishuangP These changes were already pushed. I think there was some git refresh issue.
private let llmSessionRunner: LlmSessionRunner

// LLM Inference used to create this session.
private let llmInference: LlmInference
@yishuangP I am storing a strong reference to `llmInference` to ensure that any given session does not outlive the `llmInference` and create all sorts of undefined behaviour. This won't cause a retain cycle since `llmInference` does not store a reference to any session. This means the `llmInference` would get deallocated only after all the sessions created from it are destroyed.
It also lets me synchronise the response generation calls across sessions to fix a crash caused by simultaneous response generation calls.
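A tiny sketch of the lifetime argument, with stub classes (the print statements are only there to show deallocation order):

```swift
// LlmInference keeps no back-reference to its sessions, so the strong reference
// held by Session cannot form a retain cycle; it only guarantees that the engine
// outlives every session created from it.

final class LlmInferenceStub {
  // Note: no collection of sessions is stored here.
  deinit { print("engine deallocated") }
}

final class SessionStub {
  // Strong reference: the C engine owned by llmInference must outlive the
  // C session owned by this session's runner.
  private let llmInference: LlmInferenceStub
  init(llmInference: LlmInferenceStub) { self.llmInference = llmInference }
  deinit { print("session deallocated") }
}

var inference: LlmInferenceStub? = LlmInferenceStub()
var session: SessionStub? = SessionStub(llmInference: inference!)

inference = nil  // engine is not deallocated yet; the session still holds it
session = nil    // prints "session deallocated", then "engine deallocated"
```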
Thanks! Yeah, agreed that we need a strong reference to `llmInference` here to manage lifecycles.
> It also lets me synchronise the response generation calls across sessions to fix a crash caused by simultaneous response generation calls.

I believe only one response generation can happen at a time; is this what you are referring to?
Yes, across sessions.
/// to execute response generation. If response generation is already in progress, throws an
/// error.
/// Any
func shouldContinueWithResponseGeneration() throws {
@yishuangP I am maintaining a response generation state in the `LlmInference`, which the session can read and update through these methods to prevent simultaneous response generation calls from multiple sessions.
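A minimal sketch of what such shared state could look like; the lock-based implementation and the error case are assumptions, only the two method names come from this PR:

```swift
import Foundation

// Assumed shape of the shared response-generation state, guarded by a lock.
final class LlmInferenceStub {
  enum StateError: Error {
    case responseGenerationInProgress  // illustrative error case
  }

  private let responseGenerationLock = NSLock()
  private var responseGenerationInProgress = false

  /// Called by a session before it starts generating; throws if any session
  /// created from this engine is already generating a response.
  func shouldContinueWithResponseGeneration() throws {
    responseGenerationLock.lock()
    defer { responseGenerationLock.unlock() }
    guard !responseGenerationInProgress else { throw StateError.responseGenerationInProgress }
    responseGenerationInProgress = true
  }

  /// Called by a session once its response generation finishes (or fails).
  func markResponseGenerationCompleted() {
    responseGenerationLock.lock()
    defer { responseGenerationLock.unlock() }
    responseGenerationInProgress = false
  }
}
```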
Thanks a lot! Can you help fix the sample app `examples/llm_inference/ios/InferenceExample` in a follow-up PR?
let humanReadableLlmResponse = Session.humanReadableString(
  llmResponses: responseStrings, stripLeadingWhitespaces: receivedFirstToken)
else {
  progress(nil, GenAiInferenceError.invalidResponse)
We should also call `self?.llmInference.markResponseGenerationCompleted()` if it errors out?
You mean if it errors out in the progress callback? I am assuming that, even if one of the partial responses is invalid, the C++ session would still continue giving callbacks for the remaining responses.
In that case, wouldn't marking it completed still cause the same issue? By design, irrespective of whether the response returned by C++ is invalid, the session runner will also call `completion()` after `progress()` if the response context marks the bool `done` as true.
LMK if this is not how C++ behaves.
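In other words, the relaying behaviour being described is roughly the following (a sketch under the assumption above about how C++ delivers partial responses, not the actual `predictAsync` code):

```swift
// Sketch: progress() fires for every partial response (with an error if it is
// invalid), and completion() fires afterwards whenever the C response context
// marks `done` as true, regardless of the validity of that last partial response.

struct CResponseStub {  // stand-in for the C response context
  var responseStrings: [String]?
  var done: Bool
}

enum GenAiInferenceErrorStub: Error { case invalidResponse }

func relay(
  cResponse: CResponseStub,
  progress: (String?, Error?) -> Void,
  completion: () -> Void
) {
  if let strings = cResponse.responseStrings {
    progress(strings.joined(), nil)
  } else {
    progress(nil, GenAiInferenceErrorStub.invalidResponse)
  }
  if cResponse.done {
    completion()
  }
}
```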
But thanks to your query, I spotted an issue in the control flow of the sync `predict()` when an error happens, and I have fixed it.
Thanks for the explanation. I took a look at the `llmSessionRunner.predictAsync` implementation; we don't need to call `self?.llmInference.markResponseGenerationCompleted()` here.
Sure. I will keep this ready.
Thanks!
sequence_batch_size: LlmInference.sequenceBatchSize,
number_of_supported_lora_ranks: options.supportedLoraRanks.count,
supported_lora_ranks: supportedLoraRanks.baseAddress,
max_top_k: options.maxTopk)
@yishuangP @schmidt-sebastian This PR is out of sync with the C `llm_inference_engine` on master. You won't be able to merge this one because of a change in `LlmModelSettings`.
Can you pass `llm_activation_data_type: LlmActivationDataType(0), num_draft_tokens: 0` after `max_top_k` to this API while merging? That should fix it. I'll add these new options to the Swift APIs once this PR is merged.
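For reference, an abridged sketch of how the argument list above would look with that fix applied (unverified against master; only the fields visible in the diff plus the two suggested arguments are shown, and the earlier arguments of the settings struct are omitted):

```swift
// Abridged: the tail of the LlmModelSettings call after the merge fix; earlier
// arguments are unchanged and omitted here.
  sequence_batch_size: LlmInference.sequenceBatchSize,
  number_of_supported_lora_ranks: options.supportedLoraRanks.count,
  supported_lora_ranks: supportedLoraRanks.baseAddress,
  max_top_k: options.maxTopk,
  llm_activation_data_type: LlmActivationDataType(0),
  num_draft_tokens: 0)
```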