
Added iOS refactored LlmInference, LlmTaskRunner and LlmSessionRunner #5566

Conversation

priankakariatyml (Collaborator)

  1. Added LlmSessionRunner, which manages a C session created by a C engine. Each Swift LLM session will be managed by its own LlmSessionRunner.
  2. Refactored LlmTaskRunner to manage the C LLM engine. All C methods that require the engine instance are invoked via the LlmTaskRunner.
  3. Added a refactored interface for LlmInference. (A rough sketch of how these pieces fit together follows below.)
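
A minimal sketch of the ownership this describes (the type names echo this PR, but the stand-in handle types, stored properties and method bodies below are assumptions added purely for illustration):

/// Stands in for the C engine handle owned by the task runner.
struct CEngineHandle {}

/// Stands in for a C session handle created from the engine.
struct CSessionHandle {}

/// Manages the C LLM engine; every C call that needs the engine instance is
/// routed through this type.
final class LlmTaskRunnerSketch {
  private let engine: CEngineHandle

  init(engine: CEngineHandle) {
    self.engine = engine
  }

  /// Always creates a brand-new C session from the engine and hands it to its
  /// own runner.
  func makeSessionRunner() -> LlmSessionRunnerSketch {
    LlmSessionRunnerSketch(session: CSessionHandle())
  }
}

/// Owns exactly one C session; each Swift session gets its own runner.
final class LlmSessionRunnerSketch {
  private let session: CSessionHandle

  init(session: CSessionHandle) {
    self.session = session
  }
}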

Contributor

Is it okay to drop the Refactored suffix so we can see the diffs? Thanks!

Collaborator Author

Done. FYI, until we merge all the code, master will have an incomplete implementation.


/// Creates a new C LLM session from the current C engine and returns an `LlmSessionRunner`
/// that wraps around the newly created C session. The session runner is responsible for managing
/// its underlying C session.

Contributor

Shall we mention that this method will always return a new instance even with the same LlmSessionConfig?

Collaborator Author

We should. I have added a note in the comments.

}
guard let cResponse = responseContext?.pointee else {
return
}

Contributor

Is this expected? Shall we throw an error here?

Collaborator Author (priankakariatyml, Aug 14, 2024)

To be honest, this isn't expected. I have modified the code to throw an error if the responseContext guard fails, and added some extra explanation to document the behaviour properly. The behaviour is the same as in the previous version; I'm just documenting it for clarity.

We can't throw an error if the context is nil (the first guard): that check is in the body of the C callback, and the context has to be non-nil for us to be able to invoke any Swift callback, which is the only place where we can relay errors at this point. So if it's nil, we can't do anything.
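
A rough sketch of that guard ordering, assuming a C callback of the shape (context, responseContext). The type and function names below (CallbackInfoSketch, LlmResponseContextSketch, SketchError, cResponseCallback) are placeholders invented for this illustration; only the control flow mirrors the explanation above.

struct LlmResponseContextSketch {
  var done: Bool
}

final class CallbackInfoSketch {
  let progress: (String?, Error?) -> Void
  init(progress: @escaping (String?, Error?) -> Void) { self.progress = progress }
}

enum SketchError: Error { case invalidResponse }

func cResponseCallback(
  context: UnsafeMutableRawPointer?,
  responseContext: UnsafePointer<LlmResponseContextSketch>?
) {
  // Without the context there is no way to reach any Swift closure, so there
  // is nowhere to report an error; returning is the only option.
  guard let context = context else {
    return
  }
  let callbackInfo = Unmanaged<CallbackInfoSketch>.fromOpaque(context).takeUnretainedValue()

  // With a valid context we do have a Swift closure, so a missing response
  // can be surfaced as an error instead of being dropped silently.
  guard let cResponse = responseContext?.pointee else {
    callbackInfo.progress(nil, SketchError.invalidResponse)
    return
  }
  _ = cResponse  // real code would convert and forward the response here
}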

Contributor (yishuangP) left a comment

Thanks Prianka, left some comments

/// array being in memory as long as the engine isn't deallocated since
/// `options.supportedLoraRanks` only has function scope.
///
/// TODO: If C++ API mem copies the array the following code can be updated to

Contributor

The C++ layer copies the supported_lora_ranks array elements, and I think we should probably update the C API to take const size_t* supported_lora_ranks.

Collaborator Author

I have updated it to use withUnsafeMutableBufferPointer. I have also removed the allocation of modelPath and cacheDir; they can be passed using withCString instead.

Contributor

Sorry, it seems that you haven't committed the changes?

/// - Returns:
/// - An `LlmSessionRunner` that wraps around a new session.
/// - Throws: An error if the underlying engine could not create a session.
func createSessionRunner(sessionConfig: LlmSessionConfig) throws -> LlmSessionRunner {

Contributor

Just to confirm, LlmSessionConfig is the C struct, right? Can we also create a Swift class for this, similar to the LlmInference options?

Collaborator Author (priankakariatyml, Aug 19, 2024)

We do have Session Options in Swift. I have not committed the file for the Session, to keep the current PR size manageable; the Swift options are also declared in that file.
This method is not public. It has internal visibility and is called by the init of the Swift Session. Session.init(llmInference:options:) creates the C struct LlmSessionConfig from the Swift options provided by the user and then calls llmInference.createSessionRunner(sessionConfig:). (A rough sketch of this flow is shown below.)

I'll commit the rest of the code once this PR is merged.
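
For reference, a compilable sketch of that flow using placeholder types; the real Session, Session.Options and C LlmSessionConfig live in files outside this snippet, and the option fields shown here (topk, temperature) are assumptions.

struct SessionOptionsSketch {
  var topk: Int = 40
  var temperature: Float = 0.8
}

struct LlmSessionConfigSketch {
  var topk: Int
  var temperature: Float
}

struct SessionRunnerSketch {
  let topk: Int
}

final class LlmInferenceSketch {
  /// Internal in spirit: only the Session is expected to call this.
  func createSessionRunner(sessionConfig: LlmSessionConfigSketch) -> SessionRunnerSketch {
    SessionRunnerSketch(topk: sessionConfig.topk)
  }
}

final class SessionSketch {
  private let sessionRunner: SessionRunnerSketch

  init(llmInference: LlmInferenceSketch, options: SessionOptionsSketch) {
    // The Session owns the conversion from its Swift options to the C config,
    // so LlmInference never needs to know about Session.Options.
    let config = LlmSessionConfigSketch(topk: options.topk, temperature: options.temperature)
    sessionRunner = llmInference.createSessionRunner(sessionConfig: config)
  }
}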

Collaborator Author

We could also change this method to createSessionRunner(options: Session.Options) and let it handle creation of the C session config from the options passed in by the Session. I just thought it would be better for the Session to convert its options to the config however it sees fit, rather than letting LlmInference worry about it.
LMK what you think.

Contributor

This method is not public. It has internal visibility and is called by the init of the swift Session. Session.init(llmInference: options:) creates the C struct LlmSessionConfig from the swift options provided by the user and then calls llmInference.createSessionRunner(sessionConfig).

Thanks for the explanation! If you don't mind, can we make all changes in one PR? We have some internal dependency on this swift API, I hope we can make all necessary changes in one PR. Let me know if you have other opinions.

Collaborator Author

I have pushed all the code to this PR. Do have a look.



llmTaskRunner = try options.modelPath.withCString { modelPath in
  try cacheDirectory.withCString { cacheDirectory in
    try options.supportedLoraRanks.withUnsafeMutableBufferPointer { supportedLoraRanks in

Collaborator Author

W.r.t. #5566 (comment):
@yishuangP These changes were already pushed. I think there was some git refresh issue.

private let llmSessionRunner: LlmSessionRunner

// LLM Inference used to create this session.
private let llmInference: LlmInference

Collaborator Author (priankakariatyml, Aug 22, 2024)

@yishuangP I am storing a strong reference to llmInference to ensure that a given session does not outlive the llmInference and cause all sorts of undefined behaviour. This won't create a retain cycle, since llmInference does not store a reference to any session; it just means llmInference gets deallocated only after all sessions created from it are destroyed.

It also lets me synchronise the response generation calls across sessions, which fixes a crash caused by simultaneous response generation calls.
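
A compressed view of that lifetime argument (type names are illustrative stand-ins): the session holds the inference object strongly, and the inference object never stores a session, so the reference graph is one-directional and cannot form a cycle.

final class InferenceOwnerSketch {
  // Owns the C engine; deallocated only after every session referencing it is gone.
}

final class SessionWithOwnerSketch {
  // Strong reference: keeps the engine owner (and the C engine) alive at least
  // as long as this session exists. No back-reference exists, so no cycle.
  private let inference: InferenceOwnerSketch

  init(inference: InferenceOwnerSketch) {
    self.inference = inference
  }
}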

Contributor

Thanks! Yeah, agreed that we need a strong reference to llmInference here to manage life cycles.

Also lets me synchronise the response generation calls across sessions to fix a crash because of simultaneous response generation calls.

I believe only one response generation can happen at a time; is this what you're referring to?

Collaborator Author

Yes, across sessions.

/// to execute response generation. If response generation is already in progress, throws an
/// error.
/// Any
func shouldContinueWithResponseGeneration() throws {

Collaborator Author (priankakariatyml, Aug 22, 2024)

@yishuangP I am maintaining response generation state in the LlmInference, which the session can read and update through these methods to prevent simultaneous response generation calls from multiple sessions. (A rough sketch of the idea is shown below.)
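
A rough sketch of the kind of shared state being described: a flag on the LlmInference-like owner that sessions check before generating and clear when they finish. The lock, error case and type name below are assumptions made for this illustration; only shouldContinueWithResponseGeneration() and markResponseGenerationCompleted() come from the PR.

import Foundation

enum ResponseGenerationErrorSketch: Error {
  case responseGenerationInProgress
}

final class ResponseGenerationGateSketch {
  private let lock = NSLock()
  private var responseGenerationInProgress = false

  /// Throws if another session is already generating; otherwise marks
  /// generation as in progress on behalf of the calling session.
  func shouldContinueWithResponseGeneration() throws {
    lock.lock()
    defer { lock.unlock() }
    guard !responseGenerationInProgress else {
      throw ResponseGenerationErrorSketch.responseGenerationInProgress
    }
    responseGenerationInProgress = true
  }

  /// Called by a session once its response generation has finished.
  func markResponseGenerationCompleted() {
    lock.lock()
    defer { lock.unlock() }
    responseGenerationInProgress = false
  }
}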

Contributor (yishuangP) left a comment

Thanks a lot! Can you help fix the sample app examples/llm_inference/ios/InferenceExample in a follow-up PR?

  let humanReadableLlmResponse = Session.humanReadableString(
    llmResponses: responseStrings, stripLeadingWhitespaces: receivedFirstToken)
else {
  progress(nil, GenAiInferenceError.invalidResponse)

Contributor

We should also call self?.llmInference.markResponseGenerationCompleted() if it errors out?

Collaborator Author

You mean if it errors out in the progress callback? I am assuming that even if one of the partial responses is invalid, the C++ session will still continue giving callbacks for the remaining responses.
In that case, wouldn't marking it completed still cause the same issue? By design, irrespective of whether the response returned by C++ is invalid, the session runner will also call completion() after progress() if the response context marks the bool done as true.

LMK if this is not how C++ behaves.

Collaborator Author

But thanks to your query, I spotted an issue in the control flow of the sync predict() when an error happens and have fixed it.

Contributor

Thanks for the explanation. I took a look at the llmSessionRunner.predictAsync implementation; we don't need to call self?.llmInference.markResponseGenerationCompleted() here.

Collaborator Author (priankakariatyml)

Thanks a lot! Can you help fix the sample app examples/llm_inference/ios/InferenceExample in a follow-up PR?

Sure. I will keep this ready.

Contributor (yishuangP) left a comment

Thanks!


sequence_batch_size: LlmInference.sequenceBatchSize,
number_of_supported_lora_ranks: options.supportedLoraRanks.count,
supported_lora_ranks: supportedLoraRanks.baseAddress,
max_top_k: options.maxTopk)

Collaborator Author (priankakariatyml, Sep 5, 2024)

@yishuangP @schmidt-sebastian This PR is out of sync with the C llm_inference_engine on master. You won't be able to merge it as-is because of a change in LlmModelSettings.
While merging, can you pass llm_activation_data_type: LlmActivationDataType(0), num_draft_tokens: 0 after max_top_k to this API? That should fix it. I'll add these new options to the Swift APIs once this PR is merged.
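
A sketch of what the adjusted call could look like with the two new fields appended after max_top_k. The struct below is a stand-in reduced to the fields mentioned in this thread; real code would use the imported C types (LlmModelSettings, LlmActivationDataType), and the max_top_k value here is an arbitrary example.

struct LlmActivationDataTypeSketch {
  var rawValue: UInt32
  init(_ rawValue: UInt32) { self.rawValue = rawValue }
}

struct LlmModelSettingsSketch {
  var max_top_k: Int
  var llm_activation_data_type: LlmActivationDataTypeSketch
  var num_draft_tokens: Int
}

let modelSettingsSketch = LlmModelSettingsSketch(
  max_top_k: 40,
  llm_activation_data_type: LlmActivationDataTypeSketch(0),
  num_draft_tokens: 0)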

copybara-service bot merged commit 7946760 into google-ai-edge:master on Sep 18, 2024
1 check passed