
[Response Ops][Actions] Allow streaming responses from Generative AI connector #161676

Merged
merged 12 commits into from
Jul 14, 2023

Conversation

ymao1
Contributor

@ymao1 ymao1 commented Jul 11, 2023

Resolves #159598

Summary

This PR modifies the test subaction of the Generative AI connector to accept a stream parameter (default: false) that allows for a streaming response.

The Generative AI connector is essentially a pass-through to the OpenAI/Azure OpenAI APIs: the stream parameter is passed in via the body of the request. With the existing connector, users could therefore specify stream: true in the body, which would lead to unexpected results because the action is not prepared to return streaming results. This PR sanitizes the body passed to the run subaction to prevent the stream parameter from being set to true, and explicitly sets the stream parameter for the test subaction.
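The body sanitization described above can be sketched as follows. This is an illustrative helper, not the actual Kibana implementation; the function name and signature are assumptions.

```typescript
// Illustrative sketch only: `sanitizeBody` is a hypothetical helper, not
// the actual Kibana code. The idea is that the connector re-serializes the
// caller-supplied body with the `stream` flag forced to a known value, so
// a caller cannot opt into streaming through the `run` subaction.
const sanitizeBody = (body: string, stream = false): string => {
  const parsed = JSON.parse(body);
  // Overwrite (or add) `stream` regardless of what the caller sent.
  return JSON.stringify({ ...parsed, stream });
};

// A caller trying to sneak `stream: true` past the run subaction:
console.log(sanitizeBody('{"model":"gpt-3.5-turbo","stream":true}'));
```

The test subaction would call the same kind of helper but set `stream` explicitly from its own parameter instead of the hard-coded default.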

In order to test the streaming response, I created an example plugin that prompts users to create a Generative AI connector if one does not exist and then executes actions using the connector with stream set to true. This borrows liberally from @dgieselaar's existing work from #158678

Jul-12-2023.19-16-43.mp4

To Verify

  1. Navigate to https://localhost:5601/app/GenAiStreamingResponseExample
  2. Set up a Generative AI connector
  3. Open the network console. Enter a prompt and click Stream Response
  4. You should see the chat response return streaming results.

@ymao1 ymao1 force-pushed the gen-ai-streaming-response branch 3 times, most recently from 978a274 to c258e00 Compare July 11, 2023 20:26
@ymao1 ymao1 force-pushed the gen-ai-streaming-response branch from c258e00 to fa085b4 Compare July 12, 2023 19:08
@ymao1 ymao1 changed the title Gen ai streaming response [Response Ops][Actions] Allow streaming responses from Generative AI connector Jul 12, 2023
@ymao1 ymao1 self-assigned this Jul 12, 2023
@ymao1 ymao1 added Feature:Actions Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.10.0 release_note:skip Skip the PR/issue when compiling release notes labels Jul 12, 2023
@ymao1 ymao1 marked this pull request as ready for review July 12, 2023 23:19
@ymao1 ymao1 requested review from a team as code owners July 12, 2023 23:19
@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@ymao1 ymao1 requested a review from dgieselaar July 12, 2023 23:20
@lcawl
Contributor

lcawl commented Jul 13, 2023

If I'm understanding this PR correctly, it seems like there's a test subaction that we'll need to add to the OpenAPI specification for the _execute path (e.g. in https://github.com/elastic/kibana/blob/main/x-pack/plugins/actions/docs/openapi/paths/s%40%7Bspaceid%7D%40api%40actions%40connector%40%7Bconnectorid%7D%40_execute.yaml), with details about the new stream property.

await actionsClient.execute({
  actionId: request.body.connector_id,
  params: {
    subAction: 'test',
Member

I'm not sure if I follow why we need to call test here? What does test do vs run?

Contributor Author

Typically we expect the run API to be used for scheduled actions (actions run via task manager) while the test API should be used for ad hoc action executions that are run immediately and not via a scheduled task.

Member

OK! just to be sure: if I want a streaming response, I should use test, and there are no other implications other than that it looks weird? I've also got a followup question: what happens when we start calling more endpoints, or add more LLMs? Do we all handle that in this connector?

Contributor Author

if I want a streaming response, I should use test, and there are no other implications other than that it looks weird?

Yes, that's correct! You should use the test subaction

what happens when we start calling more endpoints, or add more LLMs

The generative AI connector is currently a pass-through to the third-party APIs: the body you pass to the connector executor is the body that's sent to the provider's API, so it's up to the calling function to format the body correctly for the connector provider. If we want to add a new endpoint for an existing provider, the connector already allows you to specify the API URL in the connector config. If we want to add more LLMs, we would need to update the connector to allow for a new provider.
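To illustrate the pass-through behaviour described in this thread: the caller formats the provider's request body itself and hands it to the connector. The params shape below is an assumption for illustration only; the body fields follow the public OpenAI chat completions API.

```typescript
// Illustration only: the exact subActionParams shape is an assumption, not
// the actual connector schema. The point is that `body` is already the
// provider-formatted request, which the connector forwards unchanged to
// the API URL configured on the connector.
const executeParams = {
  subAction: 'run',
  subActionParams: {
    body: JSON.stringify({
      model: 'gpt-3.5-turbo',
      messages: [{ role: 'user', content: 'Hello!' }],
    }),
  },
};
console.log(JSON.parse(executeParams.subActionParams.body).model);
```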

Member

@dgieselaar dgieselaar left a comment

This looks great @ymao1, thank you very much :) just have a couple of questions

Member

@pmuellr pmuellr left a comment

A little concerned about us returning the response from the service directly, so posting a review early - just made it a bit of the way through before stumbling on the pipeStreamingResponse thread ...

@@ -49,8 +58,8 @@ export class GenAiConnector extends SubActionConnector<GenAiConfig, GenAiSecrets

     this.registerSubAction({
       name: SUB_ACTION.TEST,
-      method: 'runApi',
       schema: GenAiRunActionParamsSchema,
+      method: 'executeApi',
Member

This would be a migration issue if these connectors were used in queued actions. But they aren't, so this seems fine. Though would be a problem for ZDT changes. Which we don't have to worry about yet.

@ymao1 ymao1 requested a review from pmuellr July 14, 2023 12:18
    });
    return response.data;
  }

  public async executeApi({
Contributor

@stephmilovic stephmilovic Jul 14, 2023

What do you think about naming this method more descriptively? streamApi? execute seems to be a synonym for run, and it is not immediately apparent what the difference is

Contributor Author

How about just testApi since the subaction is test? streamApi seems deceptive since you can request a streaming response or not

Contributor

Hm... should the subaction be test? The test refers to this screen:
Screenshot 2023-07-14 at 11 55 39 AM

Contributor Author

If it would be less confusing, I can add a new subaction for this stream functionality and leave the test subaction the same as before

Contributor Author

Updated in 09b0a32

Contributor

appreciate that, thank you!

Member

@pmuellr pmuellr left a comment

LGTM, left a question about the dont-compress-this header


export const pipeStreamingResponse = (response: AxiosResponse<IncomingMessage>) => {
  response.data.headers = {
    ['Content-Type']: 'dont-compress-this',
Member

are we leaving this in? It can't be copied to the eventual outgoing request, I assume, not sure if it's a special directive to the core http framework, or what. Also, not sure that these sorts of things can't be compressed, but there do tend to be issues w/compression and streaming APIs generally.

Member

Iirc, it will be compressed by hapi if it recognizes the content type. A bit of a hack but it works. Maybe there's a less quirky string that we can use though.
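A sketch of the workaround being discussed, carrying the explanatory comment the reviewers asked for (types simplified; this is not the actual committed code):

```typescript
// Sketch only (simplified types; not the actual Kibana source).
// hapi compresses responses whose content type it recognizes, and, as
// noted above, compression tends to cause issues with streaming APIs.
// Setting an unrecognized value like 'dont-compress-this' opts this
// response out of hapi's compression so chunks are delivered as they
// arrive.
const pipeStreamingResponse = (response: { data: { headers: Record<string, string> } }) => {
  response.data.headers = {
    'Content-Type': 'dont-compress-this',
  };
  return response.data;
};

const demo = { data: { headers: {} as Record<string, string> } };
console.log(pipeStreamingResponse(demo).headers['Content-Type']);
```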

Member

Heh, cool. Let's add a comment in the code, as this will confuse someone in the future!

Contributor Author

Added in ecec58c

Member

@dgieselaar dgieselaar left a comment

Thanks!

@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

| id | before | after | diff |
| --- | --- | --- | --- |
| stackConnectors | 35.2KB | 35.2KB | +18.0B |
Unknown metric groups

ESLint disabled line counts

| id | before | after | diff |
| --- | --- | --- | --- |
| enterpriseSearch | 14 | 16 | +2 |
| securitySolution | 411 | 415 | +4 |
| total | | | +6 |

Total ESLint disabled count

| id | before | after | diff |
| --- | --- | --- | --- |
| enterpriseSearch | 15 | 17 | +2 |
| securitySolution | 490 | 494 | +4 |
| total | | | +6 |

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

@ymao1 ymao1 requested a review from stephmilovic July 14, 2023 21:20
@stephmilovic
Contributor

Hey, I'm having a hard time running the example app. I couldn't get the application to load at all without adding the no-base-path arg, so this is how I am running it: yarn start --run-examples --no-base-path

this is what i see on your app/GenAiStreamingResponseExample route:
Screenshot 2023-07-14 at 4 46 40 PM

i can see the guide onboarding example app running as expected:
Screenshot 2023-07-14 at 4 46 25 PM

Am I missing something?

Contributor

@stephmilovic stephmilovic left a comment

LGTM, thanks for the enhancement. It may be a good idea to also update this documentation with the new method: docs/management/connectors/action-types/gen-ai.asciidoc. Nice work!

@ymao1
Contributor Author

ymao1 commented Jul 14, 2023

It may be a good idea to also update this documentation with the new method: docs/management/connectors/action-types/gen-ai.asciidoc.

I'll discuss this with @lcawl and create a separate PR for the docs if necessary. Thanks!

@ymao1 ymao1 merged commit 8a56a2b into elastic:main Jul 14, 2023
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Jul 14, 2023
@ymao1 ymao1 deleted the gen-ai-streaming-response branch July 14, 2023 23:17
walterra added a commit that referenced this pull request Jul 28, 2023
…sponse-stream`. (#162609)

## Summary

Follow up to #161676.

Since we added support for handling SSE-streams returned by OpenAI APIs
to `@kbn/ml-response-stream` in #162335, this updates the "Gen AI
Streaming Response" developer example to make use of the custom hook
`useFetchStream()` from `@kbn/ml-response-stream` instead of its inline
code to process the stream on the client.

It also improves the refresh behaviour: Previously, once a prompt was
entered and the accordion opened, every keystroke in the prompt textarea
would cause a new request to the AI endpoint. This update adds a
`Refresh prompt` to only do a new request on this user action. Support
for cancelling requests when unmounting the `StreamingResponse`
component was also added.


![gen-ai-streaming-0001](https://github.com/elastic/kibana/assets/230104/af81d3ac-cc03-4550-9aa5-8685c15304cd)

### Checklist

- [x] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
ThomThomson pushed a commit to ThomThomson/kibana that referenced this pull request Aug 1, 2023

Successfully merging this pull request may close these issues.

Ability to stream Generative AI responses instead of waiting for the prompt to complete
8 participants