[Response Ops][Actions] Allow streaming responses from Generative AI connector #161676
Conversation
Force-pushed 978a274 to c258e00, then c258e00 to fa085b4.
Pinging @elastic/response-ops (Team:ResponseOps)

If I'm understanding this PR correctly, it seems like there's a
```typescript
await actionsClient.execute({
  actionId: request.body.connector_id,
  params: {
    subAction: 'test',
```
I'm not sure if I follow why we need to call `test` here? What does `test` do vs `run`?
Typically we expect the run API to be used for scheduled actions (actions run via task manager) while the test API should be used for ad hoc action executions that are run immediately and not via a scheduled task.
OK! just to be sure: if I want a streaming response, I should use test, and there are no other implications other than that it looks weird? I've also got a followup question: what happens when we start calling more endpoints, or add more LLMs? Do we all handle that in this connector?
> if I want a streaming response, I should use test, and there are no other implications other than that it looks weird?

Yes, that's correct! You should use the `test` subaction.

> what happens when we start calling more endpoints, or add more LLMs

The Generative AI connector is currently basically a pass-through to the third-party APIs: the body that you pass to the connector executor is the body that's passed to the API, so it's up to the calling function to format the body correctly for the connector provider. If we want to add a new endpoint for an existing provider, the connector allows you to specify the API URL in the connector config. If we want to add more LLMs, we would need to update the connector to allow for a new provider.
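The pass-through behaviour described above can be sketched roughly as follows. This is an illustrative, self-contained approximation: `buildOpenAiBody`, `ConnectorExecuteParams`, and the parameter shape are assumptions for the sketch, not the actual Kibana API.

```typescript
// Sketch of the pass-through idea: the caller builds the provider-specific
// request body, and the connector forwards it unchanged to the configured
// API URL. These names are illustrative, not real Kibana types.
interface ConnectorExecuteParams {
  subAction: string;
  subActionParams: { body: string };
}

const buildOpenAiBody = (prompt: string): string =>
  JSON.stringify({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: prompt }],
  });

// The connector itself does not interpret this body; formatting it
// correctly for the provider is the caller's responsibility.
const params: ConnectorExecuteParams = {
  subAction: 'test',
  subActionParams: { body: buildOpenAiBody('Hello!') },
};
```

This is why adding a new provider means updating the connector itself, while a new endpoint for an existing provider only needs a different API URL in the connector config.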
x-pack/plugins/stack_connectors/server/connector_types/gen_ai/lib/utils.ts (outdated, resolved)
This looks great @ymao1, thank you very much :) just have a couple of questions
…nto gen-ai-streaming-response
A little concerned about us returning the response from the service directly, so posting a review early - just made it a bit of the way through before stumbling on the `pipeStreamingResponse` thread ...
```diff
@@ -49,8 +58,8 @@ export class GenAiConnector extends SubActionConnector<GenAiConfig, GenAiSecrets
     this.registerSubAction({
       name: SUB_ACTION.TEST,
-      method: 'runApi',
+      method: 'executeApi',
       schema: GenAiRunActionParamsSchema,
```
This would be a migration issue if these connectors were used in queued actions, but they aren't, so this seems fine. It would, though, be a problem for ZDT changes, which we don't have to worry about yet.
x-pack/plugins/stack_connectors/server/connector_types/gen_ai/lib/utils.ts (outdated, resolved)
```typescript
  });
  return response.data;
}

public async executeApi({
```
What do you think about naming this method more descriptively? `streamApi`? `execute` seems to be a synonym for `run`, and it is not immediately apparent what the difference is.
How about just `testApi` since the subaction is `test`? `streamApi` seems deceptive since you can request a streaming response or not.
If it would be less confusing, I can add a new subaction for this stream functionality and leave the test subaction the same as before
Updated in 09b0a32
appreciate that, thank you!
LGTM, left a question about the `dont-compress-this` header.
```typescript
export const pipeStreamingResponse = (response: AxiosResponse<IncomingMessage>) => {
  response.data.headers = {
    ['Content-Type']: 'dont-compress-this',
```
Are we leaving this in? It can't be copied to the eventual outgoing request, I assume - not sure if it's a special directive to the core http framework, or what. Also, not sure that these sorts of things can't be compressed, but there do tend to be issues with compression and streaming APIs generally.
IIRC, it will be compressed by hapi if it recognizes the content type. A bit of a hack, but it works. Maybe there's a less quirky string that we can use, though.
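The hack being discussed can be sketched in a self-contained way like this. Note this is an approximation for illustration, not the exact Kibana source: the real function types its argument with axios's `AxiosResponse<IncomingMessage>`, which is stubbed out here so the sketch stands alone.

```typescript
// Local stand-ins so the sketch compiles without axios/Node type imports.
type StreamLike = { headers: Record<string, string> };
type AxiosResponseLike<T> = { data: T };

// hapi compresses responses whose Content-Type it recognizes, which
// buffers and breaks a chunked streaming body. Overwriting the header
// with a value hapi does not recognize opts the stream out of compression.
const pipeStreamingResponse = (response: AxiosResponseLike<StreamLike>) => {
  response.data.headers = {
    ['Content-Type']: 'dont-compress-this',
  };
  return response.data;
};

const stream = pipeStreamingResponse({ data: { headers: {} } });
```

The magic string is only a signal to the server framework; it never needs to be meaningful to the downstream client, which is why an intentionally unrecognizable value works.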
Heh, cool. Let's add a comment in the code, as this will confuse someone in the future!
Added in ecec58c
Thanks!
💚 Build Succeeded
cc @ymao1
LGTM, thanks for the enhancement. It may be a good idea to also update this documentation with the new method: docs/management/connectors/action-types/gen-ai.asciidoc. Nice work!

I'll discuss this with @lcawl and create a separate PR for the docs if necessary. Thanks!
…sponse-stream`. (#162609)

## Summary

Follow up to #161676. Since we added support for handling SSE-streams returned by OpenAI APIs to `@kbn/ml-response-stream` in #162335, this updates the "Gen AI Streaming Response" developer example to make use of the custom hook `useFetchStream()` from `@kbn/ml-response-stream` instead of its inline code to process the stream on the client.

It also improves the refresh behaviour: previously, once a prompt was entered and the accordion opened, every keystroke in the prompt textarea would cause a new request to the AI endpoint. This update adds a `Refresh prompt` to only do a new request on this user action. Support for cancelling requests when unmounting the `StreamingResponse` component was also added.

![gen-ai-streaming-0001](https://github.com/elastic/kibana/assets/230104/af81d3ac-cc03-4550-9aa5-8685c15304cd)

### Checklist

- [x] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Resolves #159598
Summary
This PR modifies the `test` subaction of the Generative AI connector to accept a `stream` parameter (default: `false`) that allows for a streaming response.

The Generative AI connector is basically a pass-through to the OpenAI/Azure OpenAI APIs, where the `stream` parameter is passed in via the body of the request. This means that with the existing connector, users could specify `stream: true` in the body, which would lead to unexpected results when the action is unprepared to return streaming results. This PR sanitizes the body that is passed in the `run` subaction to prevent the `stream` parameter from being set to `true`, and explicitly sets the `stream` parameter for the `test` subaction.

In order to test the streaming response, I created an example plugin that prompts users to create a Generative AI connector if one does not exist and then executes actions using the connector with `stream` set to `true`. This borrows liberally from @dgieselaar's existing work from #158678

Jul-12-2023.19-16-43.mp4
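The body-sanitization idea described in the summary might look roughly like the following. This is a hedged sketch: `sanitizeBody` and its signature are assumptions for illustration, not the actual implementation in the connector.

```typescript
// Force the `stream` flag in a caller-supplied JSON body, so the `run`
// subaction can never opt into streaming even if the caller sets
// `stream: true`, while `test` can set the flag explicitly.
const sanitizeBody = (body: string, stream: boolean): string => {
  const parsed = JSON.parse(body) as Record<string, unknown>;
  return JSON.stringify({ ...parsed, stream });
};

// `run` subaction: streaming is always disabled, whatever the caller sent.
const runBody = sanitizeBody('{"model":"gpt-3.5-turbo","stream":true}', false);
```

The point of forcing rather than merely validating the flag is that the connector, not the caller, decides whether the action framework is prepared to handle a streaming response.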
To Verify
Stream Response