
Allow users to stop message streaming #1022

Merged: 12 commits merged into jupyterlab:main on Oct 16, 2024

Conversation

@krassowski (Member) commented Oct 9, 2024

[demo recording: stop-streaming]

This PR not only stops the stream from being sent to the user, but also propagates the information up to the provider/model, making it possible for custom deployments to actually kill the generating process and thus save costs.
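For custom deployments, the useful consequence is that the provider can catch the propagated interruption and cancel the generation job upstream. A minimal sketch of that idea, assuming the interruption is delivered by throwing GenerationInterrupted into the streaming generator; the client and job objects are hypothetical stand-ins for a deployment's API:

    class GenerationInterrupted(Exception):
        """Stand-in for the exception this PR raises into the stream on stop."""

    def stream_completion(client, prompt: str):
        job = client.start_generation(prompt)  # hypothetical deployment API
        try:
            for token in job.tokens():  # hypothetical token iterator
                yield token
        except GenerationInterrupted:
            # The user pressed "Stop": cancel the remote generating process
            # so the deployment stops paying for tokens nobody will read.
            client.cancel(job.id)  # hypothetical deployment API
            raise

A consumer stops the stream with generator.throw(GenerationInterrupted()), which surfaces at the suspended yield inside the try block.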

@krassowski added the enhancement (New feature or request) label on Oct 9, 2024
@michaelchia (Collaborator)

Does this remove the ability to send multiple messages before receiving a reply? I personally don't think sending multiple messages is useful, but I just want to understand if that is a change.

Also, creating a new chat or deleting the message while it is streaming should similarly stop the stream. Currently it stops the message from showing in the UI but does not interrupt the provider. This may not need to be within the same PR, but it is something that could be added eventually.

@krassowski (Member, Author)

> Does this remove the ability to send multiple messages before receiving a reply? I personally don't think sending multiple messages is useful, but I just want to understand if that is a change.

Great question! Technically it does not - it is still possible to send multiple messages using Enter or Shift + Enter:

[screenshot: chat input showing Enter / Shift + Enter send options]

@michaelchia (Collaborator)

So it looks like the button will stop all pending messages?

@krassowski (Member, Author)

> So it looks like the button will stop all pending messages?

Currently, yes. What do you think it should do in that case? I think user expectations are loosely defined (if defined at all), because other applications often do not support concurrent messages.

@michaelchia (Collaborator) commented Oct 9, 2024

I don't have any strong opinions on this; stopping all pending messages seems fine.

Any idea how LLM memory is affected? If GenerationInterrupted is not handled (the default), can I assume the LLM chain will just continue to run? If that is the case, the LLM memory would contain the fully completed AIMessage while the UI shows the partial message. If the chain is actually interrupted, the exchange (the HumanMessage and AIMessage pair) would not be added to memory.

I am not sure which would be preferable in this case; I would lean toward the former.

EDIT: I am actually not sure what the default behaviour would be. I will need to look into RunnableWithMessageHistory to figure it out.

@krassowski (Member, Author)

> Any idea how LLM memory is affected? If GenerationInterrupted is not handled (the default), can I assume the LLM chain will just continue to run? If that is the case, the LLM memory would contain the fully completed AIMessage while the UI shows the partial message. If the chain is actually interrupted, the exchange (the HumanMessage and AIMessage pair) would not be added to memory.

No, I think it should not contain the complete message. I believe the chain itself is stateless, with RunnableWithMessageHistory taking care of the history provisioning.
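For context, the pattern looks roughly like this in LangChain (a generic sketch, not jupyter-ai's actual wiring; llm stands for whatever chat model is configured):

    from langchain_core.chat_history import InMemoryChatMessageHistory
    from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
    from langchain_core.runnables.history import RunnableWithMessageHistory

    prompt = ChatPromptTemplate.from_messages([
        MessagesPlaceholder(variable_name="history"),
        ("human", "{input}"),
    ])
    chain = prompt | llm  # the chain itself holds no state

    store: dict[str, InMemoryChatMessageHistory] = {}

    def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
        # The wrapper reads past messages from here before each run and
        # appends the new exchange once the run finishes.
        return store.setdefault(session_id, InMemoryChatMessageHistory())

    chain_with_history = RunnableWithMessageHistory(
        chain,
        get_session_history,
        input_messages_key="input",
        history_messages_key="history",
    )

Since the history wrapper, rather than the model, owns the memory, an interrupted run should not silently leave a "completed" AIMessage behind.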

> Also, creating a new chat or deleting the message while it is streaming should similarly stop the stream. Currently it stops the message from showing in the UI but does not interrupt the provider.

Done in 1d8b3f9

@krassowski (Member, Author)

> I don't have any strong opinions on this; stopping all pending messages seems fine.

I think so. It feels like the main button should act as "stop all", but maybe there should be a menu entry under the three dots next to each message allowing users to stop streaming of specific messages. I think that should be implemented in a future PR.

@krassowski changed the title from "Allow to stop streaming of agent messages" to "Allow to stop streaming agent messages" on Oct 9, 2024
@dlqqq (Member) left a comment

@krassowski Thank you for opening this PR and for documenting your original issue so clearly! It is really helpful to see examples from other AI assistant applications showing the UX you're proposing here.

Jupyter AI is intended to support collaboration, and has integrations with jupyter-collaboration to allow for multiple users to share the same chat UI and use the same configured LLM. My concerns with this PR are that:

  1. Whether the stop button is shown is determined by whether any streaming messages are incomplete, and
  2. Clicking the "Stop" button halts streaming on all replies.

I consider these blockers, because they lead to two consequences:

  1. When one user sends a message, it blocks all other users from sending one through the UI until the response is finished streaming. (Yes, they can use the keyboard, but the UI suggests they can't send a message.)
  2. Users can halt streaming of responses addressed to other users.

I suggest that we re-define how stopping messages works. Specifically:

  • Whether the "Stop" button is shown should be based on whether the response to the user's last question is still streaming.
  • Pressing the "Stop" button should only stop streaming of the last message asked by that user.
  • If a user rapidly sends 2 messages in serial, pressing the "Stop" button only needs to stop the last message from streaming.
    • This is an engineering shortcut I recommend taking so we don't need to store and manage an array of all the agent responses that are still streaming; we just need a reference to the most recent one.
    • I think this is fine given that there's not much of a use-case for a single user to be sending multiple messages. A user with multiple questions would expect to just ask them all at once in a single message.
  • StopRequest should be modified to include an id field, which should contain an agent message ID (see the sketch after this list). On receipt, the backend should stop only that message from streaming further.
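To illustrate the last bullet, the request model could look roughly like this (a sketch only; jupyter-ai's actual message models live in jupyter_ai/models.py, and the field name here is illustrative):

    from typing import Literal

    from pydantic import BaseModel

    class StopRequest(BaseModel):
        """Frontend request to stop streaming a specific agent reply."""

        type: Literal["stop"] = "stop"
        # ID of the agent message whose stream should be interrupted.
        target: str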

Here are some extra details about the codebase that I thought would help with this proposed implementation:

  • You can modify props.onSend() in send-button.tsx to return the message ID of the human message that was just sent. props.onSend() is defined in chat-input.tsx and calls ChatHandler.sendMessage(), which returns a Promise that resolves to the ID of the HumanChatMessage the user just created by sending the message.

  • ChatHandler.replyFor() accepts a human message ID and returns a Promise that resolves to the AgentChatMessage object that is replying to that human message. This method only works when not streaming, and is unused in the codebase today. You can modify the implementation to return a promise that resolves to an AgentStreamMessage object as soon as the stream starts. It should be as easy as changing this:

    // resolve promise from `replyFor()` if it exists
    if (
      newMessage.type === 'agent' &&
      newMessage.reply_to in this._replyForResolverDict
    ) {

    to:

    // resolve promise from `replyFor()` if it exists
    if (
      (newMessage.type === 'agent' || newMessage.type === 'agent-stream') &&
      newMessage.reply_to in this._replyForResolverDict
    ) {

    Then, update the relevant types from AgentChatMessage to AgentChatMessage | AgentStreamMessage.

Let me know your thoughts on the changes I'm proposing above. Again, thanks for contributing this!

@dlqqq (Member) commented Oct 10, 2024

One extra comment: The "Stop" button should revert to a "Send" button if the user types while the previous response is still streaming. This preserves the current UX for stopping a stream, while also having the UI correctly reflect that it is still possible to send a message while a response is still streaming.

@krassowski (Member, Author)

Thank you for the feedback @dlqqq, this should be ready for another round of review :)

@brichet mentioned this pull request Oct 11, 2024
@dlqqq (Member) left a comment

@krassowski The new revision of this PR looks fantastic! I left a couple of minor callouts and would also like to add further changes to this PR to minimize the performance impact. Since the additional changes I'm proposing can be lengthy, I think it would be best to open them as a separate PR, possibly targeting your fork's branch. See below.

Review threads on packages/jupyter-ai/src/components/chat.tsx and packages/jupyter-ai/jupyter_ai/handlers.py (outdated, resolved)
@dlqqq (Member) commented Oct 15, 2024

@krassowski I pushed a couple of commits that:

  • Changed StopRequest to stop streaming all messages from the current user, without requiring a target field.
    • After looking at the code more, I realized that my original suggestion of stopping only one message at a time was not the best approach (sorry!). What's actually important is that only messages from the user who issued the stop request are stopped. This PR is now similar to the original one you opened, except the backend only interrupts the messages from the current user, based on self.current_user.username.
  • Dropped the StopMessage model and removed interrupted: boolean from AgentStreamMessage's frontend model, as they were both unused. These can easily be re-implemented in a future PR that requires them.
  • Added a message tombstone to interrupted messages for the purposes of future chat memory passed to the LLM. A partial response in the history could confuse the LLM, so interrupted messages now always end in "(AI response stopped by user)"; see the sketch after this list.
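For context, the tombstone amounts to something like this in the streaming loop (a sketch; the exception and the message object stand in for jupyter-ai internals):

    STOP_TOMBSTONE = "(AI response stopped by user)"

    class GenerationInterrupted(Exception):
        """Stand-in for the interruption exception introduced in this PR."""

    async def stream_reply(chain, inputs: dict, message) -> None:
        try:
            async for chunk in chain.astream(inputs):
                message.body += chunk.content  # append tokens as they arrive
        except GenerationInterrupted:
            # Keep the partial text, but end it with an explicit marker so
            # the chat memory later passed to the LLM shows the reply was
            # cut off instead of a confusing half-finished answer.
            message.body += f"\n\n{STOP_TOMBSTONE}"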

Here's a demo video:

Screen.Recording.2024-10-15.at.1.08.32.PM.mov

This PR looks good to me as-is (no need to split it if you're too busy to do so). However, I'll leave it open for your feedback or additional code changes.

(p.s. In the future, if I have more complex suggestions, I will be more inclined to push to your branch and ask for your feedback. It would be more respectful of your time for me to write the code myself and be accountable for my suggestions working/not working.)

@dlqqq changed the title from "Allow to stop streaming agent messages" to "Allow users to stop message streaming" on Oct 15, 2024
@krassowski (Member, Author)

> Dropped the StopMessage model and removed interrupted: boolean from AgentStreamMessage's frontend model, as they were both unused. These can easily be re-implemented in a future PR that requires them.

Agreed, this was not necessary. I was thinking about adding some kind of UI indicator for interrupted messages (but most other chat interfaces do not have one), and also about letting telemetry capture that information, but since you added the tombstone this is not a dire need.

@krassowski (Member, Author)

Looks good to me. I am not a fan of "here" in the streamingReplyHere name; I know what you want to express, but I do not have better suggestions, so I guess it stays.

I opened PRs for the sub-tasks:

@dlqqq (Member) left a comment

Awesome, thank you @krassowski! Really looking forward to using this feature myself. 🎉

After this, I need to lift some of the more complex streaming logic out of DefaultChatHandler into BaseChatHandler so authors of custom slash commands don't need to constantly duplicate our implementation every minor release. With this PR as-is, the stop button would show but not work for custom slash commands that implement streaming.
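In rough terms, that could expose a shared helper like the following (method names are hypothetical; this sketches the intent, not the eventual implementation):

    class GenerationInterrupted(Exception):
        """Stand-in for the interruption exception from this PR."""

    class BaseChatHandler:
        # Hypothetical primitives that DefaultChatHandler and custom slash
        # commands would share once the streaming logic is lifted up:
        def _start_stream(self): ...
        def _append_chunk(self, message, chunk): ...
        def _close_stream(self, message, interrupted: bool): ...

        async def stream_reply(self, chain, inputs: dict) -> None:
            """Stream a reply while honoring the user's stop requests."""
            message = self._start_stream()
            try:
                async for chunk in chain.astream(inputs):
                    self._append_chunk(message, chunk)
            except GenerationInterrupted:
                self._close_stream(message, interrupted=True)
            else:
                self._close_stream(message, interrupted=False)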

I'm targeting this feature to become available in a minor release next Monday (10/21). 💪

@dlqqq enabled auto-merge (squash) October 16, 2024 15:54
@dlqqq merged commit 1eb3b55 into jupyterlab:main Oct 16, 2024 (10 checks passed)
@krassowski deleted the feat/stop-streaming branch October 16, 2024 16:40
@krassowski (Member, Author)

Thanks for the help with review @dlqqq! I agree it would be great if streaming/stopping streaming worked for slash commands too.

Labels: enhancement (New feature or request)

Successfully merging this pull request may close these issues:

  • Check for collaborative extension needs updating to work with v3
  • Allow to interrupt/stop streaming

3 participants