Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace pub with xpub proposal #65

Merged
merged 2 commits into from
Dec 24, 2021
Merged

Replace pub with xpub proposal #65

merged 2 commits into from
Dec 24, 2021

Conversation

JohanMabille
Copy link
Member

@JohanMabille JohanMabille commented Jan 5, 2021

Replace PUB socket with XPUB socket

Problem

A Jupyter kernel uses a PUB socket as a broadcast channel where it publishes various messages (incoming requests, outputs, events). Any client that wants to be notified of what happens in the kernel can simply subscribe to this channel via a SUB socket. The issue with this simple PUB - SUB pattern is that there is no simple mechanism for clients to wait for their iopub subscription to be established.

This is particularly problematic when a client needs to send a bunch of execution requests just after starting a kernel (a typical use case is the "Restart Kernel and execute all" command of Jupyter Lab, or opening a Notebook with Voilà). In that case, the client needs to ensure that its SUB socket is connected to the PUB socket of the kernel before sending any execution request, otherwise it may miss some outputs. The kernel, on its side, will not emit any event until it receives a request.

Current solution

The current solution consists of sending kernel_info requests until a kernel_info reply is received on the SHELL and an idle status message is received on the SUB socket. If the client receives a kernel_info reply but not the idle status message before a timeout, this means that the SUB socket has not connected yet. The client discards the reply and send a new kernel_info request.

This solution makes the implementation of a Jupyter client more complicated than required. Indeed, ZeroMQ provides a publisher socket that is able to detect new subscriptions.

Proposed Enhancement

We propose to replace the PUB socket of the kernel with an XPUB socket. XPUB sockets receive the following events:

  • subscribe (single frame messages with \1{subscription-topic})
  • unsubscribe (single frame messages with \0{subscription-topic})

When the IOPub XPUB socket receives an event indicating a new subscription, it shall send an iopub_welcome message with no parent header and the received
subscription in the subscription field, with the given subscription topic:

identity_prefix: ["subscription-topic"]
parent_header: {}
header: {
    "msg_type": "iopub_welcome"
}
content: {
    "subscription": "subscription-topic"
}
metadata: {}

Notes:

  • The identity prefix may contain extra information about the message, such as the kernel id, according to the convention used in the IPython kernel:
    kernel.{u-u-i-d}.subscription-topic.
  • Subscriptions can be empty (this is almost always the case). An empty subscription means subscribing to all IOPub messages.
  • Subscriptions shall be UTF-8 encoded. Non-utf8 subscriptions shall be ignored, and not produce a welcome message.
  • Every subscribed client will receive every other client's welcome message, assuming they match existing subscriptions (e.g. empty strings)
  • Welcome messages do not and cannot identify the client whose subscription is being received. Receipt of an iopub_welcome message with your subscription does not mean it is in response to your own subscription. However, receiving a message does mean that a matching subscription has been registered for your client, otherwise no message will be received. So if only one subscription is registered, as is normally the case, receiving any welcome message is sufficient to indicate that your client's subscription is fully established. The gist is that receiving a welcome message is a sufficient condition to establish the subscription-propagation event, and additional welcome messages should be expected and ignored.
  • Unsubscribe events are ignored.

Impact on existing implementations

Although this enhancement requires changing all the existing kernels, the impact should be limited. Indeed, most of the kernels are based on the kernel wrapper approach, or on xeus.

Regarding clients, this change is backward-incompatible only when they start assuming a welcome message will come. Clients not aware of
the new message will just get an unrecognized iopub message they can safely ignore.

Most of the clients are based on the Jupyter Server. Therefore, the changes should be limited to this repository and to the notebook. The new
recommended starting procedure is:

  1. always probe with a single kernel_info_request (required for protocol adaptation)
  2. check protocol version in reply
    i. if welcome message is supported, wait for it on iopub
    ii. if not:
    a. use current request & wait loop (recommended, especially for current clients that already have this implemented)
    b. accept and warn about possible loss of early iopub messages for 'old' kernels ((increasingly common over time as protocol updates can be more safely assumed, especially for new clients

Relevant Resources (GitHub repositories, Issues, PRs)

GitHub repositories

GitHub Issues

GitHub Pull Requests

cc @SylvainCorlay @jtpio @martinRenou @maartenbreddels @minrk

EDIT: incorporated feedbacks from @minrk

@SylvainCorlay
Copy link
Member

Thanks @JohanMabille I think this will be a significant improvement!

@Carreau
Copy link
Member

Carreau commented Jan 6, 2021

I have no objections.

Do we know of any case where a kernel is running without a client, like embedded where you might connect/disconnect every now and then to monitor? Would changing to XPUB block the application until there is a connection ?

@JohanMabille
Copy link
Member Author

Do we know of any case where a kernel is running without a client, like embedded where you might connect/disconnect every now and then to monitor

I'm not aware of such a use case.

Would changing to XPUB block the application until there is a connection ?

It won't. XPUB is really similar to PUB, the only difference is that it can detect new subscription / unsubscription and notify the kernel. If you publish message on XPUB while no subscriber has connected, the message is simply discarded.

@SylvainCorlay
Copy link
Member

Ping @minrk @blink1073 @ellisonbg @fperez @Carreau!

We would love to have some feedback on this, as it would greatly simplify the handling of kernels.

@minrk
Copy link
Member

minrk commented Feb 25, 2021

I think this is a great idea. Since PUB is really a special case of XPUB where subscription messages are handled internally, the only backward-incompatible part of this proposal happens when clients start assuming a welcome message will come. Everything else will behave the same with no changes on the kernel or client. Clients not aware of the new message will just get an unrecognized iopub message they can safely ignore.

That means that clients can still use a kernel_info request to probe the protocol version to determine if they should expect a welcome message or have to fallback on the current probe & wait loop. There's no downside to kernels migrating promptly, and clients can be conservative about what they do wrt backward-compatibility. I would recommend, for client that want to wait for subscriptions to be ready:

  1. always probe with a single kernel_info_request (we've always needed this for protocol adaptation)
  2. check protocol version in reply
    1. if welcome message supported, wait for it on iopub
    2. if not:
      1. use current request & wait loop (recommended, especially for current clients that already have this implemented)
      2. accept and warn about possible loss of early iopub messages for 'old' kernels (increasingly common over time as protocol updates can be more safely assumed, especially for new clients)

@blink1073
Copy link
Contributor

This proposal looks great to me, pending incorporation of @minrk's feedback.

jupyter-xpub/jupyter-xpub.md Outdated Show resolved Hide resolved
jupyter-xpub/jupyter-xpub.md Outdated Show resolved Hide resolved
jupyter-xpub/jupyter-xpub.md Outdated Show resolved Hide resolved
@ellisonbg
Copy link
Contributor

ellisonbg commented Feb 26, 2021 via email

@minrk minrk self-requested a review March 1, 2021 15:16
Notes:

- The identity prefix may contain extra information about the message, such as the kernel id, according to the convention used in the IPython kernel:
```kernel.{u-u-i-d}.subscription-topic```.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point, we should probably consider adopting a spec for IOPub message topic prefixes. I don't think anybody really cares and just subscribes to all, but this is still an unspecified part of the protocol other than this one new message.

Copy link
Contributor

@MSeal MSeal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @minrk covered potential feedback I'd of had better than I would have done and I generally agree with his assessments and recommendations here.

@JohanMabille
Copy link
Member Author

JohanMabille commented Dec 3, 2021

Is there any blocker preventing this JEP from being merged?

@afshin
Copy link
Member

afshin commented Dec 17, 2021

Sorry for the lack of attention @JohanMabille!

@jupyter/steeringcouncil If there are any aspects of this JEP that need to be reconsidered, please take a look and comment.

@afshin
Copy link
Member

afshin commented Dec 18, 2021

The coming week is the final work week of 2021. This JEP was opened in January of this year.

I propose we merge this JEP on Friday 24 December or earlier. If there are any showstoppers, please comment before then. Thanks, everyone!

Thank you and apologies to Johan for your patience and perseverance.

lionel- added a commit to posit-dev/ark that referenced this pull request Oct 10, 2024
Closes #569

This PR fixes a race condition regarding subscriptions to IOPub that causes clients to miss IOPub messages:

- On startup a client connects to the server sockets of a kernel.
- The client sends a request on Shell.
- The kernel starts processing the request and emits busy on IOPub.

If the client hasn't been able to fully subscribe to IOPub, messages can be lost, in particular the Busy message that encloses the request output.

On the Positron side we fixed it by sending kernel-info requests in a loop until we get a Ready message on IOPub. This signals Positron that the kernel is fully connected and in the Ready state: posit-dev/positron#2207. We haven't implemented a similar fix in our dummy clients for integration tests and we believe this is what is causing the race condition described in #569.

As noted in posit-dev/positron#2207, there is an accepted JEP proposal (JEP 65) that aims at solving this problem by switching to XPUB.

https://jupyter.org/enhancement-proposals/65-jupyter-xpub/jupyter-xpub.html
jupyter/enhancement-proposals#65

The XPUB socket allows the server to get notified of all new subscriptions. A message of type `iopub_welcome` is sent to all connected clients. They should generally ignore it but clients that have just started up can use it as a cue that IOPub is correctly connected and that they won't miss any output from that point on.

Approach:

The subscription notification comes in as a message on the IOPub socket. This is problematic because the IOPub thread now needs to listens to its crossbeam channel and to the 0MQ socket at the same time, which isn't possible without resorting to timeout polling. So we use the same approach and infrastructure that we implemented in #58 for listeing to both input replies on the StdIn socket and interrupt notifications on a crossbeam channel. The forwarding thread now owns the IOPub socket and listens for subscription notifications and fowrards IOPub messages coming from the kernel components.

---

* Start moving IOPub messages to forwarding thread

* Remove unused import

* Resolve the roundabout `Message` problem

The solution was to move the conversion to `JupyterMessage<T>` up into the match, so we "know" what `T` is!

* Use correct `Welcome` `MessageType`

* Implement `SubscriptionMessage` support and switch to `XPUB`

* The `Welcome` message doesn't come from ark

* Use `amalthea::Result`

* Add more comments

---------


Co-authored-by: Davis Vaughan <davis@posit.co>
Co-authored-by: Lionel Henry <lionel@posit.co>
lionel- added a commit to posit-dev/ark that referenced this pull request Oct 10, 2024
Closes #569

This PR fixes a race condition regarding subscriptions to IOPub that causes clients to miss IOPub messages:

- On startup a client connects to the server sockets of a kernel.
- The client sends a request on Shell.
- The kernel starts processing the request and emits busy on IOPub.

If the client hasn't been able to fully subscribe to IOPub, messages can be lost, in particular the Busy message that encloses the request output.

On the Positron side we fixed it by sending kernel-info requests in a loop until we get a Ready message on IOPub. This signals Positron that the kernel is fully connected and in the Ready state: posit-dev/positron#2207. We haven't implemented a similar fix in our dummy clients for integration tests and we believe this is what is causing the race condition described in #569.

As noted in posit-dev/positron#2207, there is an accepted JEP proposal (JEP 65) that aims at solving this problem by switching to XPUB.

https://jupyter.org/enhancement-proposals/65-jupyter-xpub/jupyter-xpub.html
jupyter/enhancement-proposals#65

The XPUB socket allows the server to get notified of all new subscriptions. A message of type `iopub_welcome` is sent to all connected clients. They should generally ignore it but clients that have just started up can use it as a cue that IOPub is correctly connected and that they won't miss any output from that point on.

Approach:

The subscription notification comes in as a message on the IOPub socket. This is problematic because the IOPub thread now needs to listens to its crossbeam channel and to the 0MQ socket at the same time, which isn't possible without resorting to timeout polling. So we use the same approach and infrastructure that we implemented in #58 for listeing to both input replies on the StdIn socket and interrupt notifications on a crossbeam channel. The forwarding thread now owns the IOPub socket and listens for subscription notifications and fowrards IOPub messages coming from the kernel components.

---

* Start moving IOPub messages to forwarding thread

* Remove unused import

* Resolve the roundabout `Message` problem

The solution was to move the conversion to `JupyterMessage<T>` up into the match, so we "know" what `T` is!

* Use correct `Welcome` `MessageType`

* Implement `SubscriptionMessage` support and switch to `XPUB`

* The `Welcome` message doesn't come from ark

* Use `amalthea::Result`

* Add more comments

---------

Co-authored-by: Davis Vaughan <davis@posit.co>
Co-authored-by: Lionel Henry <lionel@posit.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants