-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Intermittent failure with Data Explorer smoke tests #4663
Comments
### Intent This test has recently become flakey due to [this](#4663) known issue. In the mean time, we are disabling it to prevent blocking of pipelines. ### QA Notes N/A
A possible fix is that on the Positron side we let through messages that don't have the result field, is there a consistent way to detect whether it's a frontend method being called with |
AFAICT, everything that is not an RPC reply is treated as an event, ie, there's no strong definition of what an 'event' is, see eg: positron/src/vs/workbench/api/browser/positron/mainThreadLanguageRuntime.ts Lines 991 to 1000 in 39ceabc
positron/extensions/jupyter-adapter/src/RuntimeClientAdapter.ts Lines 263 to 282 in 4a2e8e3
|
Here I think we would want to add a check that the message contains a result field, to know whether it is actually a response to an RPC as opposed to an event that was sent while the RPC response was pending |
Speculative change to address #4663. As noted in that issue, this is mostly a workaround for an issue that exists in the Python kernel wherein comm messages emitted always have the parent ID of the most recent message to the shell (even if they are not a reply to that message). This can cause problems in Positron since it assumes that, after a message is sent, the next message with a matching `parent_id` is the reply to that message. The fix is to check a message for `result` or `error` keys before treating it as an RPC response. We can't do this for _all_ messages since many codepaths/comms do not use the JSON-RPC structure, but we do it for all formally defined Positron comms. This is admittedly a little janky. Some things that could make this better (but are higher risk than this change): - a more robust way of identifying Positron's JSON-RPC comm messages - having Positron supply its own ID with each request that must be returned in the reply body in order to deterministically pair requests and replies (i.e. don't rely on `parent_id` since it can be hard to control) - moving Positron comms off the shell socket (@lionel- has suggested e.g. moving them to the control socket to allow them to operate independently of the current shell command) ### QA Notes This change is small but it's a hot codepath -- almost every comm message goes through here. Sanity test the data explorer, and ipywidgets. They use this codepath but specifically do not want the new behavior added here.
Speculative change to address #4663. As noted in that issue, this is mostly a workaround for an issue that exists in the Python kernel wherein comm messages emitted always have the parent ID of the most recent message to the shell (even if they are not a reply to that message). This can cause problems in Positron since it assumes that, after a message is sent, the next message with a matching `parent_id` is the reply to that message. The fix is to check a message for `result` or `error` keys before treating it as an RPC response. We can't do this for _all_ messages since many codepaths/comms do not use the JSON-RPC structure, but we do it for all formally defined Positron comms. This is admittedly a little janky. Some things that could make this better (but are higher risk than this change): - a more robust way of identifying Positron's JSON-RPC comm messages - having Positron supply its own ID with each request that must be returned in the reply body in order to deterministically pair requests and replies (i.e. don't rely on `parent_id` since it can be hard to control) - moving Positron comms off the shell socket (@lionel- has suggested e.g. moving them to the control socket to allow them to operate independently of the current shell command) ### QA Notes This change is small but it's a hot codepath -- almost every comm message goes through here. Sanity test the data explorer, and ipywidgets. They use this codepath but specifically do not want the new behavior added here.
System details:
Currently only reproducible on CI with commits after merging #4326.
Positron and OS details:
Interpreter details:
Python/Polars
Describe the issue:
Since merging #4326 we're seeing intermittend failures with the data explorer smoke tests. Most of the times, it happened while openining a very small Polars data frame. Resulting in:
From the console logs we see:
This suggests that the
return_column_profiles
event was intepreted as an RPC response, thus causing 1. theget_column_profiles
RPC call to fail. 2. The results are not treated as an event, and thus, they are not correctly forwarded to the renderer.This could happen when the
return_column_profiles
message is sent before we actually sent the reply toget_column_profiles
RPC. We added a mechanism in the Python code to prevent this to happen in 14ae1ad, and while it seems to prevent messages from being sent out of order. However, unless we're missing something, my hypothesis is that there's no guarantee that Positron will process them in the correct order.The root cause of this issue is that when the ipykernel sends a
comm.send_event()
, it actually adds the parent headers of the last request - which is technically wrong - events should not have parents message ids. It's possible to change this behavior, but requires a good amount of work.So AFAICT what's happening is that the Jupyter adapter forwards the event message to the RPC handler:
positron/extensions/jupyter-adapter/src/RuntimeClientAdapter.ts
Line 263 in 4a2e8e3
And causes the RPC error that's unexpected by the data explorer.
Steps to reproduce the issue:
Only seen on CI so far. Se attached artifacts.
run-artifacts-13.zip
Expected or desired behavior:
No failures, data should be displayed
Were there any error messages in the UI, Output panel, or Developer Tools console?
See above
The text was updated successfully, but these errors were encountered: