Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

debugpy gets "stuck" while using run by line in vscode jupyter notebook #1051

Closed
roblourens opened this issue Sep 9, 2022 · 11 comments
Closed
Labels

Comments

@roblourens
Copy link
Member

roblourens commented Sep 9, 2022

Before creating a new issue, please check the FAQ to see if your question is answered there.

Environment data

  • debugpy version: 1.6.3
  • OS and version: macOS
  • Python version (& distribution if applicable, e.g. Anaconda): 3.9.5
  • Using VS Code or Visual Studio: VS Code

I am working on the "run by line" feature with a jupyter notebook in vscode. There is an issue which I don't think is new. I am just clicking the "run by line" button very quickly, which starts a debug session and sends a "stepIn" request. After trying this a few times, the debugger eventually stops responding.

Basically at this point, the cell finished execution and so we have sent a "disconnect" request to debugpy, but we never get a response. The log from the extension's side shows that as I keep clicking the button to step, vscode sends "stepIn" requests that don't get a response. Here are the debugpy + vscode-jupyter logs from this point where the debugger is not responding.

debugpy.adapter-93501.log
debugpy.pydevd.93407.log
debugpy.server-93407.log
vscodejupyter.log

But there is a workaround: if the user clicks the stop button, we execute the code pass, and this seems to unstick the debugger. We added this workaround in microsoft/vscode-jupyter#7612, although I'm not sure where it came from. Once that happens, we get responses to all the requests that previously weren't getting responses, and can start another debug session if desired. Here are the logs from after executing this workaround (includes the full log of the above, then more)

vscodejupyter.log
debugpy.adapter-93501.log
debugpy.pydevd.93407.log
debugpy.server-93407.log

Can you help me understand what's going on? We try to keep things consistent on the extension's end when stopping/starting new sessions quickly. It's possible that the requests we sent are in a weird order. One thing I have noticed is that when this happens, I see an evaluate request that we sent for variables right before the final disconnect request. If the disconnect request goes out after we got a response for evaluate, then it seems like the issue doesn't happen. I don't know whether that is part of the problem or not. But in any case, it definitely seems like the kernel/debugger is in a broken state.

@roblourens
Copy link
Member Author

@int19h I'm wondering whether the tag means you know it's a bug on debugpy's end, or whether it needs more investigation, and if so, anything I can do to help?

@int19h
Copy link
Contributor

int19h commented Sep 26, 2022

The adapter logs for the failing case seem to indicate that it never even received that "disconnect" - the last message from the client has seq=101, which is the eval immediately preceding the disconnect. However, it also never receives the response to that
eval from the server, and pydevd log doesn't show success of evaluation, either.

One thing of note when that last eval request (seq=101) comes in, it specifies a "frameId", but at the same time there were several "stepIn" request issued right before without a subsequent "stackTrace" - so the "frameId" is not necessarily valid anymore. However, if the frame is gone, I'd expect things to fail gracefully with an error response, not hang indefinitely. So there's definitely a bug on debugpy end here, even if only about reporting the error.

@fabioz, is there anything else suspicious that can be gleaned from the pydevd log here?

@int19h int19h removed their assignment Sep 27, 2022
@roblourens
Copy link
Member Author

Hey @fabioz, pinging on this in case you missed it

@fabioz
Copy link
Collaborator

fabioz commented Oct 14, 2022

I did check the logs a but nothing really did stand out for me in the pydevd side.

As a note, it's entirely possible that an evaluate request sent to pydevd doesn't have a response at all (since it's all threaded, the eval request can be made but if that thread is not stopped in a breakpoint it'll just linger there as there's no explicit control to timeout such a request if the thread starts running in the meanwhile, which appears to be what's happening here -- it's possible to see the evaluate followed by the resume due to the step in, but it never stops anywhere afterwards in the logs and thus it's just running waiting to stop somewhere, which it never does due to not reaching user code).

The fact that no disconnect appears in the jupyter log but not in the debugpy logs probably means something bad happened at that point (and I'd say that's the probable culprit).

Given that other messages are received by debugpy afterwards, my best guess is that after logging the disconnect in jupyter it didn't really send the message to debugpy and the client gets into a weird situation (but I don't know why that'd happen -- the server logs seem to correctly reflect what it receives, the one point where there's a divergence is that the jupyter logs show a disconnect and that's never received in the debugger and from following the code in debugpy I don't see any way for it to receive something in the socket and don't log it unless the connection is broken -- which is not the case as other messages are received properly afterwards)...

The difference from the case where it's responding is that the disconnect request was indeed received (and thus any request afterwards should be answered with a client not authenticated as appears in that log until the attach is done again).

One possible culprit (and now I'm just making things up because I haven't even checked the client) is that the client is waiting for a response to all messages before sending the disconnect and that wouldn't work because it will never receive a response from that evaluate...

@roblourens
Copy link
Member Author

One possible culprit (and now I'm just making things up because I haven't even checked the client) is that the client is waiting for a response to all messages before sending the disconnect and that wouldn't work because it will never receive a response from that evaluate...

Assuming by "client" you mean vscode and the Jupyter extension, that shouldn't happen- all the messages are independent. Or do you mean ipykernel/jupyter?

Would it narrow it down at all to run this without the evaluate requests and see whether the issue still comes up in CI?

@roblourens
Copy link
Member Author

My takeaway from the comments above is that you don't feel there's anything actionable on the debugpy/pydevd sides with the info you currently have, is that fair? If so, I can work on this to try to narrow it down some more.

This comes up pretty easily in my tests in CI, so if you can think of anything that would be helpful to log, I can run tests against a custom build with extra logging.

I can also try some different things on the extension's side that might give more info... I can remove the evaluate requests (lately we've also been looking at how these requests that originate from the variables view can be slow to process or produce large responses to pass around) if that gets us anywhere, I can try to create a minimal repro based on that. Does that make sense? Anything else I can try?

@fabioz
Copy link
Collaborator

fabioz commented Nov 3, 2022

My takeaway from the comments above is that you don't feel there's anything actionable on the debugpy/pydevd sides with the info you currently have, is that fair?

Well, I'd say that the issue is in the communication between jupyter and debugpy, as the issue seems to be that a message (logged in jupyter) is not received by debugpy (at least it's not in its logs). I did some review in the code in debugpy and I find it unlikely that a received message is not logged (but this doesn't exclude issues in the socket itself -- maybe the socket wasn't setup properly and is missing some flag and there's some buffering issue and the message isn't being received).

If there was some standalone repro that'd be really nice...

@roblourens
Copy link
Member Author

Got it, I will dig into this a bit more this month.

@roblourens
Copy link
Member Author

Sure enough, running tests without the variables view sending the evaluate requests seems to make the issue go away. I still want to come up with a consistent repro though.

@judej
Copy link

judej commented Jan 19, 2023

Waiting for Jupyter investigation

@github-actions
Copy link

This issue was closed because it has been stalled for 30 days with no activity. If the issue still persists, please reopen with the information requested. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants