Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle EH restart gracefully to avoid notebook working copy data loss #179224

Closed
rebornix opened this issue Apr 5, 2023 · 11 comments
Closed

Handle EH restart gracefully to avoid notebook working copy data loss #179224

rebornix opened this issue Apr 5, 2023 · 11 comments
Assignees
Milestone

Comments

@rebornix
Copy link
Member

rebornix commented Apr 5, 2023

Does this issue occur when all extensions are disabled?: Yes/No

  • VS Code Version: 1.78 Insiders
  • OS Version: macOS 13.3

Over the past year, we've seen dozens of users report that they can't save notebooks or get the error "Can't read properties of undefined". We also got similar failures/errors from error telemetry or flaky integration tests. This issue in Jupyter microsoft/vscode-jupyter#11435 gave us good hints of what lead to this issue: EH crashed or restarted when there was a dirty notebook document in editors and things fell apart.

What happened under the hood

  • Users have a dirty ipynb document
  • EH crashed (in above issue, it was caused by Copilot both last year and last month)
  • All extHostCustomers were disposed
  • If users chose to save, it would fail as the NotebookSerializer was disposed. The NotebookSerializer instance held a rpc proxy which was also disposed. There is no way to send the requests to anywhere to serialize the notebook document to bytes.
  • If users chose to cancel
    • They can still make changes to the document
    • When users switch to another editor and then come back, the editor will throw an error like "Can't read properties of undefined", this is because

Now users can't save the document, they can't restart the window either as the dirty notebook document can't be back'ed up. Their last option is discard the change. Even if the EH was restarted successfully, since the NotebookFileWorkingCopyModel was still using the legacy disposed NotebookSerializer, they can't use the new NotebookSerializer registered for jupyter-notebook view type.

Proposals

The main challenge here is most notebook serializers (the interactive window one is the only exception now) run in EH, if the EH is down, there isn't anything we can do as we rely on the serializer to do the data conversion. Users can potentially separate extensions to different EH instances to mitigate the issue (separate good and bad extensions) but usually when this happened, it was usually too late.

With that said, we should still explore what we can do to improve this, or minimize the impact. Some ideas from my explorations:

  • When EH crashed/disposed, wait for some threshold before attempting to dispose NotebookEditorInput (which triggers Save) as we know it won't succeed. If the EH comes back within a reasonable threshold, re-establish the serializer to the NotebookFileWorkingCopy so it can use the new serializer to save.
    • With this approach, we still need to figure out what we should do if the EH is completely down, how do we ensure we still have hot-exit/backups
  • Or, for ipynb users (our major notebook users, > 1M MAU), we move the serializer to the core. Once we have the scratchbook working copy support, we will remove the interactive serializer and use ipynb serializer for Interactive Window.
    • It doesn't affect our bundle size and we can replace the Interactive Window notebook serializer with it. However we didn't solve this problem for other notebook users

@bpasero I'd like to hear your opinions on this and maybe brainstorm a bit on how we can improve this situation.

@bpasero
Copy link
Member

bpasero commented Apr 5, 2023

The user experience is really really bad when this happens, I am glad we want to do something about it 👍

@bpasero
Copy link
Member

bpasero commented Apr 5, 2023

From our meeting, some action items:

  • can we stop disposing notebook models and working copies when view type goes away
  • can we reconnect notebooks to the new extension host when it restarts (from a crash, or e.g. profile change)
  • can we show a button (here [1]) to the user when opening a notebook editor fails because the extension is missing to ask to install the extension
  • can we offer a button (here [1]) to the user for a dirty notebook that fails to show to get the backup data into an untitled editor so that data loss is somewhat prevented

[1]
image

@bpasero bpasero added the debt Code quality issues label Apr 5, 2023
@rebornix
Copy link
Member Author

rebornix commented Apr 5, 2023

can we stop disposing notebook models and working copies when view type goes away
can we reconnect notebooks to the new extension host when it restarts (from a crash, or e.g. profile change)

Did a bit of experiments for these two and it's actually working great, as long as we don't hold a reference to notebook serializer object in the editor models, each time on save we just look up in the serializers map again for the latest serializer for a specific view type. Will experiment if we should always look up or we hold a ref but listen to serializer update events, will see.

Screen.Recording.2023-04-05.at.2.31.33.PM.mov

@bpasero
Copy link
Member

bpasero commented Apr 6, 2023

Awesome 👏 !

@bpasero
Copy link
Member

bpasero commented Apr 6, 2023

One feedback from @alexdima was to also look into the scenario of disabling the notebook extension with a dirty notebook editor:

  • have a dirty notebook
  • disable the extension for that notebook
  • reload window
  • 🐛 changes are lost

Ideally the backup is held around for until we can show it in an editor and not remove it. I think today it gets dropped because the notebook editor opens but then fails to show the contents.

@rebornix
Copy link
Member Author

What we have archived so far:

Editor Open

  • When switching to another profile, or restarting EH
    • ✅ dirty working copies are not disposed anymore
  • Once the new EH is alive and users try to open this dirty notebook document
    • ✅ If the extension still exists and is enabled, it can be opened as expected
    • ✅ If the extension is not enabled, or doesn't exist anymore, we will show a button to install/enable the extension

In both cases, the working copy is not lost

Install missing extension from editor open

Screen.Recording.2023-04-12.at.10.41.43.AM.mov

Enable disabled extension from editor open

Screen.Recording.2023-04-12.at.10.42.36.AM.mov

Editor Save

If the serializer extension is disabled or doesn't exist when users attempt to save, we can now show an action in the prompt. The work is happening in #179727

Screen.Recording.2023-04-11.at.3.18.24.PM.mov

More to improve

  • If users attempt to switch to a new profile when there is still dirty working copy
    • We should veto the profile switching, ensure the working copies are all saved first cc @sandy081
    • Once switched to new profile, if the serializer for the notebook working copy doesn't exist, we can show a dialog asking users to install the missing serializer
      • Maybe this should be part of the profile switching process, if users are using custom/notebook editors in current profile, ask them if they want the custom editor / serializer extensions in the new profile. If they don't, save and close all those working copies before profile switching.
  • If we run into a corner case where we have a dirty working copy which can't be opened, we should show an additional action in the place holder editor to open the backup in an untitled editor

While testing all above scenarios, it seems working copies are kept if we don't intentionally dispose them in notebook, but I'm not sure if it's guaranteed, we might want to take a look together @bpasero in case I miss anything. End goal is no dirty working copy should be disposed by any means without user consent.

@bpasero
Copy link
Member

bpasero commented Apr 15, 2023

This is 🚀 👏 cool work, thanks for pushing this forward.

I wonder if we should also not dispose a notebook when it is not dirty. Because now the effect of a crashing extension host will be that non-dirty notebooks will just close at random. I would go ahead and simply never dispose notebooks when a view type goes away.

We should veto the profile switching

I think this needs to be level below: we need participation and veto support for EH restarts (except for when it crashes where we have no control). Because there are other reasons to restart the EH, for example entering a multi-root workspace.

End goal is no dirty working copy should be disposed by any means without user consent.

To clarify: backups are meant to be kept unless we were able to open them in an editor (code pointer). We have to double check here if that works as advertised 👍

we should show an additional action in the place holder editor to open the backup in an untitled editor

That is still on my list to investigate.

@bpasero
Copy link
Member

bpasero commented Apr 18, 2023

#180184 keeps notebooks open even when not dirty.

#180198 will preserve notebook backups until we can restore them. There was an issue where we would eagerly discard backups even when no matching extension was installed.

@bpasero
Copy link
Member

bpasero commented Apr 18, 2023

#180204 offers an action to open the notebook as text file and will open as untitled if a backup is present showing that backup:
Recording 2023-04-18 at 11 50 57 (1)

bpasero added a commit that referenced this issue Apr 18, 2023
…179224) (#180204)

* notebooks - offer a way to open backup as text in case of failure

* even open when no backup is there
bpasero added a commit that referenced this issue Apr 19, 2023
* Cache view type provider for missing extension restore.

* Re #179224. Customize action for save error.

* polish

* No delay for persisting view type memento

* Use log service instead of console

* dispose listener if extension installation/enablement fails

---------

Co-authored-by: Benjamin Pasero <benjamin.pasero@gmail.com>
@rebornix
Copy link
Member Author

The only work left is #180514 , closing this issue.

@bpasero
Copy link
Member

bpasero commented Apr 22, 2023

👍 , should probably have a test plan item for this work

@github-actions github-actions bot locked and limited conversation to collaborators Jun 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants