Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Worker to serialize Notebook #226632

Merged
merged 16 commits into from
Sep 12, 2024

Conversation

nojaf
Copy link
Contributor

@nojaf nojaf commented Aug 26, 2024

Related to #214040

Hi @amunger, I tried running the JSON.stringify call on a Worker thread. It works on my machine and seems to improve the situation, but it's hard to say if the issue is completely resolved. Any thoughts?

@nojaf
Copy link
Contributor Author

nojaf commented Aug 27, 2024

@microsoft-github-policy-service agree company=G-Research

@amunger
Copy link
Contributor

amunger commented Aug 27, 2024

nice! I should be able to give this a try at the end of the week.
We would also want to put this behind a setting so we can roll it out with an experiment if it works as expected.

@nojaf
Copy link
Contributor Author

nojaf commented Aug 27, 2024

@amunger, I'm not entirely sure about the risks of using a worker for this in Node or how stable it is. It likely depends on whether this practice is already being used elsewhere in the code. If you feel a setting is necessary, I’m fine with it. However, it might be safe enough to proceed without one.

@nojaf
Copy link
Contributor Author

nojaf commented Aug 28, 2024

@amunger, I've added a new setting. If you have any suggestions for a better name or description, just let me know. All set.

@amunger
Copy link
Contributor

amunger commented Aug 30, 2024

here's a sample of the perf diff on a 400mb notebook - it reduces the time that the EH is locked up by ~24%, that big block goes from 2.93s -> 2.23s.
image

It seems like more of the work could be pushed to the worker to reduce this further, sortObjectPropertiesRecursively would be pretty straight forward, and createJupyterCellFromNotebookCell would be trickier since it's using types from vscode.

If possible, we should also follow the pattern of Worker usage from the other built-in extensions and avoid the node imports e.g. the css extension
Otherwise, we need to ensure that we only use this in desktop, since a web based EH doesn't have access to node.

I'm fine with merging in a change that doesn't do all the possible optimizations to get this going, but we at least need to take care of the node dependency first. I can try to look into doing that next week if you don't get around to it.

@amunger
Copy link
Contributor

amunger commented Aug 30, 2024

also, make sure to get the latest since we just had a big change go in to switch over to ESM imports

@nojaf
Copy link
Contributor Author

nojaf commented Aug 30, 2024

Thanks a bunch for this feedback. Will circle back to this on Monday. Enjoy the weekend, cheers!

@nojaf nojaf force-pushed the notebook-serialization-worker branch from 438a79e to eca2e05 Compare September 2, 2024 09:58
@nojaf
Copy link
Contributor Author

nojaf commented Sep 2, 2024

Hi @amunger

If possible, we should also follow the pattern of Worker usage from the other built-in extensions and avoid the node imports e.g. the css extension

I'm a little unsure how to proceed there, the client/server thing seems like overkill compared to the Worker thread. I'm checking for Node right now and using dynamic imports. Let me know if that works for you or if you would prefer the client/server approach.

It seems like more of the work could be pushed to the worker to reduce this further, sortObjectPropertiesRecursively would be pretty straight forward,

I actually tried that but got:

Failed to save 'smaller.ipynb': Cannot find module 'vscode' Require stack: - c:\Users\nojaf\Projects\vscode\extensions\ipynb\out\serializers.js - c:\Users\nojaf\Projects\vscode\extensions\ipynb\out\notebookSerializerWorker.js

the JavaScript is something like

"use strict";
/*---------------------------------------------------------------------------------------------
 *  Copyright (c) Microsoft Corporation. All rights reserved.
 *  Licensed under the MIT License. See License.txt in the project root for license information.
 *--------------------------------------------------------------------------------------------*/
Object.defineProperty(exports, "__esModule", { value: true });
const node_worker_threads_1 = require("node:worker_threads");
const serializers_1 = require("./serializers");
if (node_worker_threads_1.parentPort) {
    const { notebookContent, indentAmount } = node_worker_threads_1.workerData;
    const json = JSON.stringify((0, serializers_1.sortObjectPropertiesRecursively)(notebookContent), undefined, indentAmount) + '\n';
    node_worker_threads_1.parentPort.postMessage(json);
}
//# sourceMappingURL=notebookSerializerWorker.js.map

and I think the require("./serializers") loads the entire file which somewhere contains vscode which is not known in the worker perhaps?

@amunger
Copy link
Contributor

amunger commented Sep 3, 2024

Failed to save 'smaller.ipynb': Cannot find module 'vscode' Require stack

Yeah that's what I would expect - the imports get pretty deep into the vscode API for that function, so we can just leave that out for now.

Limiting to node is fine for now, we'll just have to see if the CI tests have any issues with importing like that.

there are some conflicting files to resolve at this point though.

@nojaf nojaf force-pushed the notebook-serialization-worker branch from 56605f7 to 12abccc Compare September 4, 2024 07:33
@nojaf nojaf marked this pull request as ready for review September 4, 2024 07:33
@nojaf
Copy link
Contributor Author

nojaf commented Sep 4, 2024

Alright, I believe this is ready for review @amunger.

@amunger amunger requested a review from rebornix September 4, 2024 17:17
amunger
amunger previously approved these changes Sep 4, 2024
@vs-code-engineering vs-code-engineering bot added this to the September 2024 milestone Sep 4, 2024
Copy link
Member

@rebornix rebornix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks great, thank you for your help!

I left a few comments and they might need to be resolved before we merge the PR otherwise it might function as expected.


module.exports = withDefaults({
context: __dirname,
entry: {
extension: './src/ipynbMain.ts',
ipynbMain: './src/ipynbMain.ts',
Copy link
Contributor

@DonJayamanne DonJayamanne Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this , will wait for builds to go through and do some testing .

Had to change this, as we need to build the worker js file for dist (previous approach woudln't have worked, as we were importing some files from there as well)
I.e. we need to build a separate output

.map(cell => createJupyterCellFromNotebookCell(cell, preferredCellLanguage))
.map(pruneCell);
if (this.experimentalSave) {
return this.serializeViaWorker2(data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nojaf Thanks for the great work,
I've made one minor change (looks like a lot though),
basically I'm sending the entire vscode.NotebookData to the worker, and processing everything there.
That object can be safely serialized and sent to the worker

This way, all of the cup intensive work is done in worker, including constructing the JSON, serialization to bytes and the sorting of object keys.

& the other change, it to use a single worker as opposed to starting one each time,
this saves a lot more time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @DonJayamanne!

Thank you so much for refining this PR! It’s such a significant improvement over my original code, and I genuinely appreciate the effort you put into it—this would have taken me quite a while to figure out on my own.

Everything is still functioning perfectly on my end, and I'm excited about moving forward with this!

Thanks again! If there’s anything else I can do to help, just give me a shout!

"main": "./out/ipynbMain.js",
"browser": "./dist/browser/ipynbMain.js",
"main": "./out/ipynbMain.node.js",
"browser": "./dist/browser/ipynbMain.browser.js",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way we have a full proof way to determine whether the runtime context is browser or node.js
See ipynbMain.ts file. (passing extra argument isBrowser)


module.exports = withDefaults({
context: __dirname,
entry: {
extension: './src/ipynbMain.ts',
['ipynbMain.node']: './src/ipynbMain.node.ts',
notebookSerializerWorker: './src/notebookSerializerWorker.ts',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this notebookSerializerWorker entry as we need to compile a new output, just copying this will not work (as this file as additional imports).


function getOutputDir(context: vscode.ExtensionContext): string {
const main = context.extension.packageJSON.main as string;
return main.indexOf('/dist/') !== -1 ? 'dist' : 'out';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same technique used in other built in extensions.

@DonJayamanne DonJayamanne requested a review from amunger September 9, 2024 14:36
@DonJayamanne
Copy link
Contributor

@amunger @rebornix , please can you review this.

@nojaf
Copy link
Contributor Author

nojaf commented Sep 11, 2024

Hello @rebornix, would you mind reviewing this PR as well?
Is this good to go? Let me know if this needs any more tweaks, happy to address these.

@rebornix
Copy link
Member

It looks good to me. Before we go ahead and merge, @DonJayamanne did you get a chance to run a build on CI so we can verify if it works as expected?

@DonJayamanne
Copy link
Contributor

DonJayamanne commented Sep 11, 2024

did you get a chance to run a build on CI so we can verify if it works as expected?

Yes & it works as expected

@DonJayamanne DonJayamanne enabled auto-merge (squash) September 12, 2024 00:27
@DonJayamanne DonJayamanne enabled auto-merge (squash) September 12, 2024 00:27
@rebornix rebornix disabled auto-merge September 12, 2024 00:35
@rebornix rebornix enabled auto-merge September 12, 2024 00:36
@rebornix rebornix merged commit 94cd4ed into microsoft:main Sep 12, 2024
7 checks passed
@vs-code-engineering vs-code-engineering bot locked and limited conversation to collaborators Oct 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants