-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Worker to serialize Notebook #226632
Use Worker to serialize Notebook #226632
Conversation
@microsoft-github-policy-service agree company=G-Research |
nice! I should be able to give this a try at the end of the week. |
@amunger, I'm not entirely sure about the risks of using a worker for this in Node or how stable it is. It likely depends on whether this practice is already being used elsewhere in the code. If you feel a setting is necessary, I’m fine with it. However, it might be safe enough to proceed without one. |
@amunger, I've added a new setting. If you have any suggestions for a better name or description, just let me know. All set. |
here's a sample of the perf diff on a 400mb notebook - it reduces the time that the EH is locked up by ~24%, that big block goes from 2.93s -> 2.23s. It seems like more of the work could be pushed to the worker to reduce this further, If possible, we should also follow the pattern of Worker usage from the other built-in extensions and avoid the node imports e.g. the css extension I'm fine with merging in a change that doesn't do all the possible optimizations to get this going, but we at least need to take care of the node dependency first. I can try to look into doing that next week if you don't get around to it. |
also, make sure to get the latest since we just had a big change go in to switch over to ESM imports |
Thanks a bunch for this feedback. Will circle back to this on Monday. Enjoy the weekend, cheers! |
438a79e
to
eca2e05
Compare
Hi @amunger
I'm a little unsure how to proceed there, the client/server thing seems like overkill compared to the Worker thread. I'm checking for Node right now and using dynamic imports. Let me know if that works for you or if you would prefer the client/server approach.
I actually tried that but got:
the JavaScript is something like
and I think the |
Yeah that's what I would expect - the imports get pretty deep into the vscode API for that function, so we can just leave that out for now. Limiting to node is fine for now, we'll just have to see if the CI tests have any issues with importing like that. there are some conflicting files to resolve at this point though. |
56605f7
to
12abccc
Compare
Alright, I believe this is ready for review @amunger. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks great, thank you for your help!
I left a few comments and they might need to be resolved before we merge the PR otherwise it might function as expected.
|
||
module.exports = withDefaults({ | ||
context: __dirname, | ||
entry: { | ||
extension: './src/ipynbMain.ts', | ||
ipynbMain: './src/ipynbMain.ts', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this , will wait for builds to go through and do some testing .
Had to change this, as we need to build the worker js file for dist (previous approach woudln't have worked, as we were importing some files from there as well)
I.e. we need to build a separate output
.map(cell => createJupyterCellFromNotebookCell(cell, preferredCellLanguage)) | ||
.map(pruneCell); | ||
if (this.experimentalSave) { | ||
return this.serializeViaWorker2(data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nojaf Thanks for the great work,
I've made one minor change (looks like a lot though),
basically I'm sending the entire vscode.NotebookData
to the worker, and processing everything there.
That object can be safely serialized and sent to the worker
This way, all of the cup intensive work is done in worker, including constructing the JSON, serialization to bytes and the sorting of object keys.
& the other change, it to use a single worker as opposed to starting one each time,
this saves a lot more time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @DonJayamanne!
Thank you so much for refining this PR! It’s such a significant improvement over my original code, and I genuinely appreciate the effort you put into it—this would have taken me quite a while to figure out on my own.
Everything is still functioning perfectly on my end, and I'm excited about moving forward with this!
Thanks again! If there’s anything else I can do to help, just give me a shout!
"main": "./out/ipynbMain.js", | ||
"browser": "./dist/browser/ipynbMain.js", | ||
"main": "./out/ipynbMain.node.js", | ||
"browser": "./dist/browser/ipynbMain.browser.js", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This way we have a full proof way to determine whether the runtime context is browser or node.js
See ipynbMain.ts
file. (passing extra argument isBrowser
)
|
||
module.exports = withDefaults({ | ||
context: __dirname, | ||
entry: { | ||
extension: './src/ipynbMain.ts', | ||
['ipynbMain.node']: './src/ipynbMain.node.ts', | ||
notebookSerializerWorker: './src/notebookSerializerWorker.ts', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this notebookSerializerWorker entry as we need to compile a new output, just copying this will not work (as this file as additional imports).
|
||
function getOutputDir(context: vscode.ExtensionContext): string { | ||
const main = context.extension.packageJSON.main as string; | ||
return main.indexOf('/dist/') !== -1 ? 'dist' : 'out'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same technique used in other built in extensions.
Hello @rebornix, would you mind reviewing this PR as well? |
It looks good to me. Before we go ahead and merge, @DonJayamanne did you get a chance to run a build on CI so we can verify if it works as expected? |
Yes & it works as expected |
Related to #214040
Hi @amunger, I tried running the
JSON.stringify
call on a Worker thread. It works on my machine and seems to improve the situation, but it's hard to say if the issue is completely resolved. Any thoughts?