Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Greatly speed up 'advanced' ipc receiving with big messages #42931

Closed

Conversation

CaramelFur
Copy link
Contributor

Problem

I was running into an issue where I needed to transfer large buffers between a child_process and the main process with 'advanced' serializing, and this was suprisingly very slow.

Solution

After some debugging and profiling I found that the Buffer functions, and mainly the Buffer.concat function were to blame. So I have changed the code to use these functions as little as possible.

Now it will not concat each and every incoming part, it will instead store these in an array and only concat them when the full message has been collected.

Tests

I wrote a small test where I transferred different buffer sizes from a child process to the parent, and timed how long it took before the message was received in the parent.

(The scripts I used to test this: test-ipc.zip)

Before fix:

1kb   :  1 ms
8kb   :  0 ms
64kb  :  0 ms
512kb :  1 ms
1mb   :  6 ms
2mb   :  17 ms
4mb   :  43 ms
8mb   :  176 ms
16mb  :  717 ms
32mb  :  2726 ms
64mb  :  11781 ms
128mb :  57075 ms
1gb   :  3299888 ms (55 minutes)

After fix:

1kb   :  1 ms
8kb   :  0 ms
64kb  :  0 ms
512kb :  1 ms
1mb   :  2 ms
2mb   :  3 ms
4mb   :  6 ms
8mb   :  15 ms
16mb  :  31 ms
32mb  :  73 ms
64mb  :  142 ms
128mb :  217 ms
1gb   :  1426 ms

@nodejs-github-bot nodejs-github-bot added child_process Issues and PRs related to the child_process subsystem. needs-ci PRs that need a full CI run. labels Apr 30, 2022
Copy link
Member

@BridgeAR BridgeAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good! Just left a suggestion.

Comment on lines 69 to 75
// We read the uint manually here, because this is faster than first converting
// it to a buffer and using `readUInt32BE` on that.
const size =
messageBufferHead[0] << 24 |
messageBufferHead[1] << 16 |
messageBufferHead[2] << 8 |
messageBufferHead[3];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// We read the uint manually here, because this is faster than first converting
// it to a buffer and using `readUInt32BE` on that.
const size =
messageBufferHead[0] << 24 |
messageBufferHead[1] << 16 |
messageBufferHead[2] << 8 |
messageBufferHead[3];
const size = Buffer.prototype.readUInt32BE.call(messageBufferHead, 0) + 4;

This might work as well? I added the plus 4 as well. That way there's no need to add it later on. This might require a new variable name as well.

Copy link
Contributor Author

@CaramelFur CaramelFur May 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added these suggestions, since I can't see any noticable performance impact of this after some testing, and it makes the code easier to read.

@lpinca lpinca added the request-ci Add this label to start a Jenkins CI on a PR. label May 1, 2022
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label May 1, 2022
@nodejs-github-bot
Copy link
Collaborator

@CaramelFur
Copy link
Contributor Author

CaramelFur commented May 1, 2022

@lpinca Is it normal for the osx tests to fail, or is that because of this change?
Sorry, I'm not familiar with the node CI system.

while (messageBufferHead.length >= 4) {
// We call `readUInt32BE` manually here, because this is faster than first converting
// it to a buffer and using `readUInt32BE` on that.
const fullMessageSize = Buffer.prototype.readUInt32BE.call(messageBufferHead, 0) + 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const { readUInt32BE } = require('internal/buffer');
// ...
const fullMessageSize = ReflectApply(readUInt32BE, messageBufferHead, [0]) + 4;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change has been added

@himself65
Copy link
Member

@lpinca Is it normal for the osx tests to fail, or is that because of this change? Sorry, I'm not familiar with the node CI system.

It's normal for some randomly crash, and I don't know why

Copy link
Member

@anonrig anonrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some comments regarding the usage of primordials which has a documentation on https://github.com/nodejs/node/blob/master/doc/contributing/primordials.md

if (messageBuffer.length < 4 + size) {
break;
}
channel[kMessageBuffer].push(readData);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usage of primordials should be preferred for any new code. Full documentation can be found from https://github.com/nodejs/node/blob/master/doc/contributing/primordials.md

ArrayPrototypePush(channel[kMessageBuffer], readData);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArrayPrototypePush is mentioned in the list of primordials with known performance issues- https://github.com/nodejs/node/blob/master/doc/contributing/primordials.md#primordials-with-known-performance-issues, so I'm not sure if it's worth using it here given that this PR is about speeding things up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's also a little ambiguous what "new code" means here - new APIs (this one is not a new API) or just new lines of code that are not moved in from a different place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some testing it seems that the primordial function is about 12% slower than the normal one. However, this does seem to affect the speed of the ipc in any noticible way, probably because most processing time is spent elsewhere. So I don't see any problem with using the primordial function.

lib/internal/child_process/serialization.js Show resolved Hide resolved
Copy link
Member

@himself65 himself65 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@CaramelFur
Copy link
Contributor Author

@himself65 Is there anything else I need to do to get this merged? Or do I just have to wait until a maintainer merges this?

@CaramelFur
Copy link
Contributor Author

@lpinca Hi there, sorry to bother, but this PR has now been a month without any activity. Is this normal, or do I have to do something else to get this merged?

@RaisinTen
Copy link
Contributor

This would need a fully green Jenkins CI run before landing.

@RaisinTen RaisinTen added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Jun 6, 2022
@nodejs-github-bot
Copy link
Collaborator

@RaisinTen
Copy link
Contributor

cc @nodejs/child_process

@nodejs-github-bot
Copy link
Collaborator

while (messageBufferHead.length >= 4) {
// We call `readUInt32BE` manually here, because this is faster than first converting
// it to a buffer and using `readUInt32BE` on that.
const fullMessageSize = ReflectApply(readUInt32BE, messageBufferHead, [0]) + 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by suggestion: read the int inline, that's both faster and shorter (and arguably easier to read)

Suggested change
const fullMessageSize = ReflectApply(readUInt32BE, messageBufferHead, [0]) + 4;
const b = messageBufferHead;
const fullMessageSize = b[0] * 0x1000000 + b[1] * 65536 + b[2] * 256 + b[3];

@CaramelFur CaramelFur force-pushed the feature/faster-advanced-ipc branch from 7e80934 to e2247c8 Compare June 15, 2022 13:58
@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@CaramelFur
Copy link
Contributor Author

Ah great, all checks have now finally passed.

lpinca pushed a commit that referenced this pull request Jun 19, 2022
PR-URL: #42931
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Zeyu "Alex" Yang <himself65@outlook.com>
lpinca pushed a commit that referenced this pull request Jun 19, 2022
PR-URL: #42931
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Zeyu "Alex" Yang <himself65@outlook.com>
@lpinca
Copy link
Member

lpinca commented Jun 19, 2022

Landed in 94020ac...8db79cc.

@lpinca lpinca closed this Jun 19, 2022
targos pushed a commit that referenced this pull request Jul 12, 2022
PR-URL: #42931
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Zeyu "Alex" Yang <himself65@outlook.com>
targos pushed a commit that referenced this pull request Jul 12, 2022
PR-URL: #42931
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Zeyu "Alex" Yang <himself65@outlook.com>
targos pushed a commit that referenced this pull request Jul 18, 2022
PR-URL: #42931
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Zeyu "Alex" Yang <himself65@outlook.com>
targos pushed a commit that referenced this pull request Jul 18, 2022
PR-URL: #42931
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Zeyu "Alex" Yang <himself65@outlook.com>
targos pushed a commit that referenced this pull request Jul 31, 2022
PR-URL: #42931
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Zeyu "Alex" Yang <himself65@outlook.com>
targos pushed a commit that referenced this pull request Jul 31, 2022
PR-URL: #42931
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Zeyu "Alex" Yang <himself65@outlook.com>
guangwong pushed a commit to noslate-project/node that referenced this pull request Oct 10, 2022
PR-URL: nodejs/node#42931
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Zeyu "Alex" Yang <himself65@outlook.com>
guangwong pushed a commit to noslate-project/node that referenced this pull request Oct 10, 2022
PR-URL: nodejs/node#42931
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Zeyu "Alex" Yang <himself65@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. child_process Issues and PRs related to the child_process subsystem. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants