fix(NODE-3451): fix performance regression from v1 #451
Description
NODE-3451 documents a performance regression in the Node.js driver v4, which is actually caused by a performance regression in the js-bson v4 deserialization code (compared to v1).
The notable culprits were:
What changed?
- The `deserialize` method has been updated to check `instanceof Buffer` and skip the rewrapping in those instances; this is a temporary measure that only addresses performance for Node.js buffers
  - `deserializeStream` was left untouched for scope reasons
- The `deserializeObject` method was updated to check for the presence of potential DBRef keys as it goes, removing the negative performance impact for any objects that do not contain any DBRef keys; some further optimization could be done to eliminate the `isDBRefLike` check altogether, but since we expect DBRefs to be pretty rare, it didn't seem worth optimizing that specific edge case
- The `validateUtf8` method was updated to only run if the `\uFFFD` character is present. Technically, this makes performance worse for strings that do contain that special character; however, for all other strings, looping over the decoded string with `charCodeAt` is faster than always running `validateUtf8`. Unfortunately, there is not much else that can be done to optimize string deserialization without losing the validation (short of doing our own decoding)
  - The `validateUtf8` call in the DBPOINTER type was left untouched for scope reasons

After these changes, there may still be a residual 5% performance degradation for the typical use case relative to v1, which can be attributed to the remaining buffer and string validation.
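For the `deserialize` change, a minimal sketch of the buffer fast path follows. The helper name `toBuffer` is hypothetical and is not js-bson's API. It only illustrates why skipping the rewrap helps: a Node.js `Buffer` is itself a `Uint8Array`, so a generic "wrap any typed array" path would allocate a new `Buffer` view on every call even when the caller already passed a `Buffer`.

```typescript
// Hypothetical helper (not js-bson's API): normalize input to a Node.js Buffer.
function toBuffer(input: Uint8Array | ArrayBuffer): Buffer {
  // Fast path: a Buffer is returned as-is, so no new wrapper object is allocated.
  if (input instanceof Buffer) {
    return input;
  }
  // Any other Uint8Array is wrapped as a Buffer view over the same memory (no copy),
  // honoring its offset and length within the underlying ArrayBuffer.
  if (input instanceof Uint8Array) {
    return Buffer.from(input.buffer, input.byteOffset, input.byteLength);
  }
  // A bare ArrayBuffer is also wrapped without copying its bytes.
  return Buffer.from(input);
}

// Usage
const nodeBuf = Buffer.from([0x05, 0x00, 0x00, 0x00, 0x00]);
const plain = new Uint8Array([0x05, 0x00, 0x00, 0x00, 0x00]);
console.log(toBuffer(nodeBuf) === nodeBuf); // true: returned unchanged
console.log(toBuffer(plain) === plain); // false: wrapped in a new Buffer view
```

As the bullet notes, this only speeds up callers that pass real Node.js `Buffer`s; other typed arrays still pay for the wrap.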
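The "check as it goes" idea from the `deserializeObject` bullet can be sketched roughly as below. The names `buildDocument` and `DBREF_KEYS`, and the simplified `isDBRefLike` rule (a string `$ref`, a present `$id`, and an optional string `$db`), are assumptions for illustration rather than js-bson's actual code. The per-key test costs one character comparison for ordinary keys, so the full shape check only runs for documents that actually contain a candidate key.

```typescript
// Hypothetical set of keys that mark a possible DBRef (assumption for illustration).
const DBREF_KEYS = new Set(["$ref", "$id", "$db"]);

// Simplified DBRef shape check (assumption): string $ref, present $id, optional string $db.
function isDBRefLike(doc: Record<string, unknown>): boolean {
  return (
    typeof doc["$ref"] === "string" &&
    "$id" in doc &&
    (doc["$db"] === undefined || typeof doc["$db"] === "string")
  );
}

// Builds a plain object from decoded key/value pairs, tracking DBRef candidates as it goes.
function buildDocument(
  entries: Array<[string, unknown]>
): { doc: Record<string, unknown>; isDBRef: boolean } {
  const doc: Record<string, unknown> = {};
  let possibleDBRef = false;
  for (const [key, value] of entries) {
    // Cheap per-key test: only keys starting with "$" (0x24) are looked up in the set.
    if (!possibleDBRef && key.charCodeAt(0) === 0x24 && DBREF_KEYS.has(key)) {
      possibleDBRef = true;
    }
    doc[key] = value;
  }
  // The full shape check runs only when a candidate key was seen.
  return { doc, isDBRef: possibleDBRef && isDBRefLike(doc) };
}

// Usage
console.log(buildDocument([["name", "x"], ["n", 1]]).isDBRef); // false, isDBRefLike never runs
console.log(buildDocument([["$ref", "users"], ["$id", 42]]).isDBRef); // true
```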
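The `validateUtf8` change relies on Node.js replacing each invalid UTF-8 sequence with U+FFFD when decoding, so a decoded string without that character must have come from valid bytes. The sketch below assumes that behavior; `readUtf8` is a hypothetical name, and for brevity it swaps in a round-trip comparison (re-encode and compare bytes) in place of js-bson's byte-level `validateUtf8`, so it shows the control flow rather than the real validator.

```typescript
// Swapped-in validator (assumption): the bytes are valid UTF-8 exactly when
// re-encoding the decoded string reproduces them, since invalid input decodes to U+FFFD.
function roundTripsAsUtf8(str: string, bytes: Uint8Array): boolean {
  return Buffer.from(str, "utf8").equals(bytes);
}

// Hypothetical string reader: decode first, validate only when U+FFFD appears.
function readUtf8(bytes: Buffer, start: number, end: number, validate: boolean): string {
  const str = bytes.toString("utf8", start, end);
  if (validate) {
    // Fast path: a charCodeAt scan is cheap, and a string with no U+FFFD
    // contains no replacement characters, so its bytes were valid UTF-8.
    for (let i = 0; i < str.length; i++) {
      if (str.charCodeAt(i) === 0xfffd) {
        // Slow path: U+FFFD may be a genuine character (bytes EF BF BD) or a replacement.
        if (!roundTripsAsUtf8(str, bytes.subarray(start, end))) {
          throw new Error("Invalid UTF-8 string in BSON document");
        }
        break;
      }
    }
  }
  return str;
}
```

A string that genuinely contains U+FFFD pays for the extra check, while the bytes `61 FF 62` decode to `"a\uFFFDb"` and fail the round trip, which matches the trade-off described in the bullet.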