perf(base64): Replace random access with iteration #1982
base: master
Conversation
Something seems fishy with the bench results. From what I gather, under Node.js, the `encodes` and `JS encodes` cases should be using the same implementation, yet they have drastically different results.

Node.js:

```diff
-encodes 10775.74909090909 characters per millisecond
+encodes 9801.535832414553 characters per millisecond
-JS encodes 23130.176470588234 characters per millisecond
+JS encodes 18329.573806881242 characters per millisecond
```
packages/base64/src/decode.js (Outdated)

```js
    i += 1;
    quantum += 6;
  }

  if (i < string.length) {
```
Cache `string.length` at the beginning. But then in which case can there ever be garbage at the end?
When a non-alphabet character is encountered, or anything after expected padding.
Don't we throw if an invalid character is encountered?
One case where the number of characters iterated would differ from the string length is if any surrogate pairs are encountered, but again these would be invalid characters, and we'd have thrown already.
I maintain that this condition can never be true.
> Don't we throw if an invalid character is encountered?

We throw when encountering a non-alphabet character, but this catches anything after expected padding:

```console
$ node --input-type=module -e '
import "@endo/init";
import { jsDecodeBase64 } from "./src/decode.js";
console.log("valid", jsDecodeBase64("QQ=="));
console.log("invalid");
jsDecodeBase64("QQ===");
'
valid Uint8Array(1) [ 65 ]
invalid
(Error#1)
Error#1: Base64 string has trailing garbage = in string <unknown>
  at jsDecodeBase64 (packages/base64/src/decode.js:62:11)
  at packages/base64/[eval1]:6:3
```
Oh right, I missed the new break logic from the loop. I believe there's actually a bug allowing input like `QQ=*`.
Oh, and it doesn't seem to check that the register is empty once padding is reached.
It looks like, according to the RFC, a decoder "MAY" reject if the padding bits are not 0, but doesn't have to. I still believe this implementation will accept `QQ=*`, for example, which is not compliant.
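For illustration only, here is a minimal sketch of a strict decoder that covers both conditions raised in this thread: rejecting any character after padding begins (catching `QQ=*`), and exercising the RFC 4648 "MAY" by rejecting non-zero leftover padding bits. The names and structure are made up for this sketch; this is not the actual `packages/base64/src/decode.js` implementation, and it omits some checks a full decoder needs (e.g. validating the padding length).

```javascript
// Hypothetical strict base64 decoder sketch; illustrative names, not the
// actual packages/base64 implementation.
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
const CODES = new Map([...ALPHABET].map((ch, i) => [ch, i]));

const strictDecodeBase64 = string => {
  const bytes = [];
  let register = 0; // accumulated bits
  let quantum = 0; // number of pending bits in the register
  let padding = 0; // count of '=' characters seen so far
  for (const ch of string) {
    if (ch === '=') {
      padding += 1;
      continue;
    }
    if (padding > 0) {
      // Catches inputs like "QQ=*": a character after padding began.
      throw Error(`Base64 string has character after padding: ${ch}`);
    }
    const code = CODES.get(ch);
    if (code === undefined) {
      throw Error(`Invalid base64 character: ${ch}`);
    }
    register = (register << 6) | code;
    quantum += 6;
    if (quantum >= 8) {
      quantum -= 8;
      bytes.push((register >> quantum) & 0xff);
    }
  }
  // RFC 4648 section 3.5: a decoder MAY reject non-zero leftover pad bits.
  if ((register & ((1 << quantum) - 1)) !== 0) {
    throw Error('Base64 string has non-zero padding bits');
  }
  return new Uint8Array(bytes);
};
```

With this sketch, `strictDecodeBase64('QQ==')` yields a one-byte array `[65]`, while both `QQ=*` (data after padding) and `QR==` (non-zero padding bits) throw.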
> I believe there's actually a bug allowing input like `QQ=*`

Nice catch! Opened #1991 to add coverage and rebased this PR on top of its test changes.
Thank you for digging into this and validating the hypothesis.
Updated the benchmark script to clarify with better output:

```console
$ node test/bench-main.js
[pass 1] encodes 8378.88124410933 characters per millisecond
[pass 2] encodes 8523.48322147651 characters per millisecond
[pass 1] JS short-string encodes 18514.513452914798 characters per millisecond
[pass 2] JS short-string encodes 18514.513452914798 characters per millisecond
[pass 1] decodes 45357.107142857145 bytes per millisecond
[pass 2] decodes 51212.96780487805 bytes per millisecond
[pass 1] JS short-string decodes 43996.02797202797 bytes per millisecond
[pass 2] JS short-string decodes 43261.155206286836 bytes per millisecond
```
Makes sense. Should I open a new PR for the benchmarking script improvements?

That would be excellent.
packages/base64/test/bench-main.js (Outdated)

```js
const now = (() => {
  try {
    // eslint-disable-next-line no-undef,global-require
    return require('perf_hooks').performance.now || Date.now;
```
Node does not define `require` in ESM code. I believe the only way is to try to dynamically import `perf_hooks`, which will force an async prelude. Not optimal.
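A sketch of what that dynamic-import approach could look like (illustrative only; the actual `bench-main.js` change may differ, and `getNow` is a made-up name):

```javascript
// Illustrative sketch: picking a high-resolution timer in ESM, where
// `require` is unavailable. The dynamic import forces callers to await.
const getNow = async () => {
  try {
    // Resolves on Node.js; rejects on hosts without the module (e.g. XS).
    const { performance } = await import('perf_hooks');
    return () => performance.now();
  } catch (_err) {
    // Date.now is universal, at millisecond granularity.
    return () => Date.now();
  }
};
```

The async prelude is then something like `const now = await getNow();` at the top of the benchmark, which is why the module becomes async.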
Eh, I just went ahead with moving to async.
I don't understand how the new changes you made dropped the performance in Node even more.
Force-pushed from a75208e to 0165aea
I am confused about the separation between this PR and #1986. Both contain all three files. I would have expected one (that one) to define the benchmarking framework, and the other (this one) to be staged on it and use it.
Force-pushed from 0165aea to 1beb766

Force-pushed from 98a3482 to 2274382
XS is O(n) for the former (not that it really matters here, because they also supply a native base64 implementation).

```console
$ echo "Node.js" && node test/bench-main.js && echo "XS" && ~/.esvu/bin/xs -m test/bench-main.js
Node.js
encodes 9801.535832414553 characters per millisecond
JS encodes 18329.573806881242 characters per millisecond
decodes 45848.339350180504 bytes per millisecond
JS decodes 45448.94117647059 bytes per millisecond
XS
encodes 578303.1160714285 characters per millisecond
JS encodes 1700.2734761120264 characters per millisecond
decodes 393870.26373626373 bytes per millisecond
JS decodes 1946.3762376237623 bytes per millisecond
```
Force-pushed from 2274382 to 881d14c
Ref #1982

Speeds up XS by 21% decoding a representation of the following CopyRecord:

```js
{
  primitives: [undefined, null, true, false, 0, 1, 0n, 1n, "foo", Symbol.for("bar")],
  arrays: [[], [{}]],
  records: { empty: {}, singular: { array: [] } },
}
```

(and speeds up decoding a CopyArray containing 10 copies thereof by about 100%)
@gibson042 Are you shooting to improve performance on both Node.js and XS (and assuming this will still fall through to the native version on XS)?
@kriskowal Since the two are at odds and the JS implementation isn't even used by XS, I'm comfortable with getting this PR clean and then closing it without merge, as you suggested in #1982 (comment).
Description
We recently discovered that in XS, random access to string characters is O(n). This PR updates the JS base64 implementation to instead use iterators. It improves the performance in XS as expected, but irrelevantly so because XS provides a native implementation which is much faster still. There's a slight penalty in Node.js, which may or may not be tolerable (but note that apples-to-apples comparison across implementations is necessary because the JS input is substantially shorter). I'm comfortable with rejection of this PR (or at least everything after the first commit), but felt that opening it was important for transparency.
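As a simplified illustration of the access-pattern change (not the PR's actual diff; the function names here are made up for the sketch):

```javascript
// Random access: on XS, string.charCodeAt(i) is O(n) in the string length,
// so this loop is O(n^2) overall.
const codesByIndex = string => {
  const codes = [];
  for (let i = 0; i < string.length; i += 1) {
    codes.push(string.charCodeAt(i));
  }
  return codes;
};

// Iteration: a string iterator walks the string once, O(n) even on XS.
// Note that for...of yields whole code points rather than UTF-16 units,
// which is fine for base64 since every alphabet character is ASCII.
const codesByIteration = string => {
  const codes = [];
  for (const ch of string) {
    codes.push(ch.codePointAt(0));
  }
  return codes;
};
```

Both functions produce identical results for ASCII input such as base64 text; only their cost on engines with O(n) random access differs.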
Security Considerations
n/a
Scaling Considerations
As noted above; the measured effects appear in the benchmark output earlier in this conversation.
Documentation Considerations
n/a
Testing Considerations
n/a
Upgrade Considerations
n/a