Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(base64): Replace random access with iteration #1982

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

gibson042
Copy link
Contributor

Description

We recently discovered that in XS, random access to string characters is O(n). This PR updates the JS base64 implementation to instead use iterators. It improves the performance in XS as expected, but irrelevantly so because XS provides a native implementation which is much faster still. There's a slight penalty in Node.js, which may or may not be tolerable (but note that apples-to-apples comparison across implementations is necessary because the JS input is substantially shorter). I'm comfortable with rejection of this PR (or at least everything after the first commit), but felt that opening it was important for transparency.

Security Considerations

n/a

Scaling Considerations

As noted above. Here are the measured effects:

 Node.js
-encodes 10775.74909090909 characters per millisecond
+encodes 9801.535832414553 characters per millisecond
-JS encodes 23130.176470588234 characters per millisecond
+JS encodes 18329.573806881242 characters per millisecond
-decodes 48752.36084452975 bytes per millisecond
+decodes 45848.339350180504 bytes per millisecond
-JS decodes 43007.671875 bytes per millisecond
+JS decodes 45448.94117647059 bytes per millisecond
 XS
-encodes 577272.2727272727 characters per millisecond
+encodes 578303.1160714285 characters per millisecond
-JS encodes 1335.143596377749 characters per millisecond
+JS encodes 1700.2734761120264 characters per millisecond
-decodes 396777.0553505535 bytes per millisecond
+decodes 393870.26373626373 bytes per millisecond
-JS decodes 1093.7360890302066 bytes per millisecond
+JS decodes 1946.3762376237623 bytes per millisecond

Documentation Considerations

n/a

Testing Considerations

n/a

Upgrade Considerations

n/a

@gibson042 gibson042 changed the title Gibson 2024 01 base64 perf perf(base64): Replace random access with iteration Jan 18, 2024
Copy link
Contributor

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something seems fishy with the bench results. From what I gather, under node.js, the encodes and JS encodes should be using the same implementation, yet they have drastically different results.

 Node.js
-encodes 10775.74909090909 characters per millisecond
+encodes 9801.535832414553 characters per millisecond
-JS encodes 23130.176470588234 characters per millisecond
+JS encodes 18329.573806881242 characters per millisecond

packages/base64/src/decode.js Outdated Show resolved Hide resolved
i += 1;
quantum += 6;
}

if (i < string.length) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cache string.length at the beginning. But then in which case can there ever be garbage at the end?

Copy link
Contributor Author

@gibson042 gibson042 Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a non-alphabet character is encountered, or anything after expected padding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we throw if an invalid character is encountered?

One case where the number of characters iterated would differ from the string length is if any surrogate pairs are encountered, but again these would be invalid characters, and we'd have thrown already.

I maintain that this condition can never be true.

Copy link
Contributor Author

@gibson042 gibson042 Jan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we throw if an invalid character is encountered?

We throw when encountering a non-alphabet character, but this catches anything after expected padding:

$ node --input-type=module -e '
  import "@endo/init";
  import { jsDecodeBase64 } from "./src/decode.js";
  console.log("valid", jsDecodeBase64("QQ=="));
  console.log("invalid");
  jsDecodeBase64("QQ===");
'
valid Uint8Array(1) [ 65 ]
invalid
(Error#1)
Error#1: Base64 string has trailing garbage = in string <unknown>

  at jsDecodeBase64 (packages/base64/src/decode.js:62:11)
  at packages/base64/[eval1]:6:3

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right I missed the new break logic from the loop. I believe there's actually a bug allowing input like QQ=*

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and it doesn't seem to check the register is empty once padding is reached

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like according to the RFC, a decoder "MAY" reject if the padding bits are not 0, but doesn't have to. I still believe this implementation will accept QQ=* for example, which is not compliant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there's actually a bug allowing input like QQ=*

Nice catch! Opened #1991 to add coverage and rebased this PR on top of its test changes.

packages/base64/src/decode.js Outdated Show resolved Hide resolved
packages/base64/test/bench-main.js Outdated Show resolved Hide resolved
@kriskowal
Copy link
Member

kriskowal commented Jan 18, 2024

Thank you for digging into this and validating the hypothesis that string[i] was O(i) on XS. I’m inclined to take the performance win on Node.js with the existing code and the performance win on XS with the native implementation, archiving this PR as evidence for future reference.

@gibson042
Copy link
Contributor Author

Something seems fishy with the bench results. From what I gather, under node.js, the encodes and JS encodes should be using the same implementation, yet they have drastically different results.

Updated the benchmark script to clarify with better output:

$ node test/bench-main.js
[pass 1] encodes 8378.88124410933 characters per millisecond
[pass 2] encodes 8523.48322147651 characters per millisecond
[pass 1] JS short-string encodes 18514.513452914798 characters per millisecond
[pass 2] JS short-string encodes 18514.513452914798 characters per millisecond
[pass 1] decodes 45357.107142857145 bytes per millisecond
[pass 2] decodes 51212.96780487805 bytes per millisecond
[pass 1] JS short-string decodes 43996.02797202797 bytes per millisecond
[pass 2] JS short-string decodes 43261.155206286836 bytes per millisecond

I’m inclined to take the performance win on Node.js with the existing code and the performance win on XS with the native implementation, archiving this PR as evidence for future reference.

Makes sense. Should I open a new PR for the benchmarking script improvements?

@kriskowal
Copy link
Member

Makes sense. Should I open a new PR for the benchmarking script improvements?

That would be excellent.

const now = (() => {
try {
// eslint-disable-next-line no-undef,global-require
return require('perf_hooks').performance.now || Date.now;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node does not define require in ESM code. I believe the only way is to try to dynamically import perf_hooks which will force an async prelude. Not optimal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, I just went ahead with moving to async.

i += 1;
quantum += 6;
}

if (i < string.length) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we throw if an invalid character is encountered?

One case where the number of characters iterated would differ from the string length is if any surrogate pairs are encountered, but again these would be invalid characters, and we'd have thrown already.

I maintain that this condition can never be true.

@mhofman
Copy link
Contributor

mhofman commented Jan 19, 2024

[pass 1] encodes 8378.88124410933 characters per millisecond
[pass 2] encodes 8523.48322147651 characters per millisecond

I don't understand how the new changes you made dropped the performance in node even more.

@gibson042 gibson042 force-pushed the gibson-2024-01-base64-perf branch from a75208e to 0165aea Compare January 19, 2024 08:23
@gibson042
Copy link
Contributor Author

Should I open a new PR for the benchmarking script improvements?

#1986

@erights
Copy link
Contributor

erights commented Jan 20, 2024

I am confused about the separation between this PR and #1986 , Both contain all three files. I would have expected one (that one) to define the benchmarking framework, and the other (this one) to be staged on it and use it.

@gibson042 gibson042 force-pushed the gibson-2024-01-base64-perf branch from 0165aea to 1beb766 Compare January 22, 2024 19:50
@gibson042
Copy link
Contributor Author

I am confused about the separation between this PR and #1986 , Both contain all three files. I would have expected one (that one) to define the benchmarking framework, and the other (this one) to be staged on it and use it.

@erights Good point; this is now the case.

@gibson042 gibson042 force-pushed the gibson-2024-01-base64-perf branch 2 times, most recently from 98a3482 to 2274382 Compare January 22, 2024 20:11
XS is O(n) for the former (not that it really matters here because they
also supply a native base64 implementation).

```console
$ echo "Node.js" && node test/bench-main.js && echo "XS" && ~/.esvu/bin/xs -m test/bench-main.js
Node.js
encodes 9801.535832414553 characters per millisecond
JS encodes 18329.573806881242 characters per millisecond
decodes 45848.339350180504 bytes per millisecond
JS decodes 45448.94117647059 bytes per millisecond
XS
encodes 578303.1160714285 characters per millisecond
JS encodes 1700.2734761120264 characters per millisecond
decodes 393870.26373626373 bytes per millisecond
JS decodes 1946.3762376237623 bytes per millisecond
```
@gibson042 gibson042 force-pushed the gibson-2024-01-base64-perf branch from 2274382 to 881d14c Compare January 22, 2024 20:18
@gibson042 gibson042 requested a review from mhofman January 23, 2024 19:05
gibson042 added a commit that referenced this pull request Jan 23, 2024
Ref #1982

Speeds up by 21% decoding a representation of the following CopyRecord:
```
{
  primitives: [undefined, null, true, false, 0, 1, 0n, 1n, "foo", Symbol.for("bar")],
  arrays: [[], [{}]],
  records: { empty: {}, singular: { array: [] } },
}
```
(and speeds up decoding a CopyArray containing 10 copies thereof by about 100%)
gibson042 added a commit that referenced this pull request Jan 23, 2024
Ref #1982

Speeds up XS by 21% decoding a representation of the following CopyRecord:
```
{
  primitives: [undefined, null, true, false, 0, 1, 0n, 1n, "foo", Symbol.for("bar")],
  arrays: [[], [{}]],
  records: { empty: {}, singular: { array: [] } },
}
```
(and speeds up decoding a CopyArray containing 10 copies thereof by about 100%)
gibson042 added a commit that referenced this pull request Jan 23, 2024
Ref #1982

Speeds up XS by 21% decoding a representation of the following CopyRecord:
```
{
  primitives: [undefined, null, true, false, 0, 1, 0n, 1n, "foo", Symbol.for("bar")],
  arrays: [[], [{}]],
  records: { empty: {}, singular: { array: [] } },
}
```
(and speeds up decoding a CopyArray containing 10 copies thereof by about 100%)
gibson042 added a commit that referenced this pull request Jan 23, 2024
Ref #1982

Speeds up XS by 21% decoding a representation of the following CopyRecord:
```
{
  primitives: [undefined, null, true, false, 0, 1, 0n, 1n, "foo", Symbol.for("bar")],
  arrays: [[], [{}]],
  records: { empty: {}, singular: { array: [] } },
}
```
(and speeds up decoding a CopyArray containing 10 copies thereof by about 100%)
@erights
Copy link
Contributor

erights commented Jan 23, 2024

Now that #1986 is merged, if it is ok with the other reviewers, I'd prefer that this PR be rebased on master if it is not already based on a state that includes #1986

Reviewers, is that ok with you?

gibson042 added a commit that referenced this pull request Jan 23, 2024
Ref #1982

Speeds up XS by 21% decoding a representation of the following CopyRecord:
```
{
  primitives: [undefined, null, true, false, 0, 1, 0n, 1n, "foo", Symbol.for("bar")],
  arrays: [[], [{}]],
  records: { empty: {}, singular: { array: [] } },
}
```
(and speeds up decoding a CopyArray containing 10 copies thereof by about 100%)
@gibson042
Copy link
Contributor Author

#1986 is already incorporated, but it would be a good idea to land #1991 and then rebase this PR (which already shares its packages/base64/test/test-main.js) on top of the resulting master.

@erights
Copy link
Contributor

erights commented Jan 24, 2024

#1986 is already incorporated, but it would be a good idea to land #1991 and then rebase this PR (which already shares its packages/base64/test/test-main.js) on top of the resulting master.

Sounds good to me

@erights erights closed this Jan 24, 2024
@mhofman mhofman reopened this Jan 24, 2024
@erights
Copy link
Contributor

erights commented Jan 24, 2024

image

Oops, sorry! I had no idea. Thanks for fixing.

@kriskowal
Copy link
Member

@gibson042 Are you shooting to improve performance on both Node.js and XS (and assuming this will still fall thru to the native version on XS)?

@gibson042
Copy link
Contributor Author

@kriskowal Since the two are at odds and the JS implementation isn't even used by XS, I'm comfortable with getting this PR clean and then closing it without merge as you suggested in #1982 (comment)

Thank you for digging into this and validating the hypothesis that string[i] was O(i) on XS. I’m inclined to take the performance win on Node.js with the existing code and the performance win on XS with the native implementation, archiving this PR as evidence for future reference.

gibson042 added a commit that referenced this pull request Jan 25, 2024
Most notably, prefer `charAt(0) === ch` over `startsWith(ch)`.

Ref #1982
Ref #2001
@kriskowal kriskowal removed their request for review February 5, 2024 22:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants