refactor(ext/node): Use an internal DataView in Buffer #17815

Closed

aapoalas
Collaborator

@aapoalas aapoalas commented Feb 18, 2023

Refactor the Node Buffer class polyfill to use fewer lines of code by using an internal DataView for reading. This improves performance in some places (reading signed BigInts and floating point numbers, writing BigInts and floating point numbers) and keeps performance roughly the same in others, at the cost of increased creation time.

Measurements

Creation

Creating the internal DataView requires quite a bit of extra work, which makes Buffer creation significantly slower, especially for empty buffers.

It turns out that creating an empty Uint8Array appears to be optimized so that no ArrayBuffer is actually created for the Uint8Array. Accessing the buffer property triggers that creation, which then takes more than 400 nanoseconds. This accounts for the difference between creating an empty buffer on main and in this PR.
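The extra creation work can be illustrated with a minimal sketch. `createBytesWithView` is a hypothetical helper, not code from this PR; it only shows that constructing a DataView over a Uint8Array requires touching `.buffer`, which is the access that the measurements here suggest forces the backing ArrayBuffer into existence.

```typescript
// Hypothetical helper for illustration; not the polyfill's actual constructor.
function createBytesWithView(length: number): { bytes: Uint8Array; view: DataView } {
  const bytes = new Uint8Array(length);
  // Reading `bytes.buffer` is the step that (per the measurements in this PR)
  // appears to make V8 materialise the ArrayBuffer, even when length is 0.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { bytes, view };
}

const emptyPair = createBytesWithView(0);
const eightPair = createBytesWithView(8);
// The view and the byte array alias the same memory.
eightPair.view.setUint8(3, 0xff);
console.log(emptyPair.view.byteLength, eightPair.bytes[3]); // prints: 0 255
```

Deferring the DataView construction until the first read or write would be one way to avoid the cost for buffers that are never read numerically, but that is a design option, not something this PR is stated to do.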

This PR:

benchmark                 time (avg)        iter/s             (min … max)       p75       p99      p995
-------------------------------------------------------------------------- -----------------------------
empty creation           543.29 ns/iter   1,840,641.8 (341.72 ns … 719.75 ns) 593.96 ns 713.97 ns 719.75 ns
8 creation               712.03 ns/iter   1,404,431.2 (516.36 ns … 937.88 ns) 770.02 ns 937.88 ns 937.88 ns
64 creation               754.8 ns/iter   1,324,858.1       (458.7 ns … 1 µs) 838.74 ns      1 µs      1 µs
1024 * 1000 creation     292.61 µs/iter       3,417.5     (7.05 µs … 3.89 ms) 399.34 µs   3.01 ms   3.41 ms

Main:

benchmark                 time (avg)        iter/s             (min … max)       p75       p99      p995
-------------------------------------------------------------------------- -----------------------------
empty creation            94.88 ns/iter  10,540,038.9  (88.85 ns … 108.85 ns)   95.6 ns  98.02 ns 107.27 ns
8 creation               504.84 ns/iter   1,980,817.6 (315.56 ns … 788.94 ns) 695.35 ns 787.98 ns 788.94 ns
64 creation              562.65 ns/iter   1,777,307.9 (326.82 ns … 848.31 ns) 771.65 ns 848.31 ns 848.31 ns
1024 * 1000 creation     293.49 µs/iter       3,407.2    (17.37 µs … 3.84 ms) 389.35 µs    2.9 ms   3.01 ms

Reading integers

Reading integers is equivalent in performance or marginally slower, by less than a nanosecond.
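For context, the two styles of integer read being compared look roughly like this. Both function names are made up for this sketch, and the manual version is a generic shift-and-or read, assumed to resemble what a polyfill without a DataView would do.

```typescript
// Manual little-endian read: combine four bytes with shifts.
// `>>> 0` reinterprets the 32-bit result as an unsigned number.
function readU32LEManual(bytes: Uint8Array, offset: number): number {
  return (
    (bytes[offset] |
      (bytes[offset + 1] << 8) |
      (bytes[offset + 2] << 16) |
      (bytes[offset + 3] << 24)) >>> 0
  );
}

// DataView-based read: a single call, with `true` selecting little-endian.
function readU32LEView(view: DataView, offset: number): number {
  return view.getUint32(offset, true);
}

const intBytes = new Uint8Array([0x00, 0x12, 0x34, 0x56, 0x78]);
const intView = new DataView(intBytes.buffer, intBytes.byteOffset, intBytes.byteLength);
console.log(readU32LEManual(intBytes, 1) === readU32LEView(intView, 1)); // prints: true
```

Both reads return 0x78563412 for the non-zero offset used here, which matches the "non-zero offset" shape of the benchmarks.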

This PR:

benchmark                           time (avg)        iter/s             (min … max)       p75       p99      p995
------------------------------------------------------------------------------------ -----------------------------
read u8 non-zero offset              9.77 ns/iter 102,315,090.7     (9.6 ns … 28.11 ns)   9.69 ns  10.24 ns  11.23 ns
read i8 non-zero offset              10.4 ns/iter  96,111,809.3     (9.99 ns … 34.6 ns)  10.05 ns  17.74 ns  18.25 ns
read u16 LE non-zero offset         10.08 ns/iter  99,181,207.7    (9.99 ns … 30.17 ns)  10.05 ns   10.8 ns  11.32 ns
read u16 BE non-zero offset         10.32 ns/iter  96,865,482.3    (9.96 ns … 30.64 ns)  10.06 ns   16.4 ns  16.96 ns
read i16 LE non-zero offset         10.29 ns/iter  97,154,731.7      (10 ns … 31.94 ns)  10.08 ns  15.96 ns  16.49 ns
read i16 BE non-zero offset         10.24 ns/iter  97,630,253.9   (10.01 ns … 16.01 ns)  10.16 ns  15.48 ns  15.48 ns
read u24 LE non-zero offset         10.12 ns/iter  98,790,422.8    (9.77 ns … 19.03 ns)   9.95 ns  15.91 ns  16.44 ns
read u24 BE non-zero offset          9.97 ns/iter 100,327,698.2    (9.77 ns … 17.58 ns)   9.95 ns  10.56 ns   12.1 ns
read i24 LE non-zero offset          9.96 ns/iter 100,407,851.3    (9.77 ns … 16.55 ns)   9.97 ns  10.51 ns  10.68 ns
read i24 BE non-zero offset            10 ns/iter 100,011,343.1    (9.77 ns … 26.28 ns)   9.99 ns  10.64 ns  11.53 ns
read u32 LE non-zero offset          9.84 ns/iter 101,672,354.3    (9.77 ns … 12.51 ns)   9.83 ns  10.49 ns  10.87 ns
read u32 BE non-zero offset          9.85 ns/iter 101,528,044.6    (9.77 ns … 16.01 ns)   9.81 ns  10.56 ns   11.3 ns
read i32 LE non-zero offset          9.85 ns/iter 101,574,064.1     (9.77 ns … 18.6 ns)   9.81 ns  11.28 ns  11.71 ns
read i32 BE non-zero offset          9.91 ns/iter 100,916,939.4    (9.77 ns … 16.99 ns)   9.83 ns  15.48 ns  15.48 ns
read u40 LE non-zero offset         10.02 ns/iter  99,832,294.5    (9.77 ns … 16.47 ns)   9.96 ns  15.83 ns  15.87 ns
read u40 BE non-zero offset         10.26 ns/iter  97,474,628.4    (9.77 ns … 18.66 ns)   9.95 ns  15.99 ns  16.46 ns
read i40 LE non-zero offset          9.96 ns/iter 100,402,452.2    (9.77 ns … 18.59 ns)   9.95 ns  10.76 ns  11.57 ns
read i40 BE non-zero offset         10.14 ns/iter  98,659,108.9    (9.77 ns … 17.56 ns)   9.96 ns  15.96 ns  15.97 ns
read u48 LE non-zero offset          9.96 ns/iter 100,391,646.8    (9.77 ns … 16.75 ns)   9.95 ns   11.2 ns  11.87 ns
read u48 BE non-zero offset         10.04 ns/iter  99,601,062.0    (9.77 ns … 16.49 ns)   9.96 ns  15.96 ns  15.96 ns
read i48 LE non-zero offset         10.04 ns/iter  99,560,280.3    (9.77 ns … 17.77 ns)   9.96 ns  15.96 ns  15.96 ns
read i48 BE non-zero offset          9.96 ns/iter 100,426,793.7    (9.77 ns … 17.35 ns)   9.96 ns  10.57 ns  10.87 ns

Main:

benchmark                           time (avg)        iter/s             (min … max)       p75       p99      p995
------------------------------------------------------------------------------------ -----------------------------
read u8 non-zero offset              9.75 ns/iter 102,593,084.9    (9.63 ns … 28.92 ns)   9.67 ns   10.8 ns  11.57 ns
read i8 non-zero offset              9.79 ns/iter 102,148,046.3    (9.59 ns … 29.82 ns)   9.68 ns  10.71 ns  14.77 ns
read u16 LE non-zero offset          9.71 ns/iter 103,008,946.4    (9.59 ns … 29.55 ns)   9.67 ns  10.41 ns  10.92 ns
read u16 BE non-zero offset          9.25 ns/iter 108,136,507.4    (9.05 ns … 16.37 ns)   9.25 ns   10.1 ns  10.68 ns
read i16 LE non-zero offset          9.74 ns/iter 102,624,042.1    (9.06 ns … 30.55 ns)   9.68 ns  10.97 ns  11.94 ns
read i16 BE non-zero offset           9.9 ns/iter 100,962,055.0    (9.67 ns … 15.34 ns)   9.79 ns  14.29 ns  14.29 ns
read u24 LE non-zero offset         10.01 ns/iter  99,928,791.1    (9.85 ns … 13.94 ns)  10.06 ns  10.55 ns  10.62 ns
read u24 BE non-zero offset          9.84 ns/iter 101,620,145.7    (9.59 ns … 16.74 ns)   9.77 ns  16.07 ns  16.14 ns
read i24 LE non-zero offset          9.73 ns/iter 102,736,462.4    (9.59 ns … 11.91 ns)   9.77 ns   10.3 ns  10.38 ns
read i24 BE non-zero offset          9.73 ns/iter 102,737,084.6       (9.53 ns … 14 ns)   9.77 ns   10.3 ns  10.42 ns
read u32 LE non-zero offset           9.8 ns/iter 102,053,625.7    (9.06 ns … 15.11 ns)   9.24 ns  14.29 ns  14.61 ns
read u32 BE non-zero offset          9.26 ns/iter 107,977,878.2    (9.06 ns … 10.53 ns)   9.25 ns   9.74 ns   9.77 ns
read i32 LE non-zero offset          9.24 ns/iter 108,194,503.6    (9.06 ns … 10.21 ns)   9.25 ns   9.74 ns   9.77 ns
read i32 BE non-zero offset          9.31 ns/iter 107,369,182.7    (9.06 ns … 16.48 ns)   9.26 ns     11 ns  12.53 ns
read u40 LE non-zero offset           9.8 ns/iter 102,074,231.4     (9.5 ns … 20.65 ns)   9.77 ns   13.5 ns  15.91 ns
read u40 BE non-zero offset          9.83 ns/iter 101,732,018.5    (9.58 ns … 28.68 ns)   9.77 ns   12.9 ns  14.44 ns
read i40 LE non-zero offset           9.8 ns/iter 102,054,738.3    (9.53 ns … 22.68 ns)   9.77 ns  12.61 ns  13.99 ns
read i40 BE non-zero offset         10.16 ns/iter  98,414,577.5    (9.53 ns … 35.51 ns)   9.77 ns  15.46 ns  15.88 ns
read u48 LE non-zero offset          9.81 ns/iter 101,965,387.7    (9.53 ns … 16.37 ns)   9.77 ns  11.74 ns  13.61 ns
read u48 BE non-zero offset          9.86 ns/iter 101,442,731.4    (9.53 ns … 17.92 ns)   9.77 ns  13.43 ns  14.84 ns
read i48 LE non-zero offset          9.66 ns/iter 103,484,172.8    (9.41 ns … 17.15 ns)   9.57 ns  12.53 ns  14.29 ns
read i48 BE non-zero offset           9.8 ns/iter 102,023,504.7    (9.53 ns … 16.61 ns)   9.77 ns  12.63 ns  14.38 ns

Reading floating point numbers

Reading floating point numbers is faster by 2-3 nanoseconds (15-20%), bringing the speed in line with integer reads.
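A sketch of why a DataView can help here: without one, a common technique is to copy the bytes into a scratch typed array and read the float back out. The scratch-array version below is an assumption about what the non-DataView path might look like, not code taken from main; all names are hypothetical.

```typescript
// Scratch-array technique (assumed for illustration): copy four bytes into a
// one-element Float32Array through a Uint8Array alias of the same memory.
const scratchFloat = new Float32Array(1);
const scratchBytes = new Uint8Array(scratchFloat.buffer);
// Detect host endianness once; the copy order depends on it.
const hostIsLittleEndian = new Uint8Array(new Uint16Array([1]).buffer)[0] === 1;

function readFloatLEScratch(src: Uint8Array, offset: number): number {
  for (let i = 0; i < 4; i++) {
    scratchBytes[hostIsLittleEndian ? i : 3 - i] = src[offset + i];
  }
  return scratchFloat[0];
}

// DataView-based read: endianness is an argument, no copying in user code.
function readFloatLEView(view: DataView, offset: number): number {
  return view.getFloat32(offset, true);
}

const floatSrc = new Uint8Array(8);
const floatView = new DataView(floatSrc.buffer);
floatView.setFloat32(2, 1.5, true);
console.log(readFloatLEScratch(floatSrc, 2), readFloatLEView(floatView, 2)); // prints: 1.5 1.5
```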

This PR:

benchmark                           time (avg)        iter/s             (min … max)       p75       p99      p995
------------------------------------------------------------------------------------ -----------------------------
read float LE non-zero offset      10.15 ns/iter  98,492,448.8    (9.98 ns … 25.43 ns)  10.08 ns  11.76 ns   14.7 ns
read float BE non-zero offset      10.12 ns/iter  98,857,245.8   (10.01 ns … 27.07 ns)   10.1 ns  11.73 ns  12.41 ns
read double LE non-zero offset       9.98 ns/iter 100,227,692.1    (9.77 ns … 17.69 ns)   9.83 ns  15.48 ns  15.83 ns
read double BE non-zero offset      10.23 ns/iter  97,724,139.7    (9.77 ns … 21.27 ns)   9.84 ns  17.25 ns  17.58 ns

Main:

benchmark                           time (avg)        iter/s             (min … max)       p75       p99      p995
------------------------------------------------------------------------------------ -----------------------------
read float LE non-zero offset      12.32 ns/iter  81,186,069.2    (11.75 ns … 30.7 ns)  12.39 ns  17.28 ns  18.19 ns
read float BE non-zero offset      12.32 ns/iter  81,173,186.0   (11.74 ns … 30.96 ns)  12.39 ns  13.32 ns  13.96 ns
read double LE non-zero offset      12.64 ns/iter  79,089,114.7   (12.39 ns … 20.62 ns)   12.6 ns  17.74 ns  18.29 ns
read double BE non-zero offset       12.5 ns/iter  79,996,955.4   (12.39 ns … 20.59 ns)  12.39 ns  14.99 ns  16.37 ns

Reading BigInts

Reading signed 64-bit integers is significantly faster, by about 33%. Reading unsigned 64-bit integers is significantly slower, taking more than twice as long (roughly 9.5 ns on main versus 23-25 ns in this PR).

This PR:

benchmark                           time (avg)        iter/s             (min … max)       p75       p99      p995
------------------------------------------------------------------------------------ -----------------------------
read u64 LE non-zero offset         25.03 ns/iter  39,947,827.6   (22.63 ns … 34.35 ns)  24.57 ns  31.28 ns  31.99 ns
read u64 BE non-zero offset         22.95 ns/iter  43,568,064.9   (20.81 ns … 33.38 ns)  23.24 ns  29.64 ns  29.85 ns
read i64 LE non-zero offset         21.37 ns/iter  46,790,864.8    (20.5 ns … 33.54 ns)  20.78 ns  27.85 ns  28.65 ns
read i64 BE non-zero offset         20.51 ns/iter  48,761,472.4   (19.78 ns … 57.56 ns)  19.84 ns  27.07 ns  27.58 ns

Main:

benchmark                           time (avg)        iter/s             (min … max)       p75       p99      p995
------------------------------------------------------------------------------------ -----------------------------
read u64 LE non-zero offset          9.59 ns/iter 104,324,029.6    (9.06 ns … 17.76 ns)   9.29 ns  16.49 ns  16.88 ns
read u64 BE non-zero offset          9.36 ns/iter 106,848,018.2    (9.06 ns … 16.08 ns)   9.32 ns  11.79 ns  14.03 ns
read i64 LE non-zero offset          32.3 ns/iter  30,962,568.5   (31.44 ns … 47.85 ns)  31.98 ns  40.05 ns   40.8 ns
read i64 BE non-zero offset         32.43 ns/iter  30,832,610.6   (31.68 ns … 46.09 ns)  32.07 ns  39.61 ns     40 ns

Writing values

Writing values is mostly equivalent in speed, but writing BigInts and floating point numbers is faster by 10-30%.
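As a rough illustration of the DataView-based write path, the sketch below writes a double and a 64-bit BigInt. The function names mirror Node's `writeDoubleLE` and `writeBigUInt64BE`, but these are free functions written for this example, not the polyfill's methods.

```typescript
// Writes a 64-bit float in little-endian order and returns the offset just
// past the written bytes, the same convention Node's Buffer write methods use.
function writeDoubleLE(view: DataView, value: number, offset: number): number {
  view.setFloat64(offset, value, true);
  return offset + 8;
}

// Writes an unsigned 64-bit BigInt in big-endian order.
function writeBigUInt64BE(view: DataView, value: bigint, offset: number): number {
  view.setBigUint64(offset, value, false);
  return offset + 8;
}

const writeTarget = new Uint8Array(16);
const writeView = new DataView(writeTarget.buffer);
const afterDouble = writeDoubleLE(writeView, 0.5, 0);
const afterBigInt = writeBigUInt64BE(writeView, 1n, afterDouble);
// Big-endian 1n puts the only non-zero byte last.
console.log(afterBigInt, writeTarget[15], writeView.getFloat64(0, true)); // prints: 16 1 0.5
```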

This PR:

benchmark                            time (avg)        iter/s             (min … max)       p75       p99      p995
------------------------------------------------------------------------------------- -----------------------------
write u8 non-zero offset             10.27 ns/iter  97,337,490.9   (10.01 ns … 29.43 ns)  10.25 ns  11.19 ns  11.87 ns
write i8 non-zero offset             10.27 ns/iter  97,416,669.4   (10.07 ns … 29.55 ns)  10.22 ns  11.02 ns  12.93 ns
write u16 LE non-zero offset         10.45 ns/iter  95,704,782.3   (10.35 ns … 14.56 ns)  10.48 ns     11 ns  11.18 ns
write u16 BE non-zero offset         10.91 ns/iter  91,638,842.8    (10.25 ns … 31.7 ns)  10.91 ns  11.47 ns  11.81 ns
write i16 LE non-zero offset         11.12 ns/iter  89,895,929.8    (10.56 ns … 21.7 ns)  10.92 ns  16.33 ns  17.18 ns
write i16 BE non-zero offset         11.04 ns/iter  90,544,662.0   (10.72 ns … 23.08 ns)  10.96 ns   15.5 ns  17.14 ns
write u24 LE non-zero offset         10.74 ns/iter  93,087,180.9   (10.49 ns … 17.18 ns)  10.72 ns  11.98 ns  13.23 ns
write u24 BE non-zero offset         10.74 ns/iter  93,126,589.8   (10.48 ns … 16.56 ns)  10.76 ns  12.44 ns   13.5 ns
write i24 LE non-zero offset         10.74 ns/iter  93,111,024.0   (10.52 ns … 18.52 ns)  10.72 ns  11.86 ns  12.49 ns
write i24 BE non-zero offset         10.81 ns/iter  92,493,301.9   (10.48 ns … 21.72 ns)  10.76 ns  13.38 ns  17.93 ns
write u32 LE non-zero offset          10.4 ns/iter  96,159,238.9   (10.25 ns … 17.32 ns)  10.39 ns  12.47 ns  16.39 ns
write u32 BE non-zero offset         10.47 ns/iter  95,469,436.4    (10.25 ns … 17.3 ns)  10.46 ns   11.9 ns  12.37 ns
write i32 LE non-zero offset         10.36 ns/iter  96,523,279.9   (10.25 ns … 14.03 ns)  10.39 ns  11.47 ns  12.22 ns
write i32 BE non-zero offset         10.44 ns/iter  95,787,841.4   (10.25 ns … 15.36 ns)  10.46 ns  11.31 ns  12.84 ns
write u40 LE non-zero offset          16.4 ns/iter  60,989,837.4   (16.32 ns … 23.22 ns)  16.32 ns  18.12 ns  18.86 ns
write u40 BE non-zero offset         16.91 ns/iter  59,119,700.8   (16.32 ns … 31.57 ns)  17.39 ns  24.83 ns  27.91 ns
write i40 LE non-zero offset         18.16 ns/iter  55,080,096.4    (18.1 ns … 21.69 ns)  18.11 ns  19.01 ns  19.44 ns
write i40 BE non-zero offset         18.65 ns/iter  53,623,209.6    (18.1 ns … 23.69 ns)  18.82 ns  20.25 ns  20.25 ns
write u48 LE non-zero offset         16.52 ns/iter  60,545,392.7   (16.32 ns … 32.78 ns)  16.32 ns  24.15 ns   25.5 ns
write u48 BE non-zero offset         16.37 ns/iter  61,100,467.9   (16.32 ns … 20.65 ns)  16.32 ns  17.09 ns  17.66 ns
write i48 LE non-zero offset         19.34 ns/iter  51,693,197.8   (18.82 ns … 24.46 ns)  20.25 ns  20.78 ns  21.16 ns
write i48 BE non-zero offset         18.87 ns/iter  53,000,896.8   (18.82 ns … 22.35 ns)  18.82 ns  19.62 ns  19.99 ns
write u64 LE non-zero offset        123.74 ns/iter   8,081,304.0 (109.67 ns … 140.39 ns) 127.06 ns 137.01 ns 138.87 ns
write u64 BE non-zero offset        123.27 ns/iter   8,112,561.4  (109.37 ns … 139.7 ns) 126.26 ns  135.5 ns 136.67 ns
write i64 LE non-zero offset         202.3 ns/iter   4,943,194.0 (194.01 ns … 226.05 ns) 206.02 ns 218.03 ns 219.62 ns
write i64 BE non-zero offset        203.23 ns/iter   4,920,459.3  (192.9 ns … 220.72 ns)  207.5 ns 218.17 ns  218.2 ns
write float LE non-zero offset        9.83 ns/iter 101,747,626.5    (9.77 ns … 13.62 ns)   9.85 ns  10.41 ns  10.86 ns
write float BE non-zero offset       10.01 ns/iter  99,948,359.7    (9.77 ns … 14.59 ns)  10.01 ns  10.55 ns  10.74 ns
write double LE non-zero offset      10.06 ns/iter  99,381,501.6    (9.85 ns … 19.18 ns)   9.97 ns  15.63 ns  16.07 ns
write double BE non-zero offset      10.05 ns/iter  99,494,774.1    (9.95 ns … 13.34 ns)  10.09 ns  10.59 ns  10.78 ns

Main:

benchmark                            time (avg)        iter/s             (min … max)       p75       p99      p995
------------------------------------------------------------------------------------- -----------------------------
write u8 non-zero offset             10.34 ns/iter  96,695,054.4    (9.97 ns … 41.91 ns)  10.18 ns  17.37 ns  18.28 ns
write i8 non-zero offset             10.21 ns/iter  97,963,814.8    (9.95 ns … 34.21 ns)  10.15 ns  12.13 ns  15.38 ns
write u16 LE non-zero offset         11.01 ns/iter  90,837,591.3   (10.66 ns … 28.85 ns)  11.04 ns   12.6 ns  13.64 ns
write u16 BE non-zero offset         11.06 ns/iter  90,416,177.5   (10.72 ns … 32.32 ns)  11.04 ns  11.66 ns  25.02 ns
write i16 LE non-zero offset         10.95 ns/iter  91,322,430.7   (10.74 ns … 20.01 ns)  10.95 ns  11.87 ns  12.36 ns
write i16 BE non-zero offset         10.74 ns/iter  93,141,189.0   (10.31 ns … 19.98 ns)  10.55 ns  17.48 ns  17.76 ns
write u24 LE non-zero offset         10.65 ns/iter  93,911,183.4   (10.49 ns … 16.68 ns)  10.63 ns  11.88 ns  12.42 ns
write u24 BE non-zero offset         10.65 ns/iter  93,867,321.8   (10.52 ns … 16.68 ns)  10.65 ns  11.31 ns  11.72 ns
write i24 LE non-zero offset         10.67 ns/iter  93,680,913.6   (10.54 ns … 13.26 ns)  10.75 ns   11.2 ns  11.28 ns
write i24 BE non-zero offset         10.66 ns/iter  93,831,171.4   (10.49 ns … 13.11 ns)  10.69 ns   11.3 ns  11.66 ns
write u32 LE non-zero offset         10.76 ns/iter  92,947,006.7   (10.59 ns … 17.33 ns)  10.72 ns  11.67 ns   12.5 ns
write u32 BE non-zero offset         10.75 ns/iter  92,983,545.6       (10.6 ns … 18 ns)  10.72 ns  11.53 ns  12.53 ns
write i32 LE non-zero offset         10.74 ns/iter  93,113,607.9   (10.56 ns … 17.89 ns)  10.72 ns  11.31 ns  11.66 ns
write i32 BE non-zero offset         10.74 ns/iter  93,109,369.7   (10.48 ns … 16.77 ns)  10.72 ns  11.36 ns   11.8 ns
write u40 LE non-zero offset         19.62 ns/iter  50,963,332.3   (17.39 ns … 26.05 ns)  19.48 ns  22.44 ns  22.44 ns
write u40 BE non-zero offset         19.41 ns/iter  51,511,903.1   (17.39 ns … 26.92 ns)  19.46 ns  22.44 ns  22.47 ns
write i40 LE non-zero offset         18.41 ns/iter  54,319,022.7   (18.34 ns … 28.79 ns)  18.34 ns  19.54 ns  19.95 ns
write i40 BE non-zero offset          18.4 ns/iter  54,358,750.5   (17.15 ns … 25.91 ns)  18.34 ns  19.43 ns  19.95 ns
write u48 LE non-zero offset         19.27 ns/iter  51,882,890.4   (18.02 ns … 26.63 ns)  19.18 ns  21.88 ns  21.91 ns
write u48 BE non-zero offset          19.4 ns/iter  51,540,337.5   (18.02 ns … 29.05 ns)  19.18 ns     25 ns  25.05 ns
write i48 LE non-zero offset          18.7 ns/iter  53,464,774.9   (17.41 ns … 31.61 ns)  18.58 ns  21.91 ns  23.83 ns
write i48 BE non-zero offset         28.23 ns/iter  35,422,951.2    (28.1 ns … 39.73 ns)  28.11 ns  30.43 ns  30.64 ns
write u64 LE non-zero offset        156.17 ns/iter   6,403,144.5 (147.57 ns … 175.35 ns) 160.62 ns 166.04 ns 167.47 ns
write u64 BE non-zero offset        152.79 ns/iter   6,544,968.0 (139.48 ns … 201.68 ns) 157.12 ns 173.33 ns  198.8 ns
write i64 LE non-zero offset        258.36 ns/iter   3,870,540.4 (244.96 ns … 379.12 ns) 259.76 ns 368.95 ns 369.44 ns
write i64 BE non-zero offset        258.05 ns/iter   3,875,267.2 (241.41 ns … 321.28 ns) 261.87 ns 268.66 ns 268.88 ns
write float LE non-zero offset       12.53 ns/iter  79,823,725.9    (12.15 ns … 28.6 ns)  12.51 ns  13.09 ns  13.28 ns
write float BE non-zero offset       12.52 ns/iter  79,841,979.0   (12.32 ns … 16.22 ns)  12.52 ns  13.25 ns  13.74 ns
write double LE non-zero offset      13.68 ns/iter  73,106,632.6   (13.44 ns … 35.19 ns)  13.69 ns  14.28 ns  14.53 ns
write double BE non-zero offset      13.73 ns/iter  72,841,178.5   (13.46 ns … 21.24 ns)  13.69 ns  15.55 ns  18.34 ns

V8 bugs spawned from this work

  1. DataView#getBigUint64 has lackluster performance: https://bugs.chromium.org/p/v8/issues/detail?id=14263

Manually reading the 8 bytes of a 64-bit unsigned integer one by one and combining them with appropriate bitshifts or multiplications, including converting the upper and lower 4 bytes to BigInt followed by a final BigInt bitshift and addition, should not be faster than simply having V8 read the 64-bit unsigned integer and create a BigInt from it. This seems like a bug in the engine.
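The two approaches being compared can be sketched as follows. The manual version follows the description in this comment (two 32-bit halves built with number arithmetic, then combined as BigInts); the function names are invented for the example and the exact code on main may differ.

```typescript
// Manual read: build the low and high 32-bit halves with plain number
// arithmetic, then convert each half to BigInt and combine them.
function readBigU64LEManual(src: Uint8Array, offset: number): bigint {
  const lo =
    src[offset] +
    src[offset + 1] * 2 ** 8 +
    src[offset + 2] * 2 ** 16 +
    src[offset + 3] * 2 ** 24;
  const hi =
    src[offset + 4] +
    src[offset + 5] * 2 ** 8 +
    src[offset + 6] * 2 ** 16 +
    src[offset + 7] * 2 ** 24;
  return (BigInt(hi) << 32n) + BigInt(lo);
}

// Engine-provided read: the call the V8 issue is about.
function readBigU64LEView(view: DataView, offset: number): bigint {
  return view.getBigUint64(offset, true);
}

const bigSrc = new Uint8Array(16);
const bigView = new DataView(bigSrc.buffer);
bigView.setBigUint64(3, 0xfedcba9876543210n, true);
console.log(readBigU64LEManual(bigSrc, 3) === readBigU64LEView(bigView, 3)); // prints: true
```

Multiplication is used instead of `<<` for the top byte so that each half stays a non-negative number before the BigInt conversion.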

  2. Extending Uint8Array as a class base (or presumably any other TypedArray) leads to all of its methods losing performance: https://bugs.chromium.org/p/v8/issues/detail?id=14265

If Buffer were a class with an internal #buffer and #view, the performance of all its methods would be 10-30% better based on my testing. This applies even to methods that do not use `this` at all. This likewise seems like a bug, or possibly a limitation, of the engine.
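A minimal sketch of the composition-based shape described here, with `#buffer` and `#view` as private fields. `ComposedBuffer` and its methods are hypothetical and exist only to show the structure; this is not the design the PR implements.

```typescript
// Hypothetical composition-based design: the class owns a Uint8Array and a
// DataView instead of extending Uint8Array.
class ComposedBuffer {
  #buffer: Uint8Array;
  #view: DataView;

  constructor(length: number) {
    this.#buffer = new Uint8Array(length);
    this.#view = new DataView(this.#buffer.buffer);
  }

  get length(): number {
    return this.#buffer.length;
  }

  readInt16BE(offset = 0): number {
    return this.#view.getInt16(offset, false);
  }

  writeInt16BE(value: number, offset = 0): number {
    this.#view.setInt16(offset, value, false);
    return offset + 2;
  }
}

const composed = new ComposedBuffer(4);
composed.writeInt16BE(-2, 1);
console.log(composed.length, composed.readInt16BE(1)); // prints: 4 -2
```

The trade-off is that such a class is no longer a Uint8Array instance, whereas Node's Buffer is one, so this shape would not be a drop-in replacement for code that relies on that relationship.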

@aapoalas aapoalas force-pushed the exp/ext-node/buffer-optimisation branch from 0762e00 to 76bd16b February 19, 2023 16:13
@aapoalas aapoalas marked this pull request as ready for review February 19, 2023 18:11
@bartlomieju
Member

Maybe let's wait on #17827 to land before merging this one.

@bartlomieju
Member

@aapoalas it appears that some tests related to buffer started failing with these changes.

@aapoalas aapoalas force-pushed the exp/ext-node/buffer-optimisation branch from 13be2d7 to e74c58a March 19, 2023 19:53
@aapoalas aapoalas changed the title from "Experiment: Try optimise Node Buffer polyfill" to "refactor(ext/node): Try optimise Node Buffer polyfill" Mar 19, 2023
@aapoalas aapoalas force-pushed the exp/ext-node/buffer-optimisation branch from ca6ebc1 to 53d1be6 July 30, 2023 13:17
@aapoalas aapoalas force-pushed the exp/ext-node/buffer-optimisation branch from efd0922 to 29f34a8 August 8, 2023 18:18
mmastrac pushed a commit that referenced this pull request Sep 7, 2023
Extracted from #17815

Optimise Buffer's string operations, most significantly when dealing
with ASCII and UTF-16. Base64 and HEX encodings are affected to much
lesser degrees.
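
As a rough sketch of what string-to-bytes conversion for these two encodings
involves (the helper names are hypothetical and the commit's actual
implementation may differ), ASCII needs one byte per code unit while UTF-16
can write each code unit through a `DataView`:

```typescript
// Hypothetical helpers for illustration only.
// Keeps only the low 8 bits of each UTF-16 code unit.
function asciiToBytes(str: string): Uint8Array {
  const out = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    out[i] = str.charCodeAt(i) & 0xff;
  }
  return out;
}

// Writes each UTF-16 code unit as two little-endian bytes via a DataView.
function utf16leToBytes(str: string): Uint8Array {
  const out = new Uint8Array(str.length * 2);
  const view = new DataView(out.buffer);
  for (let i = 0; i < str.length; i++) {
    view.setUint16(i * 2, str.charCodeAt(i), true);
  }
  return out;
}

console.log(
  Array.from(asciiToBytes("hi")).join(","),
  Array.from(utf16leToBytes("hi")).join(","),
); // prints: 104,105 104,0,105,0
```

The `DataView` in the UTF-16 helper is the kind of per-call allocation that
is cheap relative to long inputs but noticeable for very short ones.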

## Performance

### String length 15
With very small strings we're at break-even or sometimes even lose a tad
bit of performance from creating a `DataView` that ends up not paying
for itself.

**This PR:**
```
benchmark                                                     time (avg)        iter/s             (min … max)       p75       p99      p995
-------------------------------------------------------------------------------------------------------------- -----------------------------
Buffer.from ascii string                                       1.15 µs/iter     871,388.6   (728.78 ns … 1.56 µs)   1.23 µs   1.56 µs   1.56 µs
Buffer.from base64 string                                      1.63 µs/iter     612,790.9     (1.31 µs … 1.96 µs)   1.77 µs   1.96 µs   1.96 µs
Buffer.from utf16 string                                       1.41 µs/iter     707,396.3   (915.24 ns … 1.93 µs)   1.61 µs   1.93 µs   1.93 µs
Buffer.from hex string                                         1.87 µs/iter     535,357.9     (1.56 µs … 2.19 µs)      2 µs   2.19 µs   2.19 µs
Buffer.toString ascii string                                 154.58 ns/iter   6,469,162.8    (149.69 ns … 198 ns) 154.51 ns 182.89 ns 191.91 ns
Buffer.toString base64 string                                161.65 ns/iter   6,186,189.6 (150.91 ns … 181.15 ns) 165.18 ns 171.87 ns 174.94 ns
Buffer.toString utf16 string                                 292.74 ns/iter   3,415,959.8 (285.43 ns … 312.47 ns) 295.25 ns 310.47 ns 312.47 ns
Buffer.toString hex string                                    89.61 ns/iter  11,159,315.6  (81.09 ns … 123.77 ns)  91.09 ns 113.62 ns 119.28 ns
```

**Main:**
```
benchmark                                                     time (avg)        iter/s             (min … max)       p75       p99      p995
-------------------------------------------------------------------------------------------------------------- -----------------------------
Buffer.from ascii string                                       1.26 µs/iter     794,875.8     (1.07 µs … 1.46 µs)   1.31 µs   1.46 µs   1.46 µs
Buffer.from base64 string                                      1.65 µs/iter     607,853.3     (1.38 µs … 2.01 µs)   1.69 µs   2.01 µs   2.01 µs
Buffer.from utf16 string                                       1.34 µs/iter     744,894.6     (1.09 µs … 1.55 µs)   1.45 µs   1.55 µs   1.55 µs
Buffer.from hex string                                         2.01 µs/iter     496,345.8      (1.54 µs … 2.6 µs)   2.26 µs    2.6 µs    2.6 µs
Buffer.toString ascii string                                 150.16 ns/iter   6,659,630.5 (144.99 ns … 166.68 ns)  152.4 ns 157.26 ns 159.14 ns
Buffer.toString base64 string                                164.73 ns/iter   6,070,692.0 (158.77 ns … 185.63 ns) 168.48 ns 175.74 ns 176.68 ns
Buffer.toString utf16 string                                 150.61 ns/iter   6,639,864.0  (148.2 ns … 168.29 ns) 150.93 ns 157.21 ns 168.15 ns
Buffer.toString hex string                                    94.21 ns/iter  10,614,972.9   (86.21 ns … 98.75 ns)  95.43 ns  97.99 ns  98.21 ns
```

### String length 1500
With moderate lengths we already see great upsides for `Buffer.from()`
with ASCII and UTF-16.

**This PR:**
```
benchmark                                                     time (avg)        iter/s             (min … max)       p75       p99      p995
-------------------------------------------------------------------------------------------------------------- -----------------------------
Buffer.from ascii string                                       5.79 µs/iter     172,562.6     (4.72 µs … 4.71 ms)   5.04 µs   10.3 µs  11.67 µs
Buffer.from base64 string                                      5.08 µs/iter     196,678.9     (4.97 µs … 5.76 µs)   5.08 µs   5.76 µs   5.76 µs
Buffer.from utf16 string                                       9.68 µs/iter     103,316.5     (7.14 µs … 3.44 ms)  10.32 µs  13.42 µs  15.21 µs
Buffer.from hex string                                         53.7 µs/iter      18,620.2     (49.37 µs … 2.2 ms)  54.74 µs   72.2 µs  81.07 µs
Buffer.toString ascii string                                   6.63 µs/iter     150,761.3     (5.59 µs … 1.11 ms)   6.08 µs  15.68 µs  24.77 µs
Buffer.toString base64 string                                460.57 ns/iter   2,171,224.4 (448.33 ns … 511.73 ns) 465.05 ns 495.54 ns 511.73 ns
Buffer.toString utf16 string                                   6.52 µs/iter     153,287.0     (6.47 µs … 6.66 µs)   6.53 µs   6.66 µs   6.66 µs
Buffer.toString hex string                                     3.68 µs/iter     271,965.4     (3.64 µs … 3.82 µs)   3.68 µs   3.82 µs   3.82 µs
```

**Main:**
```
benchmark                                                     time (avg)        iter/s             (min … max)       p75       p99      p995
-------------------------------------------------------------------------------------------------------------- -----------------------------
Buffer.from ascii string                                      11.46 µs/iter      87,298.1    (8.53 µs … 834.1 µs)   9.61 µs  83.31 µs   87.3 µs
Buffer.from base64 string                                       5.4 µs/iter     185,027.8     (5.07 µs … 7.49 µs)   5.44 µs   7.49 µs   7.49 µs
Buffer.from utf16 string                                       20.3 µs/iter      49,270.8  (13.55 µs … 649.11 µs)   18.8 µs 113.93 µs 125.17 µs
Buffer.from hex string                                        52.03 µs/iter      19,218.9    (48.74 µs … 2.59 ms)  52.84 µs  67.05 µs  73.56 µs
Buffer.toString ascii string                                   6.46 µs/iter     154,822.5     (6.32 µs … 6.69 µs)   6.52 µs   6.69 µs   6.69 µs
Buffer.toString base64 string                                440.19 ns/iter   2,271,764.6    (427 ns … 490.77 ns) 444.74 ns 484.64 ns 490.77 ns
Buffer.toString utf16 string                                   6.89 µs/iter     145,106.7     (6.81 µs … 7.24 µs)   6.91 µs   7.24 µs   7.24 µs
Buffer.toString hex string                                     3.66 µs/iter     273,456.5      (3.6 µs … 4.02 µs)   3.64 µs   4.02 µs   4.02 µs
```

### String length 2^20
With massive lengths the difference in ASCII and UTF-16 parsing
performance is enormous.

**This PR:**
```
benchmark                                                           time (avg)        iter/s             (min … max)       p75       p99      p995
-------------------------------------------------------------------------------------------------------------------- -----------------------------
Buffer.from ascii string                                              4.1 ms/iter         243.7     (2.64 ms … 6.74 ms)   4.43 ms   6.26 ms   6.74 ms
Buffer.from base64 string                                            3.74 ms/iter         267.6     (2.91 ms … 4.92 ms)   3.96 ms   4.31 ms   4.92 ms
Buffer.from utf16 string                                             7.72 ms/iter         129.5    (5.91 ms … 11.03 ms)   7.97 ms  11.03 ms  11.03 ms
Buffer.from hex string                                              35.72 ms/iter          28.0   (34.71 ms … 38.42 ms)  35.93 ms  38.42 ms  38.42 ms
Buffer.toString ascii string                                        78.92 ms/iter          12.7   (42.72 ms … 94.13 ms)  91.64 ms  94.13 ms  94.13 ms
Buffer.toString base64 string                                      833.62 µs/iter       1,199.6   (638.05 µs … 5.97 ms) 826.86 µs   2.45 ms   2.48 ms
Buffer.toString utf16 string                                        79.35 ms/iter          12.6    (69.72 ms … 88.9 ms)  86.66 ms   88.9 ms   88.9 ms
Buffer.toString hex string                                          31.04 ms/iter          32.2      (4.3 ms … 46.9 ms)  37.21 ms   46.9 ms   46.9 ms
```

**Main:**
```
benchmark                                                           time (avg)        iter/s             (min … max)       p75       p99      p995
-------------------------------------------------------------------------------------------------------------------- -----------------------------
Buffer.from ascii string                                            18.66 ms/iter          53.6   (15.61 ms … 23.26 ms)  20.62 ms  23.26 ms  23.26 ms
Buffer.from base64 string                                             4.7 ms/iter         212.9     (2.94 ms … 9.07 ms)   4.65 ms   9.06 ms   9.07 ms
Buffer.from utf16 string                                            33.49 ms/iter          29.9   (31.24 ms … 35.67 ms)  34.08 ms  35.67 ms  35.67 ms
Buffer.from hex string                                              39.38 ms/iter          25.4   (38.66 ms … 42.36 ms)  39.58 ms  42.36 ms  42.36 ms
Buffer.toString ascii string                                        77.68 ms/iter          12.9   (67.46 ms … 95.68 ms)  84.71 ms  95.68 ms  95.68 ms
Buffer.toString base64 string                                      825.53 µs/iter       1,211.3   (655.38 µs … 6.69 ms) 816.62 µs   3.07 ms   3.13 ms
Buffer.toString utf16 string                                        76.54 ms/iter          13.1    (66.9 ms … 85.26 ms)  83.63 ms  85.26 ms  85.26 ms
Buffer.toString hex string                                          38.56 ms/iter          25.9   (33.83 ms … 46.56 ms)  45.33 ms  46.56 ms  46.56 ms
```
bartlomieju pushed a commit to bartlomieju/deno that referenced this pull request Sep 8, 2023
Extracted from denoland#17815

Optimise Buffer's string operations, most significantly when dealing
with ASCII and UTF-16. Base64 and HEX encodings are affected to much
lesser degrees.

## Performance

### String length 15
With very small strings we're at break-even or sometimes even lose a tad
bit of performance from creating a `DataView` that ends up not paying
for itself.

**This PR:**
```
benchmark                                                     time (avg)        iter/s             (min … max)       p75       p99      p995
-------------------------------------------------------------------------------------------------------------- -----------------------------
Buffer.from ascii string                                       1.15 µs/iter     871,388.6   (728.78 ns … 1.56 µs)   1.23 µs   1.56 µs   1.56 µs
Buffer.from base64 string                                      1.63 µs/iter     612,790.9     (1.31 µs … 1.96 µs)   1.77 µs   1.96 µs   1.96 µs
Buffer.from utf16 string                                       1.41 µs/iter     707,396.3   (915.24 ns … 1.93 µs)   1.61 µs   1.93 µs   1.93 µs
Buffer.from hex string                                         1.87 µs/iter     535,357.9     (1.56 µs … 2.19 µs)      2 µs   2.19 µs   2.19 µs
Buffer.toString ascii string                                 154.58 ns/iter   6,469,162.8    (149.69 ns … 198 ns) 154.51 ns 182.89 ns 191.91 ns
Buffer.toString base64 string                                161.65 ns/iter   6,186,189.6 (150.91 ns … 181.15 ns) 165.18 ns 171.87 ns 174.94 ns
Buffer.toString utf16 string                                 292.74 ns/iter   3,415,959.8 (285.43 ns … 312.47 ns) 295.25 ns 310.47 ns 312.47 ns
Buffer.toString hex string                                    89.61 ns/iter  11,159,315.6  (81.09 ns … 123.77 ns)  91.09 ns 113.62 ns 119.28 ns
```

**Main:**
```
benchmark                                                     time (avg)        iter/s             (min … max)       p75       p99      p995
-------------------------------------------------------------------------------------------------------------- -----------------------------
Buffer.from ascii string                                       1.26 µs/iter     794,875.8     (1.07 µs … 1.46 µs)   1.31 µs   1.46 µs   1.46 µs
Buffer.from base64 string                                      1.65 µs/iter     607,853.3     (1.38 µs … 2.01 µs)   1.69 µs   2.01 µs   2.01 µs
Buffer.from utf16 string                                       1.34 µs/iter     744,894.6     (1.09 µs … 1.55 µs)   1.45 µs   1.55 µs   1.55 µs
Buffer.from hex string                                         2.01 µs/iter     496,345.8      (1.54 µs … 2.6 µs)   2.26 µs    2.6 µs    2.6 µs
Buffer.toString ascii string                                 150.16 ns/iter   6,659,630.5 (144.99 ns … 166.68 ns)  152.4 ns 157.26 ns 159.14 ns
Buffer.toString base64 string                                164.73 ns/iter   6,070,692.0 (158.77 ns … 185.63 ns) 168.48 ns 175.74 ns 176.68 ns
Buffer.toString utf16 string                                 150.61 ns/iter   6,639,864.0  (148.2 ns … 168.29 ns) 150.93 ns 157.21 ns 168.15 ns
Buffer.toString hex string                                    94.21 ns/iter  10,614,972.9   (86.21 ns … 98.75 ns)  95.43 ns  97.99 ns  98.21 ns
```

### String length 1500
With moderate lengths we already see great upsides for `Buffer.from()`
with ASCII and UTF-16.

**This PR:**
```
benchmark                                                     time (avg)        iter/s             (min … max)       p75       p99      p995
-------------------------------------------------------------------------------------------------------------- -----------------------------
Buffer.from ascii string                                       5.79 µs/iter     172,562.6     (4.72 µs … 4.71 ms)   5.04 µs   10.3 µs  11.67 µs
Buffer.from base64 string                                      5.08 µs/iter     196,678.9     (4.97 µs … 5.76 µs)   5.08 µs   5.76 µs   5.76 µs
Buffer.from utf16 string                                       9.68 µs/iter     103,316.5     (7.14 µs … 3.44 ms)  10.32 µs  13.42 µs  15.21 µs
Buffer.from hex string                                         53.7 µs/iter      18,620.2     (49.37 µs … 2.2 ms)  54.74 µs   72.2 µs  81.07 µs
Buffer.toString ascii string                                   6.63 µs/iter     150,761.3     (5.59 µs … 1.11 ms)   6.08 µs  15.68 µs  24.77 µs
Buffer.toString base64 string                                460.57 ns/iter   2,171,224.4 (448.33 ns … 511.73 ns) 465.05 ns 495.54 ns 511.73 ns
Buffer.toString utf16 string                                   6.52 µs/iter     153,287.0     (6.47 µs … 6.66 µs)   6.53 µs   6.66 µs   6.66 µs
Buffer.toString hex string                                     3.68 µs/iter     271,965.4     (3.64 µs … 3.82 µs)   3.68 µs   3.82 µs   3.82 µs
```

**Main:**
```
benchmark                                                     time (avg)        iter/s             (min … max)       p75       p99      p995
-------------------------------------------------------------------------------------------------------------- -----------------------------
Buffer.from ascii string                                      11.46 µs/iter      87,298.1    (8.53 µs … 834.1 µs)   9.61 µs  83.31 µs   87.3 µs
Buffer.from base64 string                                       5.4 µs/iter     185,027.8     (5.07 µs … 7.49 µs)   5.44 µs   7.49 µs   7.49 µs
Buffer.from utf16 string                                       20.3 µs/iter      49,270.8  (13.55 µs … 649.11 µs)   18.8 µs 113.93 µs 125.17 µs
Buffer.from hex string                                        52.03 µs/iter      19,218.9    (48.74 µs … 2.59 ms)  52.84 µs  67.05 µs  73.56 µs
Buffer.toString ascii string                                   6.46 µs/iter     154,822.5     (6.32 µs … 6.69 µs)   6.52 µs   6.69 µs   6.69 µs
Buffer.toString base64 string                                440.19 ns/iter   2,271,764.6    (427 ns … 490.77 ns) 444.74 ns 484.64 ns 490.77 ns
Buffer.toString utf16 string                                   6.89 µs/iter     145,106.7     (6.81 µs … 7.24 µs)   6.91 µs   7.24 µs   7.24 µs
Buffer.toString hex string                                     3.66 µs/iter     273,456.5      (3.6 µs … 4.02 µs)   3.64 µs   4.02 µs   4.02 µs
```

### String length 2^20
With massive lengths the difference in ASCII and UTF-16 parsing
performance is enormous.

**This PR:**
```
benchmark                                                           time (avg)        iter/s             (min … max)       p75       p99      p995
-------------------------------------------------------------------------------------------------------------------- -----------------------------
Buffer.from ascii string                                              4.1 ms/iter         243.7     (2.64 ms … 6.74 ms)   4.43 ms   6.26 ms   6.74 ms
Buffer.from base64 string                                            3.74 ms/iter         267.6     (2.91 ms … 4.92 ms)   3.96 ms   4.31 ms   4.92 ms
Buffer.from utf16 string                                             7.72 ms/iter         129.5    (5.91 ms … 11.03 ms)   7.97 ms  11.03 ms  11.03 ms
Buffer.from hex string                                              35.72 ms/iter          28.0   (34.71 ms … 38.42 ms)  35.93 ms  38.42 ms  38.42 ms
Buffer.toString ascii string                                        78.92 ms/iter          12.7   (42.72 ms … 94.13 ms)  91.64 ms  94.13 ms  94.13 ms
Buffer.toString base64 string                                      833.62 µs/iter       1,199.6   (638.05 µs … 5.97 ms) 826.86 µs   2.45 ms   2.48 ms
Buffer.toString utf16 string                                        79.35 ms/iter          12.6    (69.72 ms … 88.9 ms)  86.66 ms   88.9 ms   88.9 ms
Buffer.toString hex string                                          31.04 ms/iter          32.2      (4.3 ms … 46.9 ms)  37.21 ms   46.9 ms   46.9 ms
```

**Main:**
```
benchmark                                                           time (avg)        iter/s             (min … max)       p75       p99      p995
-------------------------------------------------------------------------------------------------------------------- -----------------------------
Buffer.from ascii string                                            18.66 ms/iter          53.6   (15.61 ms … 23.26 ms)  20.62 ms  23.26 ms  23.26 ms
Buffer.from base64 string                                             4.7 ms/iter         212.9     (2.94 ms … 9.07 ms)   4.65 ms   9.06 ms   9.07 ms
Buffer.from utf16 string                                            33.49 ms/iter          29.9   (31.24 ms … 35.67 ms)  34.08 ms  35.67 ms  35.67 ms
Buffer.from hex string                                              39.38 ms/iter          25.4   (38.66 ms … 42.36 ms)  39.58 ms  42.36 ms  42.36 ms
Buffer.toString ascii string                                        77.68 ms/iter          12.9   (67.46 ms … 95.68 ms)  84.71 ms  95.68 ms  95.68 ms
Buffer.toString base64 string                                      825.53 µs/iter       1,211.3   (655.38 µs … 6.69 ms) 816.62 µs   3.07 ms   3.13 ms
Buffer.toString utf16 string                                        76.54 ms/iter          13.1    (66.9 ms … 85.26 ms)  83.63 ms  85.26 ms  85.26 ms
Buffer.toString hex string                                          38.56 ms/iter          25.9   (33.83 ms … 46.56 ms)  45.33 ms  46.56 ms  46.56 ms
```
This way the Buffer polyfill doesn't need to reimplement the
methods for reading data from an ArrayBuffer
backing store. Instead, the DataView APIs are used, which
internally call V8's implementations of these algorithms.

This significantly reduces the amount of code needed for the
Buffer polyfill. It also provides performance benefits, especially
for reading floating point numbers.
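
As a rough illustration of the idea in this commit message, the sketch below contrasts a hand-written integer read with reads delegated to a cached `DataView`. The class `ViewBackedBytes` and the function `readInt32LEManual` are hypothetical names invented for this example, not the polyfill's actual code. The method names mirror Node's `Buffer` API only for readability.

```typescript
// Hand-written read: assembles a signed 32-bit little-endian integer
// from four bytes, the kind of code a DataView makes unnecessary.
function readInt32LEManual(bytes: Uint8Array, offset: number): number {
  return (
    bytes[offset] |
    (bytes[offset + 1] << 8) |
    (bytes[offset + 2] << 16) |
    (bytes[offset + 3] << 24)
  );
}

// Hypothetical wrapper (an assumption for illustration): holds the bytes
// together with one DataView over the same memory.
class ViewBackedBytes {
  readonly bytes: Uint8Array;
  private readonly view: DataView;

  constructor(bytes: Uint8Array) {
    this.bytes = bytes;
    // Pass byteOffset and byteLength so that subarrays sharing a larger
    // ArrayBuffer are read at the correct position.
    this.view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  }

  readInt32LE(offset = 0): number {
    return this.view.getInt32(offset, true); // true = little-endian
  }

  readDoubleLE(offset = 0): number {
    return this.view.getFloat64(offset, true);
  }

  // Returns the offset just past the written value, as Node's Buffer does.
  writeDoubleLE(value: number, offset = 0): number {
    this.view.setFloat64(offset, value, true);
    return offset + 8;
  }
}
```

The trade-off is the one discussed in this thread: each instance carries an extra `DataView` object, which costs creation time and memory even when no numeric read or write is ever performed.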
@aapoalas aapoalas force-pushed the exp/ext-node/buffer-optimisation branch from cada329 to 030fe87 Compare October 28, 2023 15:42
@aapoalas aapoalas changed the title refactor(ext/node): Try optimise Node Buffer polyfill refactor(ext/node): Use an internal DataView in Buffer Oct 28, 2023
Collaborator Author

aapoalas commented Nov 3, 2024

I have decided that this is not a good idea: it adds something like 80 bytes to the already fairly large static size of a Buffer (a Uint8Array plus the internal ArrayBuffer) for what is a fairly small performance upside (and even some downsides, especially in the more necessary parts).

The proper way to improve the performance of Buffer read and write APIs would be to implement them in Rust.

@aapoalas aapoalas closed this Nov 3, 2024