-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(ext/node): Use an internal DataView in Buffer #17815
Conversation
0762e00
to
76bd16b
Compare
Maybe let's wait on #17827 to land before merging this one. |
@aapoalas it appears that some tests related to buffer started failing with these changes. |
13be2d7
to
e74c58a
Compare
ca6ebc1
to
53d1be6
Compare
efd0922
to
29f34a8
Compare
Extracted from #17815 Optimise Buffer's string operations, most significantly when dealing with ASCII and UTF-16. Base64 and HEX encodings are affected to much lesser degrees. ## Performance ### String length 15 With very small strings we're at break-even or sometimes even lose a tad bit of performance from creating a `DataView` that ends up not paying for itself. **This PR:** ``` benchmark time (avg) iter/s (min … max) p75 p99 p995 -------------------------------------------------------------------------------------------------------------- ----------------------------- Buffer.from ascii string 1.15 µs/iter 871,388.6 (728.78 ns … 1.56 µs) 1.23 µs 1.56 µs 1.56 µs Buffer.from base64 string 1.63 µs/iter 612,790.9 (1.31 µs … 1.96 µs) 1.77 µs 1.96 µs 1.96 µs Buffer.from utf16 string 1.41 µs/iter 707,396.3 (915.24 ns … 1.93 µs) 1.61 µs 1.93 µs 1.93 µs Buffer.from hex string 1.87 µs/iter 535,357.9 (1.56 µs … 2.19 µs) 2 µs 2.19 µs 2.19 µs Buffer.toString ascii string 154.58 ns/iter 6,469,162.8 (149.69 ns … 198 ns) 154.51 ns 182.89 ns 191.91 ns Buffer.toString base64 string 161.65 ns/iter 6,186,189.6 (150.91 ns … 181.15 ns) 165.18 ns 171.87 ns 174.94 ns Buffer.toString utf16 string 292.74 ns/iter 3,415,959.8 (285.43 ns … 312.47 ns) 295.25 ns 310.47 ns 312.47 ns Buffer.toString hex string 89.61 ns/iter 11,159,315.6 (81.09 ns … 123.77 ns) 91.09 ns 113.62 ns 119.28 ns ``` **Main:** ``` benchmark time (avg) iter/s (min … max) p75 p99 p995 -------------------------------------------------------------------------------------------------------------- ----------------------------- Buffer.from ascii string 1.26 µs/iter 794,875.8 (1.07 µs … 1.46 µs) 1.31 µs 1.46 µs 1.46 µs Buffer.from base64 string 1.65 µs/iter 607,853.3 (1.38 µs … 2.01 µs) 1.69 µs 2.01 µs 2.01 µs Buffer.from utf16 string 1.34 µs/iter 744,894.6 (1.09 µs … 1.55 µs) 1.45 µs 1.55 µs 1.55 µs Buffer.from hex string 2.01 µs/iter 496,345.8 (1.54 µs … 2.6 µs) 2.26 µs 2.6 µs 2.6 µs Buffer.toString ascii string 150.16 ns/iter 6,659,630.5 (144.99 ns … 166.68 ns) 152.4 ns 157.26 ns 159.14 ns Buffer.toString base64 string 164.73 ns/iter 6,070,692.0 (158.77 ns … 185.63 ns) 168.48 ns 175.74 ns 176.68 ns Buffer.toString utf16 string 150.61 ns/iter 6,639,864.0 (148.2 ns … 168.29 ns) 150.93 ns 157.21 ns 168.15 ns Buffer.toString hex string 94.21 ns/iter 10,614,972.9 (86.21 ns … 98.75 ns) 95.43 ns 97.99 ns 98.21 ns ``` ### String length 1500 With moderate lengths we already see great upsides for `Buffer.from()` with ASCII and UTF-16. **This PR:** ``` benchmark time (avg) iter/s (min … max) p75 p99 p995 -------------------------------------------------------------------------------------------------------------- ----------------------------- Buffer.from ascii string 5.79 µs/iter 172,562.6 (4.72 µs … 4.71 ms) 5.04 µs 10.3 µs 11.67 µs Buffer.from base64 string 5.08 µs/iter 196,678.9 (4.97 µs … 5.76 µs) 5.08 µs 5.76 µs 5.76 µs Buffer.from utf16 string 9.68 µs/iter 103,316.5 (7.14 µs … 3.44 ms) 10.32 µs 13.42 µs 15.21 µs Buffer.from hex string 53.7 µs/iter 18,620.2 (49.37 µs … 2.2 ms) 54.74 µs 72.2 µs 81.07 µs Buffer.toString ascii string 6.63 µs/iter 150,761.3 (5.59 µs … 1.11 ms) 6.08 µs 15.68 µs 24.77 µs Buffer.toString base64 string 460.57 ns/iter 2,171,224.4 (448.33 ns … 511.73 ns) 465.05 ns 495.54 ns 511.73 ns Buffer.toString utf16 string 6.52 µs/iter 153,287.0 (6.47 µs … 6.66 µs) 6.53 µs 6.66 µs 6.66 µs Buffer.toString hex string 3.68 µs/iter 271,965.4 (3.64 µs … 3.82 µs) 3.68 µs 3.82 µs 3.82 µs ``` **Main:** ``` benchmark time (avg) iter/s (min … max) p75 p99 p995 -------------------------------------------------------------------------------------------------------------- ----------------------------- Buffer.from ascii string 11.46 µs/iter 87,298.1 (8.53 µs … 834.1 µs) 9.61 µs 83.31 µs 87.3 µs Buffer.from base64 string 5.4 µs/iter 185,027.8 (5.07 µs … 7.49 µs) 5.44 µs 7.49 µs 7.49 µs Buffer.from utf16 string 20.3 µs/iter 49,270.8 (13.55 µs … 649.11 µs) 18.8 µs 113.93 µs 125.17 µs Buffer.from hex string 52.03 µs/iter 19,218.9 (48.74 µs … 2.59 ms) 52.84 µs 67.05 µs 73.56 µs Buffer.toString ascii string 6.46 µs/iter 154,822.5 (6.32 µs … 6.69 µs) 6.52 µs 6.69 µs 6.69 µs Buffer.toString base64 string 440.19 ns/iter 2,271,764.6 (427 ns … 490.77 ns) 444.74 ns 484.64 ns 490.77 ns Buffer.toString utf16 string 6.89 µs/iter 145,106.7 (6.81 µs … 7.24 µs) 6.91 µs 7.24 µs 7.24 µs Buffer.toString hex string 3.66 µs/iter 273,456.5 (3.6 µs … 4.02 µs) 3.64 µs 4.02 µs 4.02 µs ``` ### String length 2^20 With massive lengths we the difference in ASCII and UTF-16 parsing performance is enormous. **This PR:** ``` benchmark time (avg) iter/s (min … max) p75 p99 p995 -------------------------------------------------------------------------------------------------------------------- ----------------------------- Buffer.from ascii string 4.1 ms/iter 243.7 (2.64 ms … 6.74 ms) 4.43 ms 6.26 ms 6.74 ms Buffer.from base64 string 3.74 ms/iter 267.6 (2.91 ms … 4.92 ms) 3.96 ms 4.31 ms 4.92 ms Buffer.from utf16 string 7.72 ms/iter 129.5 (5.91 ms … 11.03 ms) 7.97 ms 11.03 ms 11.03 ms Buffer.from hex string 35.72 ms/iter 28.0 (34.71 ms … 38.42 ms) 35.93 ms 38.42 ms 38.42 ms Buffer.toString ascii string 78.92 ms/iter 12.7 (42.72 ms … 94.13 ms) 91.64 ms 94.13 ms 94.13 ms Buffer.toString base64 string 833.62 µs/iter 1,199.6 (638.05 µs … 5.97 ms) 826.86 µs 2.45 ms 2.48 ms Buffer.toString utf16 string 79.35 ms/iter 12.6 (69.72 ms … 88.9 ms) 86.66 ms 88.9 ms 88.9 ms Buffer.toString hex string 31.04 ms/iter 32.2 (4.3 ms … 46.9 ms) 37.21 ms 46.9 ms 46.9 ms ``` **Main:** ``` benchmark time (avg) iter/s (min … max) p75 p99 p995 -------------------------------------------------------------------------------------------------------------------- ----------------------------- Buffer.from ascii string 18.66 ms/iter 53.6 (15.61 ms … 23.26 ms) 20.62 ms 23.26 ms 23.26 ms Buffer.from base64 string 4.7 ms/iter 212.9 (2.94 ms … 9.07 ms) 4.65 ms 9.06 ms 9.07 ms Buffer.from utf16 string 33.49 ms/iter 29.9 (31.24 ms … 35.67 ms) 34.08 ms 35.67 ms 35.67 ms Buffer.from hex string 39.38 ms/iter 25.4 (38.66 ms … 42.36 ms) 39.58 ms 42.36 ms 42.36 ms Buffer.toString ascii string 77.68 ms/iter 12.9 (67.46 ms … 95.68 ms) 84.71 ms 95.68 ms 95.68 ms Buffer.toString base64 string 825.53 µs/iter 1,211.3 (655.38 µs … 6.69 ms) 816.62 µs 3.07 ms 3.13 ms Buffer.toString utf16 string 76.54 ms/iter 13.1 (66.9 ms … 85.26 ms) 83.63 ms 85.26 ms 85.26 ms Buffer.toString hex string 38.56 ms/iter 25.9 (33.83 ms … 46.56 ms) 45.33 ms 46.56 ms 46.56 ms ```
Extracted from denoland#17815 Optimise Buffer's string operations, most significantly when dealing with ASCII and UTF-16. Base64 and HEX encodings are affected to much lesser degrees. ## Performance ### String length 15 With very small strings we're at break-even or sometimes even lose a tad bit of performance from creating a `DataView` that ends up not paying for itself. **This PR:** ``` benchmark time (avg) iter/s (min … max) p75 p99 p995 -------------------------------------------------------------------------------------------------------------- ----------------------------- Buffer.from ascii string 1.15 µs/iter 871,388.6 (728.78 ns … 1.56 µs) 1.23 µs 1.56 µs 1.56 µs Buffer.from base64 string 1.63 µs/iter 612,790.9 (1.31 µs … 1.96 µs) 1.77 µs 1.96 µs 1.96 µs Buffer.from utf16 string 1.41 µs/iter 707,396.3 (915.24 ns … 1.93 µs) 1.61 µs 1.93 µs 1.93 µs Buffer.from hex string 1.87 µs/iter 535,357.9 (1.56 µs … 2.19 µs) 2 µs 2.19 µs 2.19 µs Buffer.toString ascii string 154.58 ns/iter 6,469,162.8 (149.69 ns … 198 ns) 154.51 ns 182.89 ns 191.91 ns Buffer.toString base64 string 161.65 ns/iter 6,186,189.6 (150.91 ns … 181.15 ns) 165.18 ns 171.87 ns 174.94 ns Buffer.toString utf16 string 292.74 ns/iter 3,415,959.8 (285.43 ns … 312.47 ns) 295.25 ns 310.47 ns 312.47 ns Buffer.toString hex string 89.61 ns/iter 11,159,315.6 (81.09 ns … 123.77 ns) 91.09 ns 113.62 ns 119.28 ns ``` **Main:** ``` benchmark time (avg) iter/s (min … max) p75 p99 p995 -------------------------------------------------------------------------------------------------------------- ----------------------------- Buffer.from ascii string 1.26 µs/iter 794,875.8 (1.07 µs … 1.46 µs) 1.31 µs 1.46 µs 1.46 µs Buffer.from base64 string 1.65 µs/iter 607,853.3 (1.38 µs … 2.01 µs) 1.69 µs 2.01 µs 2.01 µs Buffer.from utf16 string 1.34 µs/iter 744,894.6 (1.09 µs … 1.55 µs) 1.45 µs 1.55 µs 1.55 µs Buffer.from hex string 2.01 µs/iter 496,345.8 (1.54 µs … 2.6 µs) 2.26 µs 2.6 µs 2.6 µs Buffer.toString ascii string 150.16 ns/iter 6,659,630.5 (144.99 ns … 166.68 ns) 152.4 ns 157.26 ns 159.14 ns Buffer.toString base64 string 164.73 ns/iter 6,070,692.0 (158.77 ns … 185.63 ns) 168.48 ns 175.74 ns 176.68 ns Buffer.toString utf16 string 150.61 ns/iter 6,639,864.0 (148.2 ns … 168.29 ns) 150.93 ns 157.21 ns 168.15 ns Buffer.toString hex string 94.21 ns/iter 10,614,972.9 (86.21 ns … 98.75 ns) 95.43 ns 97.99 ns 98.21 ns ``` ### String length 1500 With moderate lengths we already see great upsides for `Buffer.from()` with ASCII and UTF-16. **This PR:** ``` benchmark time (avg) iter/s (min … max) p75 p99 p995 -------------------------------------------------------------------------------------------------------------- ----------------------------- Buffer.from ascii string 5.79 µs/iter 172,562.6 (4.72 µs … 4.71 ms) 5.04 µs 10.3 µs 11.67 µs Buffer.from base64 string 5.08 µs/iter 196,678.9 (4.97 µs … 5.76 µs) 5.08 µs 5.76 µs 5.76 µs Buffer.from utf16 string 9.68 µs/iter 103,316.5 (7.14 µs … 3.44 ms) 10.32 µs 13.42 µs 15.21 µs Buffer.from hex string 53.7 µs/iter 18,620.2 (49.37 µs … 2.2 ms) 54.74 µs 72.2 µs 81.07 µs Buffer.toString ascii string 6.63 µs/iter 150,761.3 (5.59 µs … 1.11 ms) 6.08 µs 15.68 µs 24.77 µs Buffer.toString base64 string 460.57 ns/iter 2,171,224.4 (448.33 ns … 511.73 ns) 465.05 ns 495.54 ns 511.73 ns Buffer.toString utf16 string 6.52 µs/iter 153,287.0 (6.47 µs … 6.66 µs) 6.53 µs 6.66 µs 6.66 µs Buffer.toString hex string 3.68 µs/iter 271,965.4 (3.64 µs … 3.82 µs) 3.68 µs 3.82 µs 3.82 µs ``` **Main:** ``` benchmark time (avg) iter/s (min … max) p75 p99 p995 -------------------------------------------------------------------------------------------------------------- ----------------------------- Buffer.from ascii string 11.46 µs/iter 87,298.1 (8.53 µs … 834.1 µs) 9.61 µs 83.31 µs 87.3 µs Buffer.from base64 string 5.4 µs/iter 185,027.8 (5.07 µs … 7.49 µs) 5.44 µs 7.49 µs 7.49 µs Buffer.from utf16 string 20.3 µs/iter 49,270.8 (13.55 µs … 649.11 µs) 18.8 µs 113.93 µs 125.17 µs Buffer.from hex string 52.03 µs/iter 19,218.9 (48.74 µs … 2.59 ms) 52.84 µs 67.05 µs 73.56 µs Buffer.toString ascii string 6.46 µs/iter 154,822.5 (6.32 µs … 6.69 µs) 6.52 µs 6.69 µs 6.69 µs Buffer.toString base64 string 440.19 ns/iter 2,271,764.6 (427 ns … 490.77 ns) 444.74 ns 484.64 ns 490.77 ns Buffer.toString utf16 string 6.89 µs/iter 145,106.7 (6.81 µs … 7.24 µs) 6.91 µs 7.24 µs 7.24 µs Buffer.toString hex string 3.66 µs/iter 273,456.5 (3.6 µs … 4.02 µs) 3.64 µs 4.02 µs 4.02 µs ``` ### String length 2^20 With massive lengths we the difference in ASCII and UTF-16 parsing performance is enormous. **This PR:** ``` benchmark time (avg) iter/s (min … max) p75 p99 p995 -------------------------------------------------------------------------------------------------------------------- ----------------------------- Buffer.from ascii string 4.1 ms/iter 243.7 (2.64 ms … 6.74 ms) 4.43 ms 6.26 ms 6.74 ms Buffer.from base64 string 3.74 ms/iter 267.6 (2.91 ms … 4.92 ms) 3.96 ms 4.31 ms 4.92 ms Buffer.from utf16 string 7.72 ms/iter 129.5 (5.91 ms … 11.03 ms) 7.97 ms 11.03 ms 11.03 ms Buffer.from hex string 35.72 ms/iter 28.0 (34.71 ms … 38.42 ms) 35.93 ms 38.42 ms 38.42 ms Buffer.toString ascii string 78.92 ms/iter 12.7 (42.72 ms … 94.13 ms) 91.64 ms 94.13 ms 94.13 ms Buffer.toString base64 string 833.62 µs/iter 1,199.6 (638.05 µs … 5.97 ms) 826.86 µs 2.45 ms 2.48 ms Buffer.toString utf16 string 79.35 ms/iter 12.6 (69.72 ms … 88.9 ms) 86.66 ms 88.9 ms 88.9 ms Buffer.toString hex string 31.04 ms/iter 32.2 (4.3 ms … 46.9 ms) 37.21 ms 46.9 ms 46.9 ms ``` **Main:** ``` benchmark time (avg) iter/s (min … max) p75 p99 p995 -------------------------------------------------------------------------------------------------------------------- ----------------------------- Buffer.from ascii string 18.66 ms/iter 53.6 (15.61 ms … 23.26 ms) 20.62 ms 23.26 ms 23.26 ms Buffer.from base64 string 4.7 ms/iter 212.9 (2.94 ms … 9.07 ms) 4.65 ms 9.06 ms 9.07 ms Buffer.from utf16 string 33.49 ms/iter 29.9 (31.24 ms … 35.67 ms) 34.08 ms 35.67 ms 35.67 ms Buffer.from hex string 39.38 ms/iter 25.4 (38.66 ms … 42.36 ms) 39.58 ms 42.36 ms 42.36 ms Buffer.toString ascii string 77.68 ms/iter 12.9 (67.46 ms … 95.68 ms) 84.71 ms 95.68 ms 95.68 ms Buffer.toString base64 string 825.53 µs/iter 1,211.3 (655.38 µs … 6.69 ms) 816.62 µs 3.07 ms 3.13 ms Buffer.toString utf16 string 76.54 ms/iter 13.1 (66.9 ms … 85.26 ms) 83.63 ms 85.26 ms 85.26 ms Buffer.toString hex string 38.56 ms/iter 25.9 (33.83 ms … 46.56 ms) 45.33 ms 46.56 ms 46.56 ms ```
This way Buffer polyfill doesn't need to reimplement the methods for reading data from an ArrayBuffer backing store. Instead the DataView APIs are used that then internally call V8 implementations of these algorithms. This reduces the amount of code needed for the Buffer polyfill significantly. It also provides performance benefits especially for reading floating point numbers.
cada329
to
030fe87
Compare
I have decided that this is not a good idea: Adding something like 80 bytes to an already fairly large static size of The proper way to improve the performance of |
Refactor the node
Buffer
class polyfill to take less code lines by using an internalDataView
for reading. This improves performance in some places (reading signed bigints and floating point numbers, writing bigints and floating point numbers) and keeps the same performance roughly the same in others at the cost of increased creation time.Measurements
Creation
Creation of the internal
DataView
requires quite a bit of extra work, making the creation ofBuffer
significantly slower, especially so for empty buffers.It turns out creating an empty
Uint8Array
is optimized somehow in that it doesn't truly create anArrayBuffer
for theUint8Array
. Accessing thebuffer
property triggers the creation which then takes more than 400 nanoseconds. This is the difference between creation of an empty buffer in main vs in this PR.This PR:
Main:
Reading integers
Reading integers is equivalent in performance or just the tiniest bit slower, less than a nanosecond of a difference.
This PR:
Main:
Reading floating point numbers
Reading floating points is faster by 2-3 nanoseconds (15-20%), bringing the speed in line with integer reads.
This PR:
Main:
Reading BigInts
Reading signed 64-bit integers is significantly faster, about 33% or so. Reading unsigned integers is significantly slower by more than 100%.
This PR:
Main:
Writing values
Writing values is mostly equivalent in speed, but writing BigInts and floating point numbers is faster by 10-30%.
This PR:
Main:
V8 bugs spawned from this work
DataView#getBigUint64
has lackluster performance: https://bugs.chromium.org/p/v8/issues/detail?id=14263It should not be the case that manually reading the 8 bytes of a 64-bit unsigned integer one by one and adding them together with appropriate bitshifts or multiplications, including a conversion of upper and lower 4 bytes to BigInt and a final BigInt-bitshift and addition, is faster than simply having V8 read the 64-bit unsigned integer and create a BigInt out of that. This seems like a bug in the engine.
Uint8Array
for a class base (or presumably any other TypedArray at least) leads to all of its methods losing performance: https://bugs.chromium.org/p/v8/issues/detail?id=14265If
Buffer
were a class with an internal#buffer
and#view
then the performance of all its methods would be 10-30% better based on my testing. This applies even to methods that do not usethis
at all. This likewise seems like a bug, or possibly a limitation, of the engine.