Buffer.toString vs TextDecoder #18

Closed
ronag opened this issue Nov 17, 2022 · 11 comments
Labels
good first issue Good for newcomers

Comments

@ronag
Member

ronag commented Nov 17, 2022

import cronometro from 'cronometro'
import assert from 'node:assert'

const STR = 'a'.repeat(128)
const BUF = Buffer.from(STR)
const decoder = new TextDecoder()

cronometro({
  buffer() {
    const tmp = BUF.toString('utf-8', 10)
    assert(tmp.length)
  },
  decoder() {
    const tmp = decoder.decode(BUF.subarray(10))
    assert(tmp.length)
  },
})
╔══════════════╤═════════╤═══════════════════╤═══════════╗
║ Slower tests │ Samples │            Result │ Tolerance ║
╟──────────────┼─────────┼───────────────────┼───────────╢
║ decoder      │   10000 │ 1431770.62 op/sec │ ± 21.01 % ║
╟──────────────┼─────────┼───────────────────┼───────────╢
║ Fastest test │ Samples │            Result │ Tolerance ║
╟──────────────┼─────────┼───────────────────┼───────────╢
║ buffer       │   10000 │ 4121403.35 op/sec │ ±  1.95 % ║
╚══════════════╧═════════╧═══════════════════╧═══════════╝
@anonrig
Member

anonrig commented Nov 17, 2022

Can you run `decode` and `toString` with large strings? It's not ideal to pass such small buffers to decoders.

@anonrig anonrig changed the title Bufer.toString vs TextDecoder Buffer.toString vs TextDecoder Nov 17, 2022
@ronag
Member Author

ronag commented Nov 17, 2022

const STR = 'a'.repeat(1024)

╔══════════════╤═════════╤═══════════════════╤═══════════╗
║ Slower tests │ Samples │            Result │ Tolerance ║
╟──────────────┼─────────┼───────────────────┼───────────╢
║ decoder      │    1500 │  910307.41 op/sec │ ±  0.86 % ║
╟──────────────┼─────────┼───────────────────┼───────────╢
║ Fastest test │ Samples │            Result │ Tolerance ║
╟──────────────┼─────────┼───────────────────┼───────────╢
║ buffer       │   10000 │ 2367063.92 op/sec │ ± 20.15 % ║
╚══════════════╧═════════╧═══════════════════╧═══════════╝

Node 19.1

@anonrig
Member

anonrig commented Nov 17, 2022

The correct comparison is this, since `encoder.encode` returns a `Uint8Array`, not a `FastBuffer` instance.

import cronometro from 'cronometro'
import assert from 'node:assert'

const decoder = new TextDecoder()
const encoder = new TextEncoder()

const STR = 'a'.repeat(1024)
const ENCODED_STR = encoder.encode(STR)
const BUF = Buffer.from(STR)

cronometro({
  buffer() {
    const tmp = BUF.toString('utf-8')
    assert(tmp.length)
  },
  decoder() {
    const tmp = decoder.decode(ENCODED_STR)
    assert(tmp.length)
  },
})

Output (for large inputs)

╔══════════════╤═════════╤═══════════════════╤═══════════╗
║ Slower tests │ Samples │            Result │ Tolerance ║
╟──────────────┼─────────┼───────────────────┼───────────╢
║ buffer       │   10000 │ 2940680.88 op/sec │ ± 23.89 % ║
╟──────────────┼─────────┼───────────────────┼───────────╢
║ Fastest test │ Samples │            Result │ Tolerance ║
╟──────────────┼─────────┼───────────────────┼───────────╢
║ decoder      │   10000 │ 3766068.40 op/sec │ ± 11.48 % ║
╚══════════════╧═════════╧═══════════════════╧═══════════╝

Output (for smaller inputs)

╔══════════════╤═════════╤═══════════════════╤═══════════╗
║ Slower tests │ Samples │            Result │ Tolerance ║
╟──────────────┼─────────┼───────────────────┼───────────╢
║ decoder      │   10000 │ 3653280.32 op/sec │ ± 32.44 % ║
╟──────────────┼─────────┼───────────────────┼───────────╢
║ Fastest test │ Samples │            Result │ Tolerance ║
╟──────────────┼─────────┼───────────────────┼───────────╢
║ buffer       │   10000 │ 6615624.92 op/sec │ ±  7.24 % ║
╚══════════════╧═════════╧═══════════════════╧═══════════╝

@anonrig
Member

anonrig commented Nov 17, 2022

Additionally, Node 19.1 does not have the following commit: nodejs/node@4ac830e. I'm running my benchmarks on the main branch of Node.

@anonrig
Member

anonrig commented Nov 21, 2022

Should we remove this from the agenda, or do you think it's worth a talk @ronag?

@ronag
Member Author

ronag commented Nov 22, 2022

Should we remove this from the agenda, or do you think it's worth a talk @ronag?

I mean, for small strings we are still significantly slower?

@ronag
Member Author

ronag commented Jan 9, 2023

It seems like there is an opportunity here to also improve the performance of Buffer.toString.

@anonrig
Member

anonrig commented Jan 23, 2023

In the last performance meeting, we talked about updating the documentation to prefer `TextDecoder` over `buffer.toString` for UTF-8 values. If anybody is willing to open an issue to update the Node documentation, that would be greatly appreciated.

@anonrig anonrig added good first issue Good for newcomers and removed performance-agenda labels Jan 23, 2023
@mcollina
Member

I think it would be better to make Buffer.toString as fast as TextDecoder.

@tniessen
Member

It seems like there is an opportunity here to also improve the performance of Buffer.toString.

@ronag Could you post your benchmark?

I copied @anonrig's script. The results posted in #18 (comment) showed far too much variance to be significant, so I increased the number of iterations. I now barely see any difference between the two tests in Node 19.3.0.

1 MiB buffer:

tniessen@ubuntuserver:~/dev/node/bench-46334$ node bench-46334.mjs 
╔══════════════╤═════════╤════════════════╤═══════════╗
║ Slower tests │ Samples │         Result │ Tolerance ║
╟──────────────┼─────────┼────────────────┼───────────╢
║ buffer       │   10000 │ 2090.37 op/sec │  ± 0.14 % ║
╟──────────────┼─────────┼────────────────┼───────────╢
║ Fastest test │ Samples │         Result │ Tolerance ║
╟──────────────┼─────────┼────────────────┼───────────╢
║ decoder      │   10000 │ 2118.40 op/sec │  ± 0.15 % ║
╚══════════════╧═════════╧════════════════╧═══════════╝

1 KiB buffer:

tniessen@ubuntuserver:~/dev/node/bench-46334$ node bench-46334.mjs 
╔══════════════╤═════════╤═══════════════════╤═══════════╗
║ Slower tests │ Samples │            Result │ Tolerance ║
╟──────────────┼─────────┼───────────────────┼───────────╢
║ buffer       │  100000 │ 3320487.36 op/sec │  ± 4.30 % ║
╟──────────────┼─────────┼───────────────────┼───────────╢
║ Fastest test │ Samples │            Result │ Tolerance ║
╟──────────────┼─────────┼───────────────────┼───────────╢
║ decoder      │  100000 │ 3490502.15 op/sec │  ± 3.77 % ║
╚══════════════╧═════════╧═══════════════════╧═══════════╝

@anonrig
Member

anonrig commented Apr 19, 2023

I recommend closing this issue after rereading @tniessen's comment. Please open a new issue if there is proof that we need to invest more time in this.

@anonrig anonrig closed this as completed Apr 19, 2023