Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(ext/web): Don't rely on NaN comparisons in TextEncoderStream #13151

Merged
merged 1 commit into from
Jan 4, 2022

Conversation

andreubotella
Copy link
Contributor

In the transform function to TextEncoderStream's internal TransformStream, if chunk is the empty string and this.#pendingHighSurrogate is null, then lastCodeUnit will be NaN. As it turns out, this does not cause a bug because the comparison to check for lone surrogates turns out to be false for NaN, but to rely on it makes the code brittle.

In the `transform` function to `TextEncoderStream`'s internal
`TransformStream`, if `chunk` is the empty string and
`this.#pendingHighSurrogate` is null, then `lastCodeUnit` will be NaN.
As it turns out, this does not cause a bug because the comparison to
check for lone surrogates turns out to be false for NaN, but to rely on
it makes the code brittle.
Copy link
Member

@bartlomieju bartlomieju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test that could be used to check this change?

@andreubotella
Copy link
Contributor Author

The NaN comparison wasn't actually causing any difference in behavior from what would be expected, since the comparison to check whether chunk ends with a lead surrogate (0xD800 <= lastCodeUnit && lastCodeUnit <= 0xDBFF) turns out to be false if lastCodeUnit is NaN, which is what you'd expect if chunk is empty. As the commit message says, this refactoring removes the NaN comparison to make the code less brittle.

Returning early if chunk is the empty string but this.#pendingHighSurrogate is not null is also not a behavior difference, since this.#pendingHighSurrogate is either null or a string containing a single lead surrogate code unit, so that code unit would be prepended to chunk and then removed and stored into this.#pendingHighSurrogate again.

Copy link
Member

@bartlomieju bartlomieju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Andreu

@bartlomieju bartlomieju merged commit 22d140f into denoland:main Jan 4, 2022
@andreubotella andreubotella deleted the textencoderstream-nan branch January 4, 2022 23:42
bartlomieju pushed a commit that referenced this pull request Jan 5, 2022
…m` (#13151)

In the `transform` function to `TextEncoderStream`'s internal
`TransformStream`, if `chunk` is the empty string and
`this.#pendingHighSurrogate` is null, then `lastCodeUnit` will be NaN.
As it turns out, this does not cause a bug because the comparison to
check for lone surrogates turns out to be false for NaN, but to rely on
it makes the code brittle.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants