Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch from encoding to encoding_rs. #9

Merged
merged 1 commit into from
Mar 24, 2024
Merged

Switch from encoding to encoding_rs. #9

merged 1 commit into from
Mar 24, 2024

Conversation

Ethiraric
Copy link
Owner

See rustsec/advisory-db#1605.

With these changes, we seem to retain the encoding functionalities we had before, aside from the Call variant.
I am not well versed in that side of parsing. Would this require more thorough testing?

cc @mkmik

Fixes #8

@Ethiraric Ethiraric requested a review from davvid March 23, 2024 15:37
@Ethiraric
Copy link
Owner Author

Actually, I could use the *_without_replacement version and handle the Malformed variant.

@Ethiraric
Copy link
Owner Author

This proved to be much more complicated than I anticipated but it should work.

@Ethiraric Ethiraric force-pushed the fix-8 branch 2 times, most recently from 289a091 to d46eb0a Compare March 23, 2024 16:21
Copy link
Collaborator

@davvid davvid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff, this looks good to me. I just had some minor sugs but nothing worth holding up a merge over.

// If the output is full, we must reallocate.
(DecoderResult::OutputFull, bytes_read) => {
total_bytes_read += bytes_read;
output.reserve(input.len() / 10);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd lean towards removing the / 10 here since we're just reserving. It should lead to less reallocs overall. Another common strategy for really large buffers is to double the reserve() size each time we hit OutputFull.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output is already reserved to the size of the input. Would it be common to have instances of inputs which, when converted to utf8, are more than 1.1x their size?

This definitely needs a max though. If the input size is less than 10 bytes that's a reserve of 0.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's true. Most of the time we'd probably expect something encoded in utf32 to become smaller when converted to utf8, so the / 10 makes sense, especially with a short comment to explain the magic number. / 4 would give a safe buffer for some pathological cases, but in light of this is probably overkill. I wonder whether this code path is ever hit in the wild.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment, thanks!

I wonder whether this code path is ever hit in the wild.

I know nearly nothing about encodings, but I'm curious 🤔

src/yaml.rs Outdated
"Invalid character sequence at {byte_idx}: {malformed_sequence:?}",
))));
}
YAMLDecodingTrap::Call(f) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sug: rename f to fun or func.

@Ethiraric Ethiraric merged commit 4f76346 into master Mar 24, 2024
3 checks passed
@Ethiraric Ethiraric deleted the fix-8 branch March 24, 2024 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

switch the 'encoding' dependency to 'encoding_rs'?
2 participants