[text-format] Fix parsing of string literals #730

cmyr · 2024-06-17T17:40:48Z

This renames `next_byte_value` to `next_str_lit_bytes` and changes the signature so that it returns between 1 and 4 bytes (`1..=4`) per call, reflecting the variable-length nature of the UTF-8 encoding.
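To illustrate why one byte per call is not enough: a Rust `char` encodes to between 1 and 4 bytes of UTF-8. The following minimal sketch uses only the standard library's `char::encode_utf8`. The helper name `char_utf8_bytes` is hypothetical and not part of the PR.

```rust
/// Hypothetical helper (not from the PR): the UTF-8 encoding of one `char`,
/// which is always between 1 and 4 bytes long.
fn char_utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4];
    // `encode_utf8` writes into `buf` and returns the used prefix as `&mut str`.
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    for c in ['a', 'é', '€', '😀'] {
        let bytes = char_utf8_bytes(c);
        // Narrowing with `c as u8` would keep only the low 8 bits, so any
        // non-ASCII char would come out as a single, wrong byte.
        println!("{:?} -> {} byte(s): {:02x?}", c, bytes.len(), bytes);
        assert_eq!(bytes.len(), c.len_utf8());
    }
}
```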

(Hopefully) fixes #729, "text_format parsing does not correctly handle non-ascii chars?"

Note: I'm not sure how best to add tests for this, but it needs them. In particular, there should be a test case with a text-format input that contains a non-ASCII string literal. There should probably also be more tests for the unusual byte escapes, but a case with non-ASCII text is essential.
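A test of the kind described above could check both non-ASCII text and byte escapes. Since the crate's lexer API is not shown here, the following is a self-contained sketch built around a hypothetical `decode_str_lit` function. It handles only a subset of text-format escapes (`\n`, `\t`, `\r`, `\\`, `\'`, `\"`, `\xHH`, and octal `\NNN`) and is not the crate's implementation.

```rust
/// Hypothetical sketch (not the crate's lexer): decode the body of a quoted
/// string literal into raw bytes. Each loop step emits the bytes for one char
/// (1..=4 bytes) or one escape sequence (1 byte).
fn decode_str_lit(body: &str) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    let mut chars = body.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            // Plain char: copy its full UTF-8 encoding, not a single byte.
            let mut buf = [0u8; 4];
            out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
            continue;
        }
        let esc = chars.next().ok_or("dangling backslash")?;
        match esc {
            'n' => out.push(b'\n'),
            't' => out.push(b'\t'),
            'r' => out.push(b'\r'),
            '\\' => out.push(b'\\'),
            '\'' => out.push(b'\''),
            '"' => out.push(b'"'),
            'x' => {
                // Up to two hex digits encode one raw byte, not a code point.
                let mut v: u32 = 0;
                let mut n = 0;
                while n < 2 {
                    match chars.peek().and_then(|d| d.to_digit(16)) {
                        Some(d) => {
                            v = v * 16 + d;
                            chars.next();
                            n += 1;
                        }
                        None => break,
                    }
                }
                if n == 0 {
                    return Err("\\x without hex digits".into());
                }
                out.push(v as u8);
            }
            '0'..='7' => {
                // Up to three octal digits encode one raw byte.
                let mut v = esc.to_digit(8).unwrap();
                for _ in 0..2 {
                    match chars.peek().and_then(|d| d.to_digit(8)) {
                        Some(d) => {
                            v = v * 8 + d;
                            chars.next();
                        }
                        None => break,
                    }
                }
                if v > 0xff {
                    return Err("octal escape out of range".into());
                }
                out.push(v as u8);
            }
            other => return Err(format!("unknown escape \\{}", other)),
        }
    }
    Ok(out)
}

fn main() {
    // Non-ASCII text must survive as UTF-8, not one byte per char.
    assert_eq!(decode_str_lit("héllo").unwrap(), "héllo".as_bytes());
    // Byte escapes yield single raw bytes, which need not be valid UTF-8.
    assert_eq!(decode_str_lit(r"\xff\101").unwrap(), vec![0xff, b'A']);
    println!("ok");
}
```

Keeping byte escapes and plain chars on separate paths is the point: `\xff` must stay a single raw byte, while `é` must expand to its two-byte UTF-8 form.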

stepancheg · 2024-06-26T01:26:30Z

Can you please add a test that would fail without this PR?

stepancheg · 2024-06-26T01:27:54Z

protobuf-support/src/lexer/lexer_impl.rs

+/// The raw bytes for a single char or escape sequence in a string literal
+///
+/// The raw bytes are available via an `into_iter` implementation.
+pub struct DecodedBytes {


This does not seem to be used outside the crate, so it should not be public.

It's the return type of a public method, so it needs to be `pub`. We could change that signature to return `impl Iterator` instead, if that is preferable?
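To compare the two options discussed in this thread, here is a minimal sketch assuming a `DecodedBytes`-like type that stores up to four bytes inline. The field layout, the constructors `from_byte`/`from_char`, and the function `char_bytes` are hypothetical; the PR's actual definition is not shown beyond the doc comment above.

```rust
/// Hypothetical stand-in for the PR's `DecodedBytes`: the 1..=4 raw bytes for
/// one char or escape, stored inline to avoid a heap allocation. It is left
/// private here to show the `impl Iterator` option.
struct DecodedBytes {
    buf: [u8; 4],
    len: u8,
}

impl DecodedBytes {
    /// One raw byte, e.g. from a `\xHH` or octal escape.
    fn from_byte(b: u8) -> Self {
        DecodedBytes { buf: [b, 0, 0, 0], len: 1 }
    }

    /// The full UTF-8 encoding of a char (1..=4 bytes).
    fn from_char(c: char) -> Self {
        let mut buf = [0u8; 4];
        let len = c.encode_utf8(&mut buf).len() as u8;
        DecodedBytes { buf, len }
    }
}

impl IntoIterator for DecodedBytes {
    type Item = u8;
    type IntoIter = std::iter::Take<std::array::IntoIter<u8, 4>>;

    fn into_iter(self) -> Self::IntoIter {
        // Iterate the array by value and yield only the bytes actually written.
        IntoIterator::into_iter(self.buf).take(self.len as usize)
    }
}

/// The alternative floated in review: a public function can return
/// `impl Iterator` while the concrete type stays private to the crate.
pub fn char_bytes(c: char) -> impl Iterator<Item = u8> {
    DecodedBytes::from_char(c).into_iter()
}

fn main() {
    let bytes: Vec<u8> = char_bytes('€').collect();
    assert_eq!(bytes, [0xe2, 0x82, 0xac]);
    let escaped: Vec<u8> = DecodedBytes::from_byte(0xff).into_iter().collect();
    assert_eq!(escaped, [0xff]);
}
```

The trade-off: a named `pub` type lets callers store or name the result, while `impl Iterator` keeps the type out of the public API at the cost of making it unnameable.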


cmyr · 2024-06-26T16:54:23Z

I've added a test case that fails without this patch but passes with it.

stepancheg requested changes Jun 26, 2024


[text-format] Fix parsing of string literals

59d6e61

This renames `next_byte_value` to `next_str_lit_bytes` and may return between 1..=4 bytes per call, representing the variable-length nature of the UTF-8 encoding.

cmyr force-pushed the parse-unicode-strings branch from 0eaddf2 to 59d6e61 on June 26, 2024 16:53
