Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unicode control characters trigger debug_assert_eq #16

Closed
CommanderStorm opened this issue Jun 23, 2024 · 3 comments · Fixed by #15
Closed

unicode control characters trigger debug_assert_eq #16

CommanderStorm opened this issue Jun 23, 2024 · 3 comments · Fixed by #15

Comments

@CommanderStorm
Copy link
Contributor

Hello,

Thanks for writing the truncation code ^^

During testing of this library, I noticed that carefully crafted inputs containing control caracters can trigger an assertion. This only happens in in debug mode => not a runtime panic for release. The used character is End of Medium.

The assertion is located here

debug_assert_eq!(result.width(), new_width);

A minimal (failing) testcase is the following

#[cfg(test)]
mod tests {
  #[test]
  fn zero_width_control_char() {
      std::assert_eq!("\u{0019}".unicode_truncate(2), ("\u{0019}", 0));
  }
}

This happens because at said point (expanded for better readability)

  • let (byte_index, new_width) = (1, 0);
    
  • let result = "\u{0019}".get(..1).unwrap();
    
  • let result = "\u{0019}";
    
  • => thus the result.width() (=1) is larger than the new_width (=0).

This case happens as (self.len(), 0) is appended to the interator.
This makes the iterator (as intended by the doccomment) eager for trailing zero width characters.
=> The asserted invariant does not hold anymore.

A possible fix would be to change the assertion to

- debug_assert_eq!(result.width(), new_width);
+ debug_assert!(result.width() >= new_width);

I can provide a PR with this fix if you'd prefer this

@CommanderStorm
Copy link
Contributor Author

(this was likely introduced in unicode-width=0.1.13 via unicode-rs/unicode-width#45)

@Aetf
Copy link
Owner

Aetf commented Jun 24, 2024

This is a legit logic error caught by debug_assert_eq.

The problem is that we control char to be of width 0 here:

.map(|(byte_index, char)| (byte_index, char.width().unwrap_or(0)))

But unicode-width treats control char as width 1 via unicode-rs/unicode-width#45

Aetf added a commit that referenced this issue Jun 24, 2024
This is consistent with how unicode-width handles string width vs char
width.

See also unicode-rs/unicode-width#45
Aetf added a commit that referenced this issue Jun 24, 2024
This is consistent with how unicode-width handles string width vs char
width.

See also unicode-rs/unicode-width#45
@Aetf Aetf closed this as completed in 66e84b3 Jun 24, 2024
@CommanderStorm
Copy link
Contributor Author

Thanks a ton ♥️

Aetf pushed a commit that referenced this issue Jul 9, 2024
## 🤖 New release
* `unicode-truncate`: 1.0.0 -> 1.1.0

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

##
[1.1.0](v1.0.0...v1.1.0)
- 2024-07-08

### Added
- segment by graphemes
([#11](#11))

### Fixed
- *(deps)* update rust crate itertools to 0.13
([#20](#20))
- fixed typos in the `renovate.json`
([#17](#17))

### Other
- Removed unnessary debug-assertions setting
- Treat control characters as width 1, fixes
[#16](#16)
([#19](#19))
- tweak renovate configs
([#13](#13))
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/MarcoIeni/release-plz/).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants