Get proper width reporting for utf8 Thai script #18650

pepa65 · 2023-02-15T04:12:18Z

Describe the feature

The .len_utf8() function on strings returns the correct number of Unicode code points (runes), but when tone marks, other marks, or certain vowels are displayed above or below a consonant, there is no way in V to find out the displayed width. Zero-width characters are not accounted for.

The .utf8_str_visible_length() function gives the same result as .len_utf8(), and the east_asian.display_width() function (in encoding.utf8.east_asian) also returns the same (incorrect) result.

For example, the word ผู้ is composed of 3 code points (a consonant, a vowel sign below it, and a tone mark above it), but is displayed in 1 character position.
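To illustrate the gap, here is a minimal sketch of a width estimate that skips zero-width marks. The names `is_zero_width` and `display_width` are hypothetical helpers, not part of V's standard library, and the hard-coded ranges cover only the Thai above/below marks and the basic combining diacritics block. A complete implementation would use Unicode property tables and would also have to deal with wide East Asian characters.

```v
// Hypothetical helper, not part of vlib.
// Reports whether a rune normally occupies no column of its own.
fn is_zero_width(r rune) bool {
	c := u32(r)
	// U+0E31 mai han-akat
	if c == 0x0E31 {
		return true
	}
	// U+0E34..U+0E3A vowel signs above/below and phinthu
	if c >= 0x0E34 && c <= 0x0E3A {
		return true
	}
	// U+0E47..U+0E4E maitaikhu, tone marks, thanthakhat and related marks
	if c >= 0x0E47 && c <= 0x0E4E {
		return true
	}
	// U+0300..U+036F general combining diacritics
	return c >= 0x0300 && c <= 0x036F
}

// Hypothetical helper: counts one column per rune that is not zero-width.
fn display_width(s string) int {
	mut width := 0
	for r in s.runes() {
		if !is_zero_width(r) {
			width++
		}
	}
	return width
}

fn main() {
	word := 'ผู้'
	println(word.runes().len) // number of code points
	println(display_width(word)) // estimated number of columns
}
```

For the word above, this should report 3 code points and an estimated width of 1, which matches what a terminal with Thai support displays.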

Use Case

When calculating the displayed length of words or phrases in the Thai language.

Proposed Solution

Account for zero-width characters (combining marks) when reporting the display width.

Other Information

No response

Acknowledgements

I may be able to implement this feature request
This feature might incur a breaking change

Version used

V 0.3.3 c16549b

Environment details (OS name and version, etc.)

Linux Mint 21.1

pepa65 (Author) · 2023-09-20T04:49:28Z

This program is another example of not taking into account zero-width characters:

names := ['1234567890', 'ตาล', 'René', 'ยอห์น', 'ปีเตอร์']
for name in names {
  println('ชื่อ: ${name:10s}')
}

Output:

ชื่อ: 1234567890
ชื่อ:        ตาล
ชื่อ:       René
ชื่อ:      ยอห์น
ชื่อ:    ปีเตอร์

This is now V 0.4.1 fdabd27. I think this should be treated as a bug rather than a discussion; there is nothing really to discuss. Accounting for zero-width characters is not particularly hard, and many languages and small programs already do it.

Note that the é is also 2 runes (e followed by the combining acute accent U+0301, which is 0xcc 0x81 in UTF-8), but that mark is somehow handled correctly.
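Until the `${name:10s}` format specifier accounts for such marks, a possible workaround is to pad manually. The sketch below assumes a hypothetical `pad_left` helper and a simplified zero-width test limited to the Thai above/below marks and the U+0300..U+036F combining block; neither is part of V's standard library.

```v
// Hypothetical helper, not part of vlib: simplified zero-width test.
fn is_zero_width(r rune) bool {
	c := u32(r)
	return c == 0x0E31 || (c >= 0x0E34 && c <= 0x0E3A) || (c >= 0x0E47 && c <= 0x0E4E) || (c >= 0x0300 && c <= 0x036F)
}

// Hypothetical helper: estimated number of columns, one per non-zero-width rune.
fn display_width(s string) int {
	mut width := 0
	for r in s.runes() {
		if !is_zero_width(r) {
			width++
		}
	}
	return width
}

// Hypothetical helper: right-aligns s in a field of `width` columns,
// padding by displayed width instead of rune count.
fn pad_left(s string, width int) string {
	w := display_width(s)
	if w >= width {
		return s
	}
	return ' '.repeat(width - w) + s
}

fn main() {
	names := ['1234567890', 'ตาล', 'René', 'ยอห์น', 'ปีเตอร์']
	for name in names {
		println('ชื่อ: ${pad_left(name, 10)}')
	}
}
```

With this padding, the names should line up on their right edge in a terminal that renders the Thai marks as zero-width, since ยอห์น is padded as 4 columns and ปีเตอร์ as 5 rather than by their rune counts of 5 and 7.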

peppergrayxyz · 2024-08-26T18:39:41Z

V doesn't handle grapheme clusters. I created a feature request for this: #22117

medvednikov (Maintainer) · 2024-08-26T19:45:46Z

Thanks for reporting.
