term.ljust, center etc. incorrect for sequences containing U+FE0F (Variation Selector-16) #267

Open
dscrofts opened this issue Mar 20, 2024 · 4 comments

Comments

@dscrofts

Example:

from blessed import Terminal

term = Terminal()
strings = ["123", "456", "🗣️  "]

print("with term.ljust:")
for string in strings:
    print(f"{term.ljust(string, 5)} 1")

print("without term.ljust:")
for string in strings:
    print(f"{string:<5} 1")

Output (term.ljust adds one additional cell):

with term.ljust:
123   1
456   1
🗣️     1
without term.ljust:
123   1
456   1
🗣️    1

However, this is not consistent across all Unicode sequences. For example, changing strings to ["123", "456", "🤔 "] gives:

Output (term.ljust padding is correct):

with term.ljust:
123   1
456   1
🤔    1
without term.ljust:
123   1
456   1
🤔     1

@jquast changed the title from "term.ljust calculating incorrect padding value with some unicode sequences" to "term.ljust, center etc. incorrect for sequences containing U+FE0F (Variation Selector-16)" on Mar 20, 2024

@jquast
Owner

jquast commented Mar 20, 2024

Hello, thanks for the report.

I was aware of this issue, but there was no bug to track it. I could probably add a simple workaround here in blessed, so I will try to do that soon.

I recently added support for Variation Selector-16 (U+FE0F) to wcwidth, but the way blessed uses the library still gets the calculation wrong: it adds together the wcwidth.wcwidth() result for each individual codepoint, so the effect of U+FE0F on the preceding character is not counted.
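
To illustrate the per-codepoint summing described above, here is a minimal sketch using only the standard library. It is not blessed's or wcwidth's actual code: `codepoint_width`, `naive_width`, `vs16_aware_width` and `ljust_cells` are hypothetical helpers, and `codepoint_width` is only a rough stand-in for `wcwidth.wcwidth()` built on `unicodedata`. The rule that U+FE0F widens any preceding narrow character is also a simplification (real emoji variation sequences are a fixed list).

```python
import unicodedata

VS16 = "\ufe0f"  # Variation Selector-16


def codepoint_width(ch):
    """Approximate cell width of one codepoint (rough stand-in for wcwidth.wcwidth)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters, including U+FE0F
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def naive_width(text):
    """Sum per-codepoint widths, ignoring what U+FE0F does to its neighbour."""
    return sum(codepoint_width(ch) for ch in text)


def vs16_aware_width(text):
    """Like naive_width, but U+FE0F promotes a preceding narrow character to 2 cells."""
    total = 0
    prev_width = 0
    for ch in text:
        if ch == VS16:
            if prev_width == 1:
                total += 1
            prev_width = 0
            continue
        prev_width = codepoint_width(ch)
        total += prev_width
    return total


def ljust_cells(text, width, measure):
    """Left-justify text to `width` cells, as measured by `measure`."""
    return text + " " * max(0, width - measure(text))


speaking = "\U0001F5E3\ufe0f  "  # the speaking-head string from the report
thinking = "\U0001F914 "         # thinking face, wide by default

print(naive_width(speaking), vs16_aware_width(speaking))
print(naive_width(thinking), vs16_aware_width(thinking))
```

Under these assumptions the two measurements disagree only for the string containing U+FE0F, which matches the one extra padding cell in the report.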

I might:

  • add the interpretation of terminal sequences directly to the wcwidth library, which blessed would then offload to (see "Should wcwidth provide rjust, ljust, center and textwrap?", wcwidth#93),
  • or add a "grapheme clustering" functionality to wcwidth that blessed should use,
  • or just make blessed do the "grapheme clustering" necessary to account for these correctly.

Correct accounting for emoji that include U+FE0F is difficult. Only 7 terminals supported it at last check; I wrote more about it here: https://www.jeffquast.com/post/ucs-detect-test-results/. I've also gotten pushback from the author of libvte, which is used in terminals like GNOME Terminal: they refuse to support it at all (https://gitlab.gnome.org/GNOME/vte/-/issues/2580). So I've been a bit distracted just trying to get terminal emulators to support it, rather than having blessed support it, but I will definitely get to it soon.

@jquast
Owner

jquast commented Mar 20, 2024

Also, I could tell that the string included U+FE0F from the following commands:

>>> import unicodedata
>>> list(map(unicodedata.name, '🗣️  '))
['SPEAKING HEAD IN SILHOUETTE', 'VARIATION SELECTOR-16', 'SPACE', 'SPACE']
>>> list(map(hex, map(ord, '🗣️  ')))
['0x1f5e3', '0xfe0f', '0x20', '0x20']

@jquast
Owner

jquast commented Mar 20, 2024

Also, Python's built-in formatting gets this horribly wrong: it is not aware of emoji, terminal sequences, or even basic East Asian characters such as Chinese or Japanese. In your case it just happens to get it right by accident :)

I wrote an issue about what it might take to get Python's built-in formatting to account for emoji correctly: jquast/wcwidth#94
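
As a small illustration of the point about built-in formatting, the sketch below pads strings with an f-string and compares the codepoint count with a rough cell count. `display_cells` is a hypothetical helper that only knows about East Asian Wide/Fullwidth characters via `unicodedata`; it is not how wcwidth measures text.

```python
import unicodedata


def display_cells(text):
    """Rough cell count: 2 for East Asian Wide/Fullwidth codepoints, else 1."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


# Built-in formatting pads to a number of codepoints, not terminal cells.
for word in ["abc", "中文"]:
    padded = f"{word:<5}"
    print(repr(padded), len(padded), display_cells(padded))

# Terminal sequences count towards the length too: this string shows as
# two cells ("hi" in bold) but already has 10 codepoints, so no padding is added.
bold = "\x1b[1mhi\x1b[0m"
print(f"{bold:<5}" == bold)
```

Both padded strings have a length of 5, but the Chinese one occupies 7 cells under this approximation, so columns that follow it would be pushed two cells to the right.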

@jquast
Owner

jquast commented Jun 26, 2024

Just to add, I added some tests around ZWJ (Zero Width Joiner) in #275, which point out that the current calculation gets those sequences wrong as well. I will continue to work towards a solution for this. I think the wcwidth library needs a kind of iterative parser to solve this correctly in a way that can be integrated into blessed.
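
One way to picture such an iterative parser is the sketch below. It is only an assumption about the general shape, not the design planned for wcwidth or blessed: `iter_cluster_widths` and `text_width` are hypothetical names, `codepoint_width` is a rough `unicodedata`-based stand-in for `wcwidth.wcwidth()`, and the grouping rules are far simpler than real grapheme clustering (UAX #29).

```python
import unicodedata

ZWJ = "\u200d"   # Zero Width Joiner
VS16 = "\ufe0f"  # Variation Selector-16


def codepoint_width(ch):
    """Approximate cell width of one codepoint (rough stand-in for wcwidth.wcwidth)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def iter_cluster_widths(text):
    """Scan left to right, yielding (cluster, width) pairs."""
    cluster, width, joined = "", 0, False
    for ch in text:
        if ch == ZWJ and cluster:
            cluster += ch
            joined = True          # the next codepoint joins this cluster
        elif ch == VS16 and cluster:
            cluster += ch
            if width == 1:
                width = 2          # emoji presentation: narrow becomes wide
        elif joined or (cluster and codepoint_width(ch) == 0):
            cluster += ch          # joined or zero-width: adds no cells
            joined = False
        else:
            if cluster:
                yield cluster, width
            cluster, width = ch, codepoint_width(ch)
    if cluster:
        yield cluster, width


def text_width(text):
    """Total cells for text, summing one width per cluster."""
    return sum(width for _, width in iter_cluster_widths(text))


family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"  # man ZWJ woman ZWJ girl
print(sum(codepoint_width(ch) for ch in family), text_width(family))
```

In this sketch the whole ZWJ sequence is measured as a single 2-cell cluster, whereas summing the individual codepoints gives 6.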
