feat: write word to Program dictionary's stdin in UTF-8 instead of local 8 bit #1743

Merged

@shenlebantongying (Collaborator) commented Sep 3, 2024

A one-line change. It has no impact on Unix and only affects current Windows systems in rare situations.

The root problem is that Python, since version 3.6 (released December 2016), assumes stdin on Windows is UTF-8.

A normal user cannot realistically diagnose this issue, and is unlikely to find out what their local code page is or how to deal with it.
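For anyone who does want to check which encodings are in play, a minimal Python sketch follows. It only uses the standard `locale` and `sys` modules; the helper name `describe_encodings` is made up for illustration and is not part of GD or of this change.

```python
import locale
import sys


def describe_encodings():
    """Return the encodings Python reports for the locale and for stdin."""
    return {
        # Encoding derived from the locale (on Windows this is typically
        # the ANSI code page, e.g. cp1252, unless UTF-8 mode is enabled).
        "locale": locale.getpreferredencoding(False),
        # Encoding Python will use to decode text read from stdin;
        # None if stdin is missing or has no encoding attribute.
        "stdin": getattr(sys.stdin, "encoding", None),
    }
```

Calling `print(describe_encodings())` inside a program dictionary script shows what that script would use when it reads the word as text.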

A programming language has to take extra care to expose the necessary Windows APIs, but the major languages simply don't bother, including Python [1], Rust [3], Go, Java 17+ [4], …

At a high level, both Unix's locale-dependent encodings and Windows's code pages are 💩💩💩💩💩 that sane programmers generally avoid. In fact, Windows 11 defaults the code page to UTF-8 [5].

GD's original code assumes that programs on Windows use the Windows code page 💩 to process data, but that is no longer true.
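The mismatch can be reproduced in a few lines of Python. This is a sketch, not GD's code: `roundtrip` is a hypothetical helper, and cp1252 merely stands in for whatever the local code page happens to be.

```python
def roundtrip(word, writer_encoding, reader_encoding):
    """Encode as the writing side would, then decode as the reading side would."""
    raw = word.encode(writer_encoding)
    return raw.decode(reader_encoding)


def demo():
    word = "naïve"
    # Both sides agree on UTF-8: the word survives.
    same = roundtrip(word, "utf-8", "utf-8")
    # Writer uses a legacy code page, reader expects UTF-8: decoding fails,
    # because the cp1252 byte for "ï" is not valid UTF-8 in that position.
    try:
        roundtrip(word, "cp1252", "utf-8")
        failed = False
    except UnicodeDecodeError:
        failed = True
    return same, failed
```

Pure ASCII words are unaffected either way, which is one reason the error only shows up in rare situations.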

Since Python assumes stdin is UTF-8, I don't see why we shouldn't write stdin in UTF-8. This eliminates the rare Unicode error on Windows for Python.

If an encoding error does occur on Windows for a program dictionary, the user can now tell deterministically what the root issue is and how to go about fixing it.
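On the script side, one possible fix is to decode stdin as UTF-8 explicitly rather than rely on whatever default the interpreter picks. The sketch below assumes the word arrives as UTF-8 bytes, possibly followed by a newline; `read_word` is a hypothetical helper name.

```python
import io


def read_word(binary_stdin):
    """Read all bytes from a binary stream and decode them as UTF-8."""
    data = binary_stdin.read()
    # Strip surrounding whitespace in case the caller appends a newline
    # (an assumption; adjust if the exact bytes matter).
    return data.decode("utf-8").strip()


# In a real script the stream would be sys.stdin.buffer:
#     import sys
#     word = read_word(sys.stdin.buffer)
```

Alternatively, the `PYTHONIOENCODING` environment variable (footnote 2) can be set to `utf-8` to override the encoding Python uses for stdin and stdout without changing the script.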

Footnotes

  1. https://peps.python.org/pep-0528/

  2. https://docs.python.org/3/using/cmdline.html#envvar-PYTHONIOENCODING

  3. https://doc.rust-lang.org/std/io/fn.stdin.html#note-windows-portability-considerations

  4. https://docs.oracle.com/en/java/javase/21/intl/supported-encodings.html#GUID-A17E6FED-5880-4836-8E62-18007BD58E85

  5. https://stackoverflow.com/questions/70201846/windows-11-default-api-and-utf-encoding

sonarcloud bot commented Sep 3, 2024

@xiaoyifang (Owner) commented

Since Python assumes stdin is UTF-8, I don't see why we shouldn't write stdin in UTF-8. This eliminates the rare Unicode error on Windows for Python.

+1

@xiaoyifang xiaoyifang merged commit c599bbf into xiaoyifang:staged Sep 4, 2024
8 checks passed
@shenlebantongying shenlebantongying deleted the feat/program-stdin-utf8 branch September 4, 2024 10:40