Win95 println!("Hello, 世界!") panics when using Chinese locale #13
Comments
BTW, SDK:
|
Thanks for testing these!
Yeah, on WinXP no unicows fallbacks are loaded and the calls go straight through to WriteConsoleW, just like on modern Windows, so it seems more like an issue of the old cmd.exe not fully running in Unicode mode or something similar. EDIT: Ah, might actually be fixable!
In fact, if you enable the (recently fully deprecated) legacy console in modern Windows versions:
I'm more surprised that it actually manages to output those characters, at least if your locale/windows language doesn't include them! Interesting, I'll try it on my Win98 system and see if I can figure out where the panic comes from. |
Just tested fn main() {
println!("Hello, 世界!");
} on my Win98 machine, it doesn't crash, but also just writes "Hello, ??!" to the console as expected (since the codepage doesn't have those characters). What version and language of Win95 did you use? Regarding the panic - it seems like EDIT: checking in rust playground, |
That's Win95 Chinese Edition, and system language is Chinese. I have two ideas:
Considering that 9x and NT have different levels of unicode support, it may be necessary to treat them separately. Without knowing the underlying details, I've done tests before that show that: |
Off topic: |
that's exactly what the rust stdlib does :) it converts from utf8 to utf16 and calls
...
that's exactly what
Yes, definitely. Lots of programs and games were made specifically for one region. There are lots of games that only work correctly with a Japanese locale, for example. Codepages are byte-based, so they had to hack in support for multibyte characters (since obviously there are more than 256 Chinese characters): MBCS seems to work like a primitive, language/region-specific version of UTF-8. The lower half of the byte range (0x00-0x7F) stays ASCII, and a byte in the upper half can be an "MBCS lead byte", meaning that the next byte is part of the same character. The problem with Rust checks the number of chars written to know how much was actually written, but since it only consisted of 11 utf-16 So yeah, in the end,
this is needed. I think doing the conversion on the stdlib side makes sense, so we know how many bytes we expect to write out. Thankfully console I/O is probably the only area where this is needed. |
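To make the count mismatch described above concrete, here is a minimal sketch. It does not call any Windows API. `assumed_dbcs_len` is a hypothetical helper that simply assumes every non-ASCII character takes two bytes in the console codepage, which holds for the CJK characters in this example but not for characters the codepage cannot represent.

```rust
/// Number of UTF-16 code units that would be handed to the console API.
fn utf16_units(s: &str) -> usize {
    s.encode_utf16().count()
}

/// ASSUMPTION (for illustration only): in a double-byte codepage, ASCII
/// characters take one byte and every other character takes two bytes.
/// A real conversion would go through the codepage tables instead.
fn assumed_dbcs_len(s: &str) -> usize {
    s.chars().map(|c| if c.is_ascii() { 1 } else { 2 }).sum()
}
```

For `"Hello, 世界!\n"` (the text `println!` emits), `utf16_units` gives 11 and `assumed_dbcs_len` gives 13, so a caller comparing the two numbers sees a mismatch even when the whole string was written.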
The UTF8 console implementation has been broken and not recommended until very recently (some Windows 10 release I think?). Either way, it won't help with the font rendering issue on Windows XP's cmd.exe, so there is no reason to change it from Rust mainline. |
For 9x/ME: So there's a right way, and a hacky way: The right way (roughly what unicows does):
However, if the number of bytes written doesn't match, you'd have to scan through the string to figure out how many MBCS characters (not bytes) have been written, in order to report the correct usize for the length of written UTF-8 chars. The hacky way:
This will actually likely work, as the buffer hopefully isn't smaller than 8K on any Windows version, and thus should always be able to write the entire buffer. I think I'll go with this one and create an improvement issue if someone wants to implement the proper way. |
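The two approaches can be sketched as pure functions. This is only an illustration, not the actual rust9x patch: all function names are hypothetical, the lead-byte range is an assumption modelled on code page 936 (on Windows one would probably ask `IsDBCSLeadByteEx` instead), and the "proper" variant assumes each Rust `char` maps to exactly one MBCS character, which breaks as soon as unmappable characters are dropped.

```rust
/// ASSUMPTION: lead bytes are 0x81..=0xFE, as in code page 936.
fn assumed_lead_byte(b: u8) -> bool {
    (0x81..=0xFE).contains(&b)
}

/// Counts the complete characters in an MBCS byte buffer.
/// A lead byte without its trail byte at the end is not counted.
fn count_mbcs_chars(bytes: &[u8], is_lead: fn(u8) -> bool) -> usize {
    let mut i = 0;
    let mut chars = 0;
    while i < bytes.len() {
        let width = if is_lead(bytes[i]) { 2 } else { 1 };
        if i + width > bytes.len() {
            break; // truncated double-byte character
        }
        i += width;
        chars += 1;
    }
    chars
}

/// UTF-8 byte length of the first `n` chars of `s` (all of `s` if it is shorter).
fn utf8_len_of_first_chars(s: &str, n: usize) -> usize {
    s.char_indices().nth(n).map_or(s.len(), |(idx, _)| idx)
}

/// "Right way": map the MBCS bytes that were written back to a UTF-8 length.
fn proper_written_len(s: &str, written: &[u8], is_lead: fn(u8) -> bool) -> usize {
    utf8_len_of_first_chars(s, count_mbcs_chars(written, is_lead))
}

/// "Hacky way": assume the console accepted the whole buffer.
fn hacky_written_len(s: &str) -> usize {
    s.len()
}
```

The hacky variant is only correct while the console really does take the entire buffer in one call, which is the bet made in the comment above.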
Oh, it always happens when the number of characters in utf16 doesn't match the number of characters in the output. This can easily happen with emojis as well. |
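A small sketch of why emojis trigger the same mismatch: characters outside the Basic Multilingual Plane take two UTF-16 code units (a surrogate pair) but are a single character. The helper names are made up for this example, and what the non-Unicode console actually emits for such a character is not modelled here.

```rust
/// UTF-16 code units, i.e. the length passed to a wide-character API.
fn utf16_code_units(s: &str) -> usize {
    s.encode_utf16().count()
}

/// Unicode scalar values, i.e. what a reader would call "characters".
fn scalar_values(s: &str) -> usize {
    s.chars().count()
}
```

For `"😊"` the first function returns 2 and the second returns 1, whereas for `"世界"` both return 2.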
On my pc, it takes about an hour to compile rust9x(stage2) manually. Right now I'm not home to do the test. |
@seritools Thank you. You are very warm and friendly. 😊 |
- Allow dropping unknown characters. unicows just doesn't understand emojis :( - Ignore mismatched lengths when writing to console on non-Unicode Windows. (workaround for #13)
On Win95, as long as unicows.dll is included, the program outputs the Unicode characters, but it then ends in a panic.
On WinXP, the same program will not panic.
But XP has other unicode problems.
If the "non-unicode profile" is English, then each Unicode character becomes "??"; the console does not automatically fall back to a font that contains it. I think this may be a problem with the WinXP cmd itself.
bad:
good:
good: