Add optional UTF-8 Display/File character support. #73
Comments
I sometimes work with Chinese and Japanese text, and if this could be made to work, I'd be extremely happy. Right now not even Swedish characters like åäö work for me in rxvt-unicode or XTerm when I try to enter them. When I import a CSV containing them (in UTF-8) they predictably get stripped out.
Yeah, not great right now, I can't even use £ lol. I looked at the code a bit today, and I think I can make a few improvements easily; some might be harder though! Internally, Lotus uses LMBCS, which is actually pretty impressive foresight considering Unicode hadn't been invented and everyone else was using codepages. This is good, because internally it can tell the difference between åäa. You can see it knows about å, and calls it a ring: It stores these characters correctly but doesn't know how to display them, so right now it uses a "fallback" ASCII character translation table (å => a and £ => L, and so on). That actually seems pretty easy to solve: I'll just add an LMBCS => UTF-8 table and pass its output to waddch() instead. I'll give it a shot this weekend.
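As a rough illustration of the difference between the fallback table and an LMBCS => UTF-8 table, here is a minimal Python sketch. The character codes are hypothetical (they are assumed to match code page 850 values; the real LMBCS codes would need checking against the 1-2-3 tables), and `render`, `ASCII_FALLBACK` and `UTF8_TABLE` are made-up names, not anything from the actual code.

```python
# Hypothetical character codes, assumed here to match code page 850.
# The real LMBCS values would have to come from the 1-2-3 tables.
ASCII_FALLBACK = {0x84: "a", 0x86: "a", 0x94: "o", 0x9C: "L"}
UTF8_TABLE = {0x84: "ä", 0x86: "å", 0x94: "ö", 0x9C: "£"}


def render(codes, table):
    """Turn a sequence of internal character codes into UTF-8 bytes."""
    out = []
    for code in codes:
        if code < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(chr(code))
        else:
            # Unknown codes become '?' instead of being dropped.
            out.append(table.get(code, "?"))
    return "".join(out).encode("utf-8")


cell = [0x9C, ord("5")]
fallback_bytes = render(cell, ASCII_FALLBACK)  # transliterated to ASCII
utf8_bytes = render(cell, UTF8_TABLE)          # keeps the pound sign
```

The only change between the two calls is which table is passed in, which is why swapping the display table looks like the easy part.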
I think display and keyboard input might be easy, but the question is what to do with
An environment variable would of course be great for legacy files. I would default to UTF-8, since that is standard in Linux today. It's a lot of work to set a normal distro to use anything else. But there are a lot of legacy files out there, and many systems which still spit out very strange formats. Don't ask me how I know.
Okay, I think I've got a plan. I have an easy temporary improvement, and a plan for a harder complete solution. I can change the keymap code to translate UTF-8 on input to all the supported LMBCS characters. There are no collisions (I checked) so this will be super easy, I can do this in a day or two. This is easy but not a complete solution -- there's no CJK for a start... but it is better than nothing - most of the Latin extended characters are covered (so I'll get £, you'll get all the Swedish characters, things like éßçñ are all there). There is no €, but it has ¤; it seems pretty safe to just steal that for € for now? I don't know. The complete solution will be adding LMBCS<->UTF-8 charset support, but this is a much bigger job.
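The input-side plan could be sketched roughly like this in Python. Everything here is illustrative: the codes are hypothetical (assumed to match code page 850), and `build_input_map` and `translate_key` are made-up names rather than functions from the keymap code.

```python
# Hypothetical display table: internal code -> character.
# Codes are assumed to match code page 850; the real values may differ.
DISPLAY = {0x84: "ä", 0x86: "å", 0x94: "ö", 0x9C: "£", 0xCF: "¤"}


def build_input_map(display):
    """Invert the display table, refusing to continue on a collision."""
    input_map = {}
    for code, char in display.items():
        if char in input_map:
            raise ValueError(f"collision on {char!r}")
        input_map[char] = code
    return input_map


INPUT = build_input_map(DISPLAY)
# Stopgap from the comment above: no euro sign is available,
# so borrow the generic currency sign for it.
INPUT["€"] = INPUT["¤"]


def translate_key(utf8_bytes):
    """Map one UTF-8 encoded keypress to an internal code, or None."""
    char = utf8_bytes.decode("utf-8")
    if len(char) == 1 and ord(char) < 0x80:
        return ord(char)
    # Anything without an entry (CJK, for example) is left unsupported.
    return INPUT.get(char)
```

Returning `None` for unmapped input keeps the gap visible, so the caller can decide whether to beep, drop the key, or substitute something.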
This is the first step in improving i18n support. If any UTF-8 sequences have LMBCS encodings, translate them on input. These characters are stored as LMBCS internally, and you can differentiate them with @code, but they are not displayed correctly (they are transliterated to ASCII, see the 1-2-3R3.1 manual, Appendix 2). The next part of this change will be displaying them as UTF-8.
@taviso If any help can be given, I'm willing; especially regarding Greek. It may be a waste of time to get me programmatically involved, but it'll be easier regarding conversion tables I suppose.
Thank you! I'm slowly working on this, it will work eventually! 😆
Lotus 1-2-3 predates UTF-8, and uses LMBCS internally, which is sort of a precursor to Unicode.
I see no reason we couldn't add a UTF-8 option for file/display charset, for better i18n support. 1-2-3 already supports character set translation; we just have to teach it how and figure out the CBD (character bundle) format. I already know the BDLREC format from my lotusdrv project - it's basically a TLV (tag, length, value) encoding system.
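For anyone unfamiliar with TLV encodings, a generic reader looks something like the Python sketch below. This is not the BDLREC or CBD layout: the field widths (2-byte little-endian tag and length) are assumptions picked for illustration, and `parse_tlv` is a hypothetical name.

```python
import struct


def parse_tlv(data):
    """Split a byte string into (tag, value) records.

    Assumes each record is a 2-byte little-endian tag, a 2-byte
    little-endian length, then `length` bytes of value. The real
    BDLREC field sizes may well be different.
    """
    records = []
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated record header")
        tag, length = struct.unpack_from("<HH", data, offset)
        offset += 4
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated record value")
        records.append((tag, value))
        offset += length
    return records


# Two records: tag 1 with a 3-byte value, tag 2 with an empty value.
sample = struct.pack("<HH", 1, 3) + b"abc" + struct.pack("<HH", 2, 0)
records = parse_tlv(sample)
```

The appeal of TLV for this kind of reverse engineering is that unknown tags can be skipped using the length field, so a reader can handle the records it understands and ignore the rest.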