This repository has been archived by the owner on Aug 12, 2018. It is now read-only.

Bring back the ability to edit files using the system default code page #193

Open

jberezanski

Many non-Unicode text files on non-English systems are encoded in the
system-default code page. Users expect to be able to edit such files.

However, in version 3.6.7, the Scintilla developers broke this scenario
by equating the default (unspecified) code page with code page 1252
(Western European). As a result, Scintilla mishandles international
characters typed by the user: they appear either as unaccented Latin
letters or as question marks. The only way to avoid this behavior in
Notepad2-mod is to set the file encoding manually.
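
To illustrate the question-mark case, here is a minimal sketch in C. It is
not Scintilla's code: `encode_char` and the two sample tables are
hypothetical and list only a few characters. It assumes a conversion that
substitutes `?` for anything the target code page cannot represent, and it
does not model the best-fit substitution that produces unaccented letters.

```c
#include <stddef.h>

/* Hypothetical, deliberately tiny mapping tables. Real code pages have
   256 entries; only a few characters are listed for illustration. */
struct mapping {
    unsigned codepoint;   /* Unicode code point */
    unsigned char byte;   /* byte value in the code page */
};

static const struct mapping cp1250_sample[] = {
    { 0x0105, 0xB9 },  /* a with ogonek (Polish) */
    { 0x0142, 0xB3 },  /* l with stroke (Polish) */
    { 0x00E9, 0xE9 },  /* e with acute */
};

static const struct mapping cp1252_sample[] = {
    { 0x00E9, 0xE9 },  /* e with acute; the two Polish letters are absent */
};

/* Encode one Unicode code point into the given code page.
   Returns '?' when the character is not in the sample table. */
unsigned char encode_char(unsigned codepage, unsigned codepoint)
{
    const struct mapping *table;
    size_t count, i;

    if (codepoint < 0x80)
        return (unsigned char)codepoint;  /* ASCII is shared by both */

    if (codepage == 1250) {
        table = cp1250_sample;
        count = sizeof cp1250_sample / sizeof cp1250_sample[0];
    } else if (codepage == 1252) {
        table = cp1252_sample;
        count = sizeof cp1252_sample / sizeof cp1252_sample[0];
    } else {
        return '?';  /* code page not covered by this sketch */
    }

    for (i = 0; i < count; i++) {
        if (table[i].codepoint == codepoint)
            return table[i].byte;
    }
    return '?';
}
```

With these tables, a Polish letter such as U+0105 encodes to a real byte
under code page 1250 but falls back to `?` under code page 1252, which is
what a user on a Polish system sees when the hardcoded 1252 is used.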

Internally, Notepad2-mod attempts to do the right thing. The encoding
described in the UI as "ANSI" is internally mapped to CPI_DEFAULT, and
Notepad2-mod treats it as using the system default code page, as
evidenced by the code that appends the output of the GetACP() Win32
function to the description of this encoding
(Edit.c, function Encoding_InitDefaults()).
So, for example, on a Polish system the ANSI encoding option (in the
encoding selection dialogs) is shown as "ANSI (1250)". Due to the change
in Scintilla, however, this is no longer accurate: Scintilla will not
use code page 1250 (the default code page on that system), but the
hardcoded 1252.
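
The mismatch can be sketched as follows. This is a simplified model of the
behavior described above, not code from Scintilla or Notepad2-mod. All
function names and the `CODEPAGE_UNSPECIFIED` constant are hypothetical,
and the system code page is passed in as a parameter where real code would
call GetACP().

```c
#include <stdio.h>

/* Assumption for this sketch: 0 stands for "no code page specified". */
#define CODEPAGE_UNSPECIFIED 0u

/* Model of the earlier behavior: an unspecified code page resolves to
   the system default code page. */
unsigned effective_codepage_before(unsigned requested, unsigned system_acp)
{
    return requested == CODEPAGE_UNSPECIFIED ? system_acp : requested;
}

/* Model of the behavior since Scintilla 3.6.7 as described above:
   an unspecified code page is treated as 1252. */
unsigned effective_codepage_after(unsigned requested)
{
    return requested == CODEPAGE_UNSPECIFIED ? 1252u : requested;
}

/* Build the UI label for the ANSI option from the system code page.
   Returns the snprintf result (length of the full label). */
int format_ansi_label(char *buf, size_t size, unsigned system_acp)
{
    return snprintf(buf, size, "ANSI (%u)", system_acp);
}
```

On a system whose default code page is 1250, the label reads "ANSI (1250)"
while the second function yields 1252, so the UI and the editing component
disagree about the encoding in use.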

In the Scintilla change history, the change in 3.6.7 is described as
"[preventing] unexpected behavior and crashes on East Asian systems".
It is the opinion of this developer that using the system default code
page by default is, in fact, the *expected* behavior from the user's
point of view (and Notepad2-mod is perfectly capable of handling
multi-byte encodings correctly), so the reasoning for the change is
invalid and the change should be reverted, which this commit does.

(For comparison, the other popular Scintilla-based editor, Notepad++,
currently uses an older Scintilla version (3.5.6), so it has not
encountered this issue yet.)

Fixes #173.

RaiKoHoff added a commit to RaiKoHoff/Notepad3 that referenced this pull request Aug 5, 2017
- consistent encoding <> code-page handling (including Scintilla's code-page settings)
- Scintilla issue regarding notepad2-mod issue rizonesoft#173 (see XhmikosR/notepad2-mod#193)
- allow arbitrary conversion between encodings (even if it does not make sense in every case)
  (instead of silently doing nothing except changing the encoding info on the status bar)