Describe the bug
I'm not really sure what's going on here, if this is expected for a reason I don't understand or possibly indicative of a larger problem, but:
ß doesn't appear to have titlecase or uppercase mappings, although it should (IIUC) map to Ss and SS respectively. It does casefold to ss as expected.
(NB: It doesn't appear to matter what locale I provide; I tried localize(:de) and localize(:zh) with identical results. And similar mappings aren't always missing, e.g. "dž" is handled fine.)
Expected behavior
I expected "ß".localize.upcase.to_s to produce SS (as Ruby's built-in upcase method does) and "ß".localize.titlecase.to_s to produce Ss.
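For comparison, here is a minimal sketch of what plain Ruby does with this character, using only core `String` methods and no TwitterCLDR. It assumes Ruby 2.4 or later, where the built-in case methods apply full Unicode case mapping.

```ruby
# Plain Ruby (2.4+), no gems required.
sharp_s = "ß"

upper  = sharp_s.upcase          # full uppercase mapping
folded = sharp_s.downcase(:fold) # Unicode case folding

puts upper   # prints "SS"
puts folded  # prints "ss"
```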
Hey @djudd, thanks for bringing this to my attention. I'm also a bit surprised by the current behavior.
After an investigation, I've uncovered several interesting things of note:
There is no simple uppercase or titlecase mapping for ß in the Unicode Character Database, which is what TwitterCLDR uses to apply case mappings. Apparently the simple mappings in the UCD are only 1:1 in order to maintain backwards compatibility with the large number of parsers that were written to expect only single-character replacements.
There is another file in the UCD called SpecialCasing.txt that, until now, I did not know existed. This is where the uppercase mapping from "ß" to "SS" and the titlecase mapping from "ß" to "Ss" come from (as well as a number of other mappings that are either locale-specific or that require additional context to function).
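To make the file format concrete, here is a rough sketch of reading unconditional entries from SpecialCasing.txt. It is not TwitterCLDR's implementation: the helper names (`hex_to_str`, `parse_special_casing`, `upcase_with_specials`) are made up for illustration, the three data lines are a hand-copied excerpt rather than the full file, and entries that carry a condition (such as `Final_Sigma`) are simply skipped.

```ruby
# Excerpt in the SpecialCasing.txt layout:
#   code; lower; title; upper; (condition_list;) # comment
SAMPLE = <<~DATA
  # code; lower; title; upper; (condition_list;) # comment
  00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S
  FB00; FB00; 0046 0066; 0046 0046; # LATIN SMALL LIGATURE FF
  03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA
DATA

# Hypothetical helper: "0053 0073" -> "Ss"
def hex_to_str(field)
  field.split.map(&:hex).pack("U*")
end

# Hypothetical helper: builds { "ß" => { lower:, title:, upper: } }
# from unconditional entries only.
def parse_special_casing(text)
  text.each_line.with_object({}) do |line, table|
    data = line.sub(/#.*/, "").strip
    next if data.empty?

    fields = data.split(";").map(&:strip)
    next if fields.length > 4 # has a condition list; needs context or locale

    code, lower, title, upper = fields
    table[hex_to_str(code)] = {
      lower: hex_to_str(lower),
      title: hex_to_str(title),
      upper: hex_to_str(upper)
    }
  end
end

# Hypothetical helper: consult the special table first, then fall back to
# whatever 1:1 mapper the caller passes in as a block.
def upcase_with_specials(str, table)
  str.each_char.map { |ch| table.key?(ch) ? table[ch][:upper] : yield(ch) }.join
end

TABLE = parse_special_casing(SAMPLE)
puts TABLE["ß"][:upper] # prints "SS"
puts TABLE["ß"][:title] # prints "Ss"
puts upcase_with_specials("straße", TABLE) { |ch| ch.upcase } # prints "STRASSE"
```

Checking the multi-character table before the per-character fallback keeps the existing 1:1 data untouched and layers the special cases on top of it.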
It used to be that a capital Eszett didn't exist. German had 30 lowercase letters and 29 uppercase ones. However, a capital Eszett (ẞ, U+1E9E) was officially added to German orthography in 2017 after a century of debate.
So what's the right thing to do here? The latest version of Unicode only provides a mapping from capital Eszett -> lowercase Eszett, and even SpecialCasing.txt only maps to "SS" and "Ss" with no mention of the capital Eszett. The Unicode standard section 3.13 says only that:
Examples of case tailorings which are not covered by data in SpecialCasing.txt include ... Uppercasing of U+00DF “ß” latin small letter sharp s to U+1E9E latin capital letter sharp s.
So... thanks Unicode? I haven't been able to find any other casing data in Unicode or CLDR. It appears the "correct" thing to do is to map to "SS" and "Ss," even though I have to think lower to uppercase Eszett is probably more correct. Maybe TwitterCLDR could do that specifically for German from Germany, since Swiss German and other dialects don't use the Eszett at all.
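Purely as an illustration of that locale-specific idea, and not something TwitterCLDR does today, a tailoring could look roughly like the sketch below. The method name `tailored_upcase`, the `CAPITAL_ESZETT_LOCALES` list, and the choice of locale tags are all assumptions; the fallback is Ruby's built-in `upcase`.

```ruby
# Hypothetical tailoring: map ß to capital ẞ (U+1E9E) only for selected
# locales; everything else gets Ruby's default full case mapping.
CAPITAL_ESZETT_LOCALES = %w[de-DE].freeze # assumption: Germany only

def tailored_upcase(str, locale)
  if CAPITAL_ESZETT_LOCALES.include?(locale.to_s)
    # Swap in the capital letter first; upcase then leaves U+1E9E as-is.
    str.gsub("ß", "\u1E9E").upcase
  else
    str.upcase
  end
end

puts tailored_upcase("straße", "de-DE") # prints "STRAẞE"
puts tailored_upcase("straße", "de-CH") # prints "STRASSE"
```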
In any case, I'll implement the rules in SpecialCasing.txt.