Wrong LaTeX-Unicode mapping of \varepsilon #14751

hmarthinsen · 2016-01-21T16:13:56Z

\varepsilon is currently mapped to ɛ (U+025B latin small letter open e). This is wrong. The correct mapping is ε (U+03B5 greek small letter epsilon).
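The two characters render almost identically but are distinct code points, which is easy to confirm programmatically. A minimal sketch using Python's standard `unicodedata` module (Python is used here only for illustration; the helper name `describe` is made up for this example):

```python
import unicodedata

def describe(ch):
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

print(describe("\u025b"))  # the character \varepsilon currently maps to
print(describe("\u03b5"))  # the character proposed in this issue
```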

ivarne · 2016-01-22T12:32:24Z

As you can see in /base/latex_symbols.jl this mapping is autogenerated from https://www.w3.org/Math/characters/unicode.xml, and as far as I can tell the script faithfully copies the mapping from the source. There might be a bug in the w3 mappings though.

cc: @stevengj

hmarthinsen · 2016-01-23T15:43:03Z

I think this may be because \varepsilon is assigned to several Unicode characters in https://www.w3.org/Math/characters/unicode.xml, but the wrong Unicode character is selected to represent it in https://github.com/JuliaLang/julia/blob/master/base/latex_symbols.jl. Does this trigger println("# duplicated symbol $L ($id)") on line 29? Maybe an exception should be added as has been done for \perp and \bot on line 32.
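The suggested fix (keep the generated table, but let a small list of explicit exceptions win over it) might look roughly like the sketch below. This is not the actual generator script: `build_table`, the `entries` list, and the `overrides` dict are all hypothetical, and the sample entries are illustrative rather than copied from unicode.xml.

```python
def build_table(entries, overrides):
    """Build a LaTeX-name -> character table.

    The first entry seen for a name is kept, later ones are recorded as
    duplicates, and explicit overrides are applied last so they always win.
    """
    table = {}
    duplicates = []
    for latex, char in entries:
        if latex in table:
            duplicates.append((latex, char))  # analogous to "# duplicated symbol"
            continue
        table[latex] = char
    table.update(overrides)
    return table, duplicates

# Illustrative input only: one LaTeX name assigned to two characters.
entries = [("\\varepsilon", "\u025b"), ("\\varepsilon", "\u03b5")]
table, duplicates = build_table(entries, overrides={"\\varepsilon": "\u03b5"})
print(table["\\varepsilon"] == "\u03b5", duplicates)
```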

This bug affects JunoLab/atom-latex-completions#3

stevengj · 2016-01-26T16:35:03Z

Seems reasonable to add an exception here and update the table. For \epsilon, we are using U+03F5, and that has compatibility decomposition U+03B5, reinforcing that the latter is the natural choice for \varepsilon.
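The compatibility decomposition of U+03F5 can be read directly from the Unicode character database. A short check with Python's standard `unicodedata` module (illustration only; the variable names are arbitrary):

```python
import unicodedata

lunate = "\u03f5"  # GREEK LUNATE EPSILON SYMBOL, currently used for \epsilon
open_e = "\u025b"  # LATIN SMALL LETTER OPEN E, currently used for \varepsilon

# The decomposition field is a string such as "<compat> XXXX", or "" if none.
lunate_decomposition = unicodedata.decomposition(lunate)
open_e_decomposition = unicodedata.decomposition(open_e)

print(repr(lunate_decomposition))
print(repr(open_e_decomposition))
```

The lunate epsilon decomposes to the Greek small letter epsilon, while the Latin open e has no decomposition at all, so no standard normalization form relates it to the Greek letter.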

Also affects ipython/ipython#6380, as well as other Julia editor plug-ins.

jiahao · 2016-01-26T19:02:14Z

Our choice is not so much wrong as a reflection of historical inconsistencies over the proper mapping of epsilon.
The W3C's XML Entity Definitions even has a special entry documenting the mess that is code point mappings for epsilon.

In this case it looks like we just happen to pick MathML's mappings, which are at variance with other standard definitions for STIX and XML/MathML2.

Digging further:

10/2003 - "SGML Public entity sets for mathematics and sciences" (pdf), p.80: \varepsilon is explicitly mapped to U+025B. Furthermore they note on p. 9 a discrepancy between MathML and Stix consortium's glyph tables:

Entity: [epsi][ISOGRK3]
MathML [U003B5][GREEK SMALL LETTER EPSILON]
Stix [U003F5][GREEK LUNATE EPSILON SYMBOL]
epsilon variants. MathML wrong?
Entity: [epsiv][ISOGRK3]
MathML [U0025B][LATIN SMALL LETTER OPEN E]
Stix [U003B5][GREEK SMALL LETTER EPSILON]
epsilon variants. MathML wrong?

10/2003 - W3C MathML2 and ISOGRK3 instead recommend U+03B5 for \varepsilon.
11/2010 - Unicode Technical Note 28 also recommends U+03B5 for \varepsilon.

wsshin · 2016-03-26T19:12:39Z

I definitely agree with other people on making an exception here for a few reasons.

If we compile a LaTeX document containing $\varepsilon$ and check the Unicode of the generated character, it is U+03B5. Considering that most Julia users input Greek letters using LaTeX commands, it is reasonable to expect Julia to produce the same Unicode character as LaTeX for \varepsilon, but currently this is not the case. Julia produces U+025B as other people mentioned.

Personally, I am inputting Greek letters in Julia directly using the Greek keyboard layout instead of using the LaTeX command because that is faster. The "e" key in the Greek keyboard also generates U+03B5 rather than U+025B.

This inconsistency could be a source of errors that are hard to track down, as I just experienced. I was extending someone else's code who used \varepsilon to define the variable ɛ. When extending his code, I used the Greek keyboard key "e" to access this variable. Unfortunately, Julia's \varepsilon and the Greek keyboard's "e" generated different Unicode characters, so I was getting an UndefVarError. Because these two different characters looked the same, it took a while to figure out what was going on.
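The same class of failure can be reproduced in any language that allows Unicode identifiers and does not unify these two characters. The sketch below uses Python, where the analogous error is `NameError` rather than Julia's UndefVarError; `lookup_fails` is a hypothetical helper written for this example.

```python
def lookup_fails(defined_as, accessed_as):
    """Define a variable under one spelling, read it under another,
    and report whether the read fails with NameError."""
    namespace = {}
    try:
        exec(f"{defined_as} = 1.0\nvalue = {accessed_as}", namespace)
    except NameError:
        return True
    return False

# Defined with the Latin open e, accessed with the Greek epsilon.
print(lookup_fails("\u025b", "\u03b5"))
# Defined and accessed with the same Greek epsilon.
print(lookup_fails("\u03b5", "\u03b5"))
```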

nalimilan · 2016-03-26T21:00:57Z

This inconsistency could be a source of errors that are hard to catch, as I just experienced. I was extending someone else's code who used \varepsilon as a variable. When extending his code, I used the Greek keyboard to access this variable. Unfortunately, Julia's \varepsilon and the Greek keyboard's "e" generated different Unicode characters, so I was getting an UndefVarError. Because these two different characters looked the same in the Jupyter notebook, it took a while to figure out what was going on.

This problem will still happen even if we change the mapping (though it will be less frequent). This is the same situation as with mu vs. micro. See #5903.
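For reference, the mu vs. micro pair is U+03BC (Greek small letter mu) and U+00B5 (micro sign). A small sketch with Python's standard `unicodedata` module shows which standard normalization forms treat them as the same character (illustration only; `same_after` is a made-up helper):

```python
import unicodedata

MICRO = "\u00b5"  # MICRO SIGN
MU = "\u03bc"     # GREEK SMALL LETTER MU

def same_after(form, a, b):
    """Return True if a and b are identical after the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

for form in ("NFC", "NFKC"):
    print(form, same_after(form, MICRO, MU))
```

Canonical normalization (NFC) keeps the two apart; only compatibility normalization (NFKC) folds the micro sign into mu.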

Godisemo · 2016-11-28T21:36:44Z

+1

StefanKarpinski · 2016-11-28T23:12:36Z

@Godisemo: it's unclear what you're +1 ing here.

Godisemo · 2016-11-28T23:19:26Z

@StefanKarpinski Yeah, I realised that now when you pointed it out. I'm +1 ing the fact that this really is an issue and that I support the proposition to change the \varepsilon expansion from ɛ (https://en.wikipedia.org/wiki/Open-mid_front_unrounded_vowel) to ε (https://en.wikipedia.org/wiki/Epsilon).

Godisemo · 2016-11-28T23:30:05Z

The only thing I see that complicates things is that we have to make the same change in all editor plugins that people use. Personally, I only use vim and atom, so I don't know what other plugins are available for other editors.
https://github.com/JuliaEditorSupport/julia-vim/blob/master/autoload/julia_latex_symbols.vim
https://github.com/JunoLab/atom-latex-completions/blob/master/completions/completions.json

stevengj · 2016-11-29T12:32:25Z

I think the best solution would be to first implement a custom normalization so that ɛ (U+025B latin small letter open e) and ε (U+03B5 greek small letter epsilon) are treated as equivalent in identifiers. Once that is done, we can gradually migrate editor plugins without breaking code. See JuliaStrings/utf8proc#11

(My main concern is that this opens a can of worms, since there are potentially a lot of custom normalizations we might want.)

StefanKarpinski · 2016-11-29T16:37:48Z

I think if we're conservative and take the custom normalizations on a case-by-case basis, it should be ok. The only major danger of each normalization is that someone might be using both letters in a pair that we start to normalize in otherwise indistinguishable ways, breaking code. However, any code that does that is either accidentally broken and would be fixed by the normalization or intentionally obfuscated, which I don't think is a major concern. So the criterion for custom normalization should be at least: would it be crazy to use these two characters in otherwise indistinguishable ways.

Godisemo · 2016-11-29T21:15:10Z

Maybe we could issue a warning if both versions are detected in the same code?
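A rough sketch of what such a check could look like, written in Python for illustration. The pair list and the function name `mixed_confusables` are assumptions made for this example, not an existing Julia or editor feature:

```python
# Hypothetical list of look-alike pairs; only the two pairs discussed
# in this thread are included.
CONFUSABLE_PAIRS = [
    ("\u025b", "\u03b5"),  # Latin open e / Greek epsilon
    ("\u00b5", "\u03bc"),  # micro sign / Greek mu
]

def mixed_confusables(source):
    """Return every pair whose two members both occur in the source text."""
    return [(a, b) for a, b in CONFUSABLE_PAIRS if a in source and b in source]

code = "\u025b = 1e-3\nprintln(\u03b5)\n"
for a, b in mixed_confusables(code):
    print(f"warning: both U+{ord(a):04X} and U+{ord(b):04X} appear in this file")
```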

stevengj · 2016-11-29T21:29:45Z

If we go the normalization route, I think we would just have a list of codepoints that we treat as (permanently) equivalent, with no warning. i.e. different ways of inputting "ε" should all be equally valid.
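As a minimal sketch of the idea (not the implementation in Julia or utf8proc), a fixed table of equivalent code points can be applied on top of standard NFC normalization. The table contents and the name `normalize_identifier` are assumptions for illustration, written in Python:

```python
import unicodedata

# Hypothetical equivalence table: each key is replaced by its value.
EQUIVALENT = {
    0x025B: 0x03B5,  # Latin open e  -> Greek epsilon
    0x00B5: 0x03BC,  # micro sign    -> Greek mu
}

def normalize_identifier(name):
    """Apply NFC, then fold the listed look-alikes onto one code point each."""
    return unicodedata.normalize("NFC", name).translate(EQUIVALENT)

print(normalize_identifier("\u025b") == normalize_identifier("\u03b5"))
```

With a table like this, either way of typing the character resolves to the same identifier, so no warning is needed.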

Godisemo · 2016-11-29T21:33:41Z

I don't think treating them the same as a long term plan is a good idea. What if we start doing this for characters that look the same but are totally different, for example Α (capital alpha) and A? What if they look different in other fonts? I just think we should use the correct characters for the specific LaTeX expansions.

stevengj · 2016-11-29T23:46:58Z

Normalization of confusable characters is pretty well established in Unicode. Python 3 does much more aggressive (NFKC) normalization than us, for example.
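Python's behaviour can be observed directly, since it converts identifiers to NFKC while parsing. In the sketch below (variable names are arbitrary), a variable written with the lunate epsilon U+03F5 is readable through the Greek small epsilon U+03B5, because both normalize to the same name:

```python
import unicodedata

namespace = {}
# Assign using U+03F5 (lunate epsilon), read back using U+03B5 (small epsilon).
exec("\u03f5 = 1\nvalue = \u03b5", namespace)

print(namespace["value"])
print("\u03b5" in namespace, "\u03f5" in namespace)

# NFKC does not touch the Latin open e, so it would not unify U+025B with U+03B5.
open_e_unchanged = unicodedata.normalize("NFKC", "\u025b") == "\u025b"
print(open_e_unchanged)
```

Note that the pair at issue here, U+025B and U+03B5, is not covered by NFKC, which is why a custom equivalence would be needed for it.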

Godisemo · 2016-11-29T23:50:40Z

It gets a bit funny though when you do a search and/or replace in your source file, since no editor I know of treats visually similar characters as equal. Epsilon and varepsilon though are treated equal since they are the same character.

stevengj · 2016-11-30T00:46:06Z

We already do NFC normalization, so probably that bridge has already been crossed. And, as I said, Python 3 already does NFKC normalization and I don't see people complaining.

Godisemo · 2016-11-30T21:16:05Z

Yeah, maybe you are right. It would definitely be convenient to treat the visually ambiguous characters as the same. If normalization is a thing then maybe the editors should change instead.

stevengj · 2017-01-06T13:34:19Z

At some point after the 0.6 release, we should push this change to the various editor plugins

ivarne added the unicode label (Related to unicode characters and encodings) Jan 22, 2016

stevengj mentioned this issue Nov 30, 2016

WIP: custom Unicode normalization for Julia identifiers #19464

Merged

stevengj added a commit to stevengj/julia that referenced this issue Dec 1, 2016

make \varepsilon complete to ε (u+03b5), fixes JuliaLang#14751

7f62ba8

stevengj added a commit to stevengj/julia that referenced this issue Dec 26, 2016

make \varepsilon complete to ε (u+03b5), fixes JuliaLang#14751

ec42578

stevengj added a commit to stevengj/julia that referenced this issue Dec 29, 2016

make \varepsilon complete to ε (u+03b5), fixes JuliaLang#14751

1729cbd

stevengj added a commit to stevengj/julia that referenced this issue Jan 4, 2017

make \varepsilon complete to ε (u+03b5), fixes JuliaLang#14751

8161e59

tkelman closed this as completed in 62c423b Jan 6, 2017

mapio mentioned this issue Oct 15, 2018

Questionable choice for the greek codepoints in IPython.core.latex_symbols ipython/ipython#11399

Closed

cynddl mentioned this issue Jun 17, 2019

BUG: no attribute when the column name contains an "ϵ" pandas-dev/pandas#26885

Closed

knuesel mentioned this issue Jan 8, 2021

Fix LaTeX completions of Greek variants, add \frakI and \frakR #39148

Merged
