Unicode variation sequences #2244

Open
wants to merge 2 commits into
base: master
from

Conversation

xworld21
Contributor

Fix #2235, at least when it comes to empty set and (capital) script characters. Unicode specifies a few other sequences but I haven't found which macros or fonts should have that output (see https://www.unicode.org/Public/15.1.0/ucd/StandardizedVariants.txt).

This change is supposed to be fully backwards compatible: if the MathML font recognises the variation sequence, it will use the more correct character; if not, the standard dictates that it must behave as if the variation selector is not there. I have tried to ensure that --noplane1 and --hackplane1 keep working correctly.
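
A minimal Python sketch of the fallback behaviour described above (not LaTeXML code; `render_fallback` is a hypothetical helper). It models a renderer that does not recognise a variation sequence by ignoring the selector, which leaves the plain base character:

```python
# Variation selectors VS1..VS16 occupy U+FE00..U+FE0F.
VS1 = "\uFE00"
VS_RANGE = range(0xFE00, 0xFE10)

EMPTY_SET = "\u2205"
EMPTY_SET_VS1 = EMPTY_SET + VS1  # the empty set sequence discussed in this PR


def render_fallback(text: str) -> str:
    """Hypothetical model of a font without the sequence: ignore selectors."""
    return "".join(ch for ch in text if ord(ch) not in VS_RANGE)


# With the selector ignored, the sequence degrades to the plain empty set.
print(render_fallback(EMPTY_SET_VS1) == EMPTY_SET)
```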

Note: the output of tools/compilemetrics is not deterministic, so I just inspected the output and applied the (trivial) change to StandardMetrics.pm by hand. I don't know how to test whether the metrics work correctly.

@xworld21
Contributor Author

xworld21 commented Nov 1, 2023

Oh, I should have added a demo:

@xworld21 xworld21 force-pushed the unicode-variation-sequences branch from b51b78a to 5486852 Compare May 18, 2024 12:57
@xworld21
Contributor Author

I have rebased against the new refactor. However, the OMS font metrics do not have \x{2205}\x{FE00}. If I understand correctly, that's because the TeX font maps themselves do not use UVSs. Whether this is a problem depends on how StandardMetrics.pm is used.

@xworld21
Contributor Author

Whether this is a problem depends on how StandardMetrics.pm is used.

Oh, probably not. In my overly rushed tests, split(//,$string) splits the variation selector \x{FE00} off from the base character, so computeStringsSize should keep working as intended.
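
For illustration, the same effect in Python (an analogue, not the Perl used in LaTeXML): splitting a string into code points puts the selector in its own element, so a per-character lookup still sees the bare U+2205. The `widths` table, its value, and `string_width` are hypothetical stand-ins and make no claim about how computeStringsSize actually works:

```python
s = "\u2205\uFE00"

# Per code point, comparable to Perl's split(//, $string) on a decoded string
chars = list(s)

# Hypothetical metrics table with a made-up width, keyed by single characters
widths = {"\u2205": 500}


def string_width(text: str) -> int:
    # Assumption: characters absent from the table (here the selector) add 0
    return sum(widths.get(ch, 0) for ch in text)


print(len(chars), string_width(s))
```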

@brucemiller
Owner

I'm very impressed by how well you navigated through LaTeXML's Unicode maze! It looks spot-on, as far as it goes; the "big picture" still has a few mysteries to be sorted out, however.

  • You raise a good point regarding the metrics. Apparently the variation selector is not considered to be part of the glyph, but for our purposes, perhaps, some of the time it should be? Defer...
  • It is interesting that the empty set case "works", but the script/caligraphic do not, yet. I don't understand the machinery behind the scenes; which parts are handled by the font and browser respectively. Fred's stylesheet suggests that we still need appropriate classes on the tokens to trigger the change, but demonstrates fancier CSS techniques to effect it, than the font-family approach in LaTeXML.css.
  • That latter rule has perhaps dubious family choices anyway: Zapf or URW Chancery sounds right for Caligraphic, but whichever one the browser gave me actually looks more roundhand?!?
  • I'm curious if possibly Fred has the wrong magical code "1" instead of "2" for caligraphic?
  • According to Wiki dictionary, my "caligraphic" is an archaic form of "calligraphic". Hmm, don't I feel old, now?
  • And perhaps the Unicode remapping should also add variation seq to lowercase? The recent font patches should, in the normal cases like \mathcal, avoid giving caligraphic style to lowercase; some other package or font might legitimately provide or expect them (eventually).

@xworld21
Contributor Author

I have some answers!

  • It is interesting that the empty set case "works", but the script/caligraphic do not, yet. I don't understand the machinery behind the scenes; which parts are handled by the font and browser respectively. Fred's stylesheet suggests that we still need appropriate classes on the tokens to trigger the change, but demonstrates fancier CSS techniques to effect it, than the font-family approach in LaTeXML.css.
  • I'm curious if possibly Fred has the wrong magical code "1" instead of "2" for caligraphic?

The crux of the matter is that the calligraphic distinction used to be treated as a stylistic choice, so the few fonts that offer options use different, non-standardised strategies. For instance, STIX uses stylistic sets, which require STIX-specific CSS rules to trigger the glyph switch. Since the rules are font specific, you need to use the latest @font-feature-values to write CSS that works with font stacks (which I didn't think about until now! expect another PR soon to improve that font-family situation).

Variation sequences are instead semantic and meant to work the same way regardless of the font, and are very poorly supported at the moment. However, at least STIX (stipub/stixfonts#218) and MathJax (mathjax/MathJax#3045) claim they will understand them in the future. Cambria Math should already work with them (not tested!).

  • And perhaps the Unicode remapping should also add variation seq to lowercase? The recent font patches should, in the normal cases like \mathcal, avoid giving caligraphic style to lowercase; some other package or font might legitimately provide or expect them (eventually).

The lowercase ones are not specified in Unicode (presumably because they don't appear in the wild), so that seems risky, should a future standard give a different meaning to them (extremely unlikely of course).
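
A sketch of that uppercase-only policy in Python, assuming VS1 (U+FE00) selects the chancery style and VS2 (U+FE01) the roundhand style for script capitals, as I read StandardizedVariants.txt. `script_capital` and `with_style` are hypothetical names, not LaTeXML functions:

```python
# Script capitals that live in the Letterlike Symbols block rather than plane 1
LETTERLIKE = {"B": 0x212C, "E": 0x2130, "F": 0x2131, "H": 0x210B,
              "I": 0x2110, "L": 0x2112, "M": 0x2133, "R": 0x211B}

# Assumed selector meanings for script capitals
SELECTOR = {"chancery": "\uFE00", "roundhand": "\uFE01"}


def script_capital(letter: str) -> str:
    """Map an ASCII capital to its Unicode script capital."""
    if letter in LETTERLIKE:
        return chr(LETTERLIKE[letter])
    return chr(0x1D49C + ord(letter) - ord("A"))


SCRIPT_CAPITALS = {script_capital(chr(c)) for c in range(ord("A"), ord("Z") + 1)}


def with_style(ch: str, style: str) -> str:
    """Append a selector only to script capitals; leave everything else alone."""
    return ch + SELECTOR[style] if ch in SCRIPT_CAPITALS else ch
```

With this, a lowercase script letter such as U+1D4B6 passes through unchanged, so no unspecified sequence is ever emitted.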

@brucemiller
Owner

Thanks @xworld21 ; The fact that empty set works implies that the Browser & Font are capable of dealing with variation sequences, but are not using it for caligraphic/roundhand. Thanks for the pointer: it looks like STIX 2.2 will support it.

I'm a little leery of belt-n-suspenders adding both the variation selectors and classes. You're right that Fred's CSS rules are specific to Stix2, which makes it tempting to add a STIX-Math.css file to LaTeXML, but which then probably needs to bundle a copy of STIX-math or refer to a CDN. I'd hope to avoid that, but if the need for CSS evolves, maybe that makes sense? (eg. bundle STIX 2.1 with CSS rules; eventually upgrade to STIX 2.2 w/o CSS).

You're also right about the lowercase; I thought I'd seen that they were included. I suppose we'd still have to use whatever CSS, if it ever comes to that (doubtful).

@xworld21
Contributor Author

makes it tempting to add a STIX-Math.css file to LaTeXML, but which then probably needs to bundle a copy of STIX-math or refer to a CDN. I'd hope to avoid that, but if the need for CSS evolves, maybe that makes sense?

I have something more flexible in mind. With modern CSS, you can set font features depending on the actual font being used, so it would be enough to add a rule for 'STIX Two Math' and maybe a few other known fonts. If the browser is using STIX, good, otherwise nothing happens. Not entirely bullet proof (e.g. the font could have been imported with a different name), but it sounds somewhat better than the current font-family. I have to run a few experiments before making a proper suggestion, though.

Successfully merging this pull request may close these issues.

add Unicode variation selectors (mainly for calligraphic vs script style)
2 participants