Accents #2404

brucemiller · 2024-08-21T01:17:07Z

This PR is another round in improving the fidelity of lower-level TeX and plain.tex. It deals with accents. Accents in TeX fonts basically show an accented space which gets overlaid on top of a character to be accented (potentially with some positioning tweaks). To deal with Unicode and MathML, we need to distinguish 3 kinds of thing:

- a combining char to follow a letter;
- a standalone char (or string) which looks like the TeX glyph. This will either be a "spacing" analog of the accent or, at worst, a space followed by the combiner;
- an "unwrapped" char, which is just the accent part and would be the content of a Math token used as an over/under script.

So, we've set up a table within LaTeXML::Util::Unicode to store this info, and use it wherever accenting operations are needed.
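
As a rough illustration of the three roles described above, here is a minimal sketch of such a table and how it might be consulted. It is not the code from the PR: the keys `combiner` and `standalone` do appear in the diff, but the key `unwrapped`, the helpers `lookup_accent` and `apply_accent`, and the specific code points chosen are assumptions made for the example.

```perl
use strict;
use warnings;
use List::Util qw(first);
use Unicode::Normalize qw(NFC);

# Illustrative entries only. The keys 'combiner' and 'standalone' appear in the
# PR's diff; the key 'unwrapped' and all code points here are assumptions.
my @accent_entries = (
  { name => 'acute', combiner => "\x{0301}", standalone => "\x{00B4}", unwrapped => "\x{00B4}" },
  { name => 'ring',  combiner => "\x{030A}", standalone => "\x{02DA}", unwrapped => "\x{02DA}" },
  # Assumed to have no spacing analog: fall back to NBSP followed by the combiner.
  { name => 'dotbelow', combiner => "\x{0323}", standalone => "\x{00A0}\x{0323}", unwrapped => "\x{0323}" },
);

# Hypothetical stand-in for the PR's unicode_accent(): find the entry whose
# combiner or standalone form matches $char (undef if there is none).
sub lookup_accent {
  my ($char) = @_;
  return first { $$_{combiner} eq $char || $$_{standalone} eq $char } @accent_entries; }

# Attach the accent to a letter, or return the standalone form when there is
# nothing (or only whitespace) to accent. NFC composes pairs such as e + U+0301.
sub apply_accent {
  my ($letter, $entry) = @_;
  return $$entry{standalone} if !defined $letter || $letter =~ /^\s*$/;
  return NFC($letter . $$entry{combiner}); }

binmode(STDOUT, ':encoding(UTF-8)');
print apply_accent('e', lookup_accent("\x{00B4}")), "\n";    # e with acute
print apply_accent(' ', lookup_accent("\x{0301}")), "\n";    # spacing acute accent
```

The linear search keeps the sketch short; a real table would more likely be indexed by both forms, as the review thread on `Unicode.pm` discusses.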


dginev

I left some perl-oriented comments.

Would need to study accents a bit more before I can say anything substantive on the TeX emulation side of affairs.

dginev · 2024-08-22T18:36:55Z

lib/LaTeXML/Engine/TeX_Character.pool.ltxml

+    if (my $entry = unicode_accent($char)) {
+      applyAccent($stomach, $letter, $$entry{combiner}, $$entry{standalone},
+        Invocation(T_CS('\accent'), $num, $letter)); }
+    else {    # Unknown accent ?  Really should OVERLAY it on top of $letter???


Do we have a small test for using an unknown accent? Might be useful to keep checking as things move forward.

brucemiller · 2024-08-24

OK, I put in simplistic handling for this case, along with CSS and an extra line in the test.

dginev · 2024-08-23T00:42:17Z

lib/LaTeXML/Util/Unicode.pm

@@ -24,6 +23,73 @@ sub UTF {
  my ($code) = @_;
  return pack('U', $code); }

+my $NBSP = UTF(0xA0);
+sub NBSP { return $NBSP; }


Is this attempting to guard the lexical $NBSP from some sort of external redefinition?

Surprised it wasn't introduced as a package-level our $NBSP; declaration. That can be added to @EXPORT too, if desired.

brucemiller · 2024-08-24

Thanks, switched over to the cleaner \N{NBSP} approach (along with use charnames ':full'; for older perls).
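
For readers unfamiliar with the named-character approach mentioned in this reply, a minimal sketch follows. It assumes a Perl recent enough (5.16 or later) to know the `NBSP` abbreviation as a Unicode name alias; the variable names are made up for the example.

```perl
use strict;
use warnings;
use charnames ':full';    # explicit load, needed for \N{...} on older perls

# \N{NBSP} is resolved at compile time, so no package variable or accessor
# sub is needed to share the character between modules.
my $nbsp      = "\N{NBSP}";
my $full_name = "\N{NO-BREAK SPACE}";

printf "U+%04X\n", ord($nbsp);
print(($nbsp eq $full_name) ? "same character\n" : "different characters\n");
```

Because the escape is a literal, there is nothing for another package to redefine, which sidesteps the question raised above about guarding the lexical.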

dginev · 2024-08-23T00:48:47Z

lib/LaTeXML/Util/Unicode.pm

+foreach my $entry (@accent_data) {
+  $accent_data{ $$entry{standalone} } = $entry;
+  $accent_data{ $$entry{combiner} }   = $entry;
+}


Oh, this is rather dangerous. Could you change the name of either @accent_data or %accent_data?

I was thrown for a bit not knowing what happened, seeing the hash indexing syntax in $accent_data{$char}.

Such a setup is bound to generate a subtle bug one day, when someone uses the wrong delimiters in haste:

    my @a = ('x','y','z');
    my %a = ('1'=>'a', '2'=>'b', '3'=>'c');
    print $a[2]; # z
    print $a{2}; # b

I think the hash variant can be kept, and you can even fully expand the loop that adds standalone and combiner in code, to avoid looping the setters every time latexml is run...

Simplest is to run it once, print with Dumper, then copy the result back into the file. And on second read - aren't standalone and combiner already set in the explicit hashes? Maybe the loop is a leftover? Except that the loop sets them as equal, while the hash often has the values as different. Hm, I wish I understood a little more...
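
A minimal sketch of the two suggestions above (distinct names for the list and the hash, and a one-off dump of the expanded index) might look like the following. The names `@accent_entries` and `%accent_index` and the sample data are assumptions, not the names the PR ended up with. As far as the quoted diff shows, the loop does not set standalone and combiner equal; it files the same entry under two different keys so that either form can be used for lookup.

```perl
use strict;
use warnings;
use Data::Dumper;

# Illustrative data; the variable names are assumptions, not the PR's final ones.
my @accent_entries = (
  { combiner => "\x{0301}", standalone => "\x{00B4}" },    # acute
  { combiner => "\x{0300}", standalone => "\x{0060}" },    # grave
);

# Distinctly named hash: each entry is reachable by either of its two forms.
my %accent_index;
foreach my $entry (@accent_entries) {
  $accent_index{ $$entry{standalone} } = $entry;
  $accent_index{ $$entry{combiner} }   = $entry; }

# One-off dump that could be pasted back into a source file.
$Data::Dumper::Sortkeys = 1;    # stable key order
$Data::Dumper::Useqq    = 1;    # escape non-ASCII characters
$Data::Dumper::Deepcopy = 1;    # repeat shared entries instead of back-references
my $dump = Dumper(\%accent_index);
print $dump;
```

Whether pre-expanding is worthwhile is a judgment call: the loop is cheap, and a pasted dump has to be regenerated by hand whenever the table changes.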

brucemiller · 2024-08-24

Renamed a bit to make it less scary.


dginev

Thank you for the updates, the PR looks ready to merge. ✅

I tried another two alternatives for the "unknown accent" markup, visible in this codepen. Not a focus at the moment, but something to mull over in the background. <ruby> and display: inline-grid; are curious tools.

brucemiller added 8 commits August 20, 2024 14:53

Add Unicode data and accessor for accents, along with combining char, standalone char and for use in math

3b3a651

Update OT1 FontMap to use appropriate 'standalone' chars for accents

d65604a

Use more consistent model of unicode combining and standalone (spacing) chars for accents, leveraging data from Util::Unicode module

b3d434c

safer lookup

c40389e

Use new Util::Unicode data to get 'unwrapped' char for over/under operand token

ff8bec4

Update use of DefAccent to be consistent with Util::Unicode's better choices

c3f28b3

Make keywords avoid clumsy font recoding

0a495ec

Updated tests for better tracking of font encoding

8cf43d5

brucemiller requested a review from dginev August 21, 2024 01:17

dginev approved these changes Aug 23, 2024


brucemiller added 3 commits August 24, 2024 17:46

Simplistic handling of \accent (as overlay) when a non-accent is used; add a testcase

a5b8e94

Cleaner naming conventions; use \N{NBSP} instead of a var.

650cbfb

HTML/MathML tests no longer need javascript polyfill

642281a

dginev approved these changes Aug 25, 2024


brucemiller merged commit dfc3bc9 into master Aug 25, 2024
26 checks passed

brucemiller deleted the accents branch August 25, 2024 16:24


brucemiller commented Aug 21, 2024

dginev left a comment

dginev Aug 22, 2024

brucemiller Aug 24, 2024

dginev Aug 23, 2024

brucemiller Aug 24, 2024

dginev Aug 23, 2024

dginev Aug 23, 2024

brucemiller Aug 24, 2024

dginev left a comment
