Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accents #2404

Merged
merged 11 commits into from
Aug 25, 2024
Merged

Accents #2404

merged 11 commits into from
Aug 25, 2024

Conversation

brucemiller
Copy link
Owner

This PR is another round in improving the fidelity of lower-level TeX and plain.tex. It deals with accents. Accents in TeX fonts basically show an accented space which gets overlaid on top of a character to be accented (potentially with some positioning tweaks). To deal with Unicode and MathML, we need to distinguish 3 kinds of thing:

  • a combining char to follow a letter
  • a standalone char (or string) which looks like the TeX glyph. This will either be a "spacing" analog of the accent, or at worst a space followed by the combiner
  • an "unwrapped" char which is just the accent part which would be the content of a Math token used as an over/under script.
    So, we've set up a table within LaTeXML::Util::Unicode to store this info, and use it wherever accenting operations are needed.

Copy link
Collaborator

@dginev dginev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some perl-oriented comments.

Would need to study accents a bit more before I can say anything substantive on the TeX emulation side of affairs.

if (my $entry = unicode_accent($char)) {
applyAccent($stomach, $letter, $$entry{combiner}, $$entry{standalone},
Invocation(T_CS('\accent'), $num, $letter)); }
else { # Unknown accent ? Really should OVERLAY it on top of $letter???
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a small test for using an unknown accent? Might be useful to keep checking as things move forward.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, put in a simplistic handling for this case, along w/CSS and an extra line in test.

@@ -24,6 +23,73 @@ sub UTF {
my ($code) = @_;
return pack('U', $code); }

my $NBSP = UTF(0xA0);
sub NBSP { return $NBSP; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this attempting to guard the lexical $NBSP from some sort of external redefinition?

Surprised it wasn't introduced as a package-level our $NSBP; declaration. That can be added to @EXPORT too, if desired.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, switched over to cleaner \N{NBSP} approach (along with use charnames ':full'; for older perls)

foreach my $entry (@accent_data) {
$accent_data{ $$entry{standalone} } = $entry;
$accent_data{ $$entry{combiner} } = $entry;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is rather dangerous. Could you change the name of either @accent_data or %accent_data?

I was thrown for a bit not knowing what happened, seeing the hash indexing syntax in $accent_data{$char}.

Such a setup is bound to generate a subtle bug one day, when someone uses the wrong delimiters in haste:

my @a = ('x','y','z');
my %a = ('1'=>'a', '2'=>'b', '3'=>'c');
print $a[2]; # z
print $a{2}; # b

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the hash variant can be kept, and you can even fully expand the loop that adds standalone and combiner in code, to avoid looping the setters every time latexml is ran...

Simplest is to run it once, print with Dumper, then copy the result back into the file. And on second read - aren't standalone and combiner already set in the explicit hashes? Maybe the loop is a leftover? Except that the loop sets them as equal, while the hash often has the values as different. Hm, I wish I understood a little more...

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed a bit to make it less scary.

Copy link
Collaborator

@dginev dginev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the updates, the PR looks ready to merge. ✅

I tried another two alternatives for the "unknown accent" markup, visible in this codepen. Not a focus at the moment, but something to mull over in the background. <ruby> and display: inline-grid; are curious tools.

@brucemiller brucemiller merged commit dfc3bc9 into master Aug 25, 2024
26 checks passed
@brucemiller brucemiller deleted the accents branch August 25, 2024 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants