upside-down exclamation mark, question mark ¡¿ #2075

Closed
nschloe opened this issue Apr 25, 2023 · 4 comments · Fixed by #2156
nschloe commented Apr 25, 2023

MWE:

\documentclass{article}
\begin{document}
!` ?`
\end{document}

Output:

!‘ ?‘

Expected output:

¡¿
@dginev dginev added this to the LaTeXML-0.8.8 milestone Apr 25, 2023
@brucemiller
Owner

I was surprised that TeX did this. When I dug into the TeX book to make sure I hadn't missed others, I was reminded of the more common & traditional ff,fi,fl,ffi,ffl ligatures. I'd intentionally omitted those originally, out of misplaced(?) concerns about search or something?

I went ahead and added them as well, with subtle, but decidedly elegant, results that ooze "Quality" :> But alas they "break" (change) 96 test cases! That's easily updated, if tedious. But before I started that, I wanted to check whether I'm going to get massive pushback from our regulars: @dginev, @tkw1536, @teepeemm, @matteosecli, @xworld21
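To make the scope of that change concrete, here is a minimal Python sketch of the kind of substitution being discussed. It is not LaTeXML's implementation (LaTeXML is written in Perl, and how it handles ligatures is not shown in this thread); the table `LIGATURES` and the function `apply_ligatures` are hypothetical names chosen for illustration. The mappings are the standard Unicode code points for the punctuation and f-ligatures named above.

```python
import re

# Hypothetical table for illustration only: TeX input sequences mapped to
# the corresponding Unicode characters.
LIGATURES = {
    "ffi": "\ufb03",  # LATIN SMALL LIGATURE FFI
    "ffl": "\ufb04",  # LATIN SMALL LIGATURE FFL
    "ff": "\ufb00",   # LATIN SMALL LIGATURE FF
    "fi": "\ufb01",   # LATIN SMALL LIGATURE FI
    "fl": "\ufb02",   # LATIN SMALL LIGATURE FL
    "!`": "\u00a1",   # INVERTED EXCLAMATION MARK
    "?`": "\u00bf",   # INVERTED QUESTION MARK
}

# Try longer sequences first so that "ffi" is preferred over "ff" or "fi".
_PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(LIGATURES, key=len, reverse=True))
)


def apply_ligatures(text: str) -> str:
    """Replace each ligature-forming sequence with its Unicode character."""
    return _PATTERN.sub(lambda match: LIGATURES[match.group(0)], text)
```

With this sketch, `apply_ligatures("!` ?`")` yields the two inverted marks from the MWE, while `apply_ligatures("office")` replaces the literal `ffi` with the single character U+FB03, which is exactly the kind of output change that would touch many test cases.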

@dginev
Collaborator

dginev commented Jul 21, 2023

For the record, I really liked the minimal nature of the issue, focused tightly on the ¡¿ char. By extending it to all missing unicode ligatures we can find, we may over-correct if we aren't careful...

So the main implication is that using a Unicode ligature char (e.g. U+FB03 ﬃ instead of the literal ffi) may lead to losing simple searchability. It will definitely be lost in the browser Ctrl+F sense ("ffi" doesn't match ﬃ and vice versa).

And full-blown search engines should have Unicode normalization properly employed in the correct places, if they want to recognize the words using the ligatures.
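As a small illustration of the normalization point, the Python sketch below uses the standard library's `unicodedata.normalize` with the NFKC form, which folds compatibility characters such as U+FB03 back to their plain-letter equivalents. The helper names `fold_ligatures` and `matches` are hypothetical, and this is only one possible way a search layer could handle the problem.

```python
import unicodedata


def fold_ligatures(text: str) -> str:
    """Apply NFKC normalization, which expands compatibility ligatures."""
    return unicodedata.normalize("NFKC", text)


def matches(query: str, document: str) -> bool:
    """Substring search after normalizing both sides."""
    return fold_ligatures(query) in fold_ligatures(document)


# A document that contains the single ligature character U+FB03.
document = "o\ufb03ce"

naive_hit = "ffi" in document               # plain substring search
normalized_hit = matches("ffi", document)   # search after NFKC folding
```

Here `naive_hit` is false and `normalized_hit` is true. Note that NFKC also folds other compatibility characters (superscript digits, for example), so a real search pipeline would have to decide whether that broader folding is acceptable.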

My personal preference would be to tackle the minimal issue here as posed (since it is clear and without hidden icebergs), and make further issues for the trickier ligatures. I'm quite grateful to @nschloe for already chunking some of the problems into bite-sized pieces. But that's just me :)

@brucemiller
Owner

Oooh. Curious; I just compared the results at the browser level. In Firefox, I get essentially identical results whether or not I combine the letters (e.g. ffi to \x{FB03}). Apparently, with whatever font it's using, it is doing the ligature itself. Nice! However, Chrome (with its default font) does not apply the ligature itself, and the ligature character ends up in a sans-serif font, which looks weird. If I change Chrome to use "Noto Serif" (which is what my Firefox was using), it also applies the ligatures.

So, I guess the takeaway is that these common ligatures are built into decent fonts these days, and we don't really need to mess with them at the TeX level.

@teepeemm
Contributor

There's also the CSS property font-variant-ligatures, whose default value in Chrome is normal.

The other interesting thing would be if we could prevent the ligature in shelf{}ful. But that can wait for some other issue, if it's not already prevented.
