Add test for variation sequences cluster matching #45681

tursunova · 2024-04-12T14:10:07Z

There is no test coverage for cluster matching of variation sequences. According to CSS Fonts Module Level 4, the cluster matching algorithm should be used for variation sequences during font matching.

This PR tests font matching for emoji variation sequences (Unicode spec), standardized variation sequences (Unicode spec), and ideographic variation sequences (Unicode spec).

drott

This looks good to me. I find this a thorough set of tests to cover cluster matching with variation selectors without relying on a particular environment of installed fonts.

I'd be curious to hear feedback on these tests @jfkthame, @nt1m, @fantasai, @svgeesus.

FYI @shivamidow

litherum · 2024-04-12T19:35:14Z

css/css-fonts/support/js/variation-sequences.js

+      return matchedFamily;
+    }
+  }
+  // If failed, try to match only the base character from the


Is this behavior actually specified in the spec?

That's how I interpret this statement:

If no font on the system supports the full sequence, match the single character b using the normal procedure for matching single characters and ignore the variation selector.
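The fallback order in the quoted statement can be sketched as below. This is a minimal illustration only, not the helper in variation-sequences.js and not the full cluster matching algorithm: `matchCluster`, the `fonts` list shape, and the `supports` sets are hypothetical names invented for this example.

```javascript
// Hypothetical model: each font lists the strings (single characters or
// full variation sequences) it can render. Not the real test helper.
function matchCluster(fonts, base, selector) {
  const sequence = base + selector;
  // Step 1: prefer the first font that supports the full sequence.
  for (const font of fonts) {
    if (font.supports.has(sequence)) {
      return { family: font.family, matched: "sequence" };
    }
  }
  // Step 2: otherwise match only the base character and ignore the selector.
  for (const font of fonts) {
    if (font.supports.has(base)) {
      return { family: font.family, matched: "base" };
    }
  }
  return null; // No font covers even the base character.
}

// Two made-up fonts: only the second one covers the sequence 2269 FE00.
const fonts = [
  { family: "TextOnly", supports: new Set(["\u{2269}"]) },
  { family: "MathLike", supports: new Set(["\u{2269}", "\u{2269}\u{FE00}"]) },
];
```

With these made-up fonts, `matchCluster(fonts, "\u{2269}", "\u{FE00}")` selects "MathLike" because it covers the whole sequence, while an unsupported selector such as U+FE01 falls back to "TextOnly" for the base character alone.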

litherum · 2024-04-12T19:40:12Z

css/css-fonts/support/js/variation-sequences.js

+};
+
+var variationSelectors = {
+    "emoji": ["\u{fe0e}", "\u{fe0f}"],


Last I checked, on cocoa platforms, only FE0F and FE0E are supposed to affect font selection. All the other variation selectors are supposed to be applied, if they exist, on the pre-selected font.

(Last time I checked, if the source text includes one of the other variation selectors, and the selected font doesn’t react to it, then the variation selector is just ignored rather than drawn as an additional visible character.)

Could you please send a link to where you saw it? I was referring to https://www.w3.org/TR/css-fonts-4/#cluster-matching, and it doesn't seem to limit this to FE0F and FE0E.

Sorry, I'm not sure I understood your second statement correctly. The test only checks whether the correct family was applied to the variation sequence. Do you want to remove variation selectors from the test expectations where they should be ignored?

No, my comment means that I think that, on Cocoa platforms, families are only supposed to be affected by FE0F and FE0E, and not the other VSes.

If the spec indicates that all VSes should affect font selection, then the spec is wrong and should be changed.

@nedley is the expert here; I will bow out now and let other people (@vitorroriz? @fantasai?) continue this discussion.

If the spec indicates that all VSes should affect font selection, then the spec is wrong and should be changed.

I don't know if Unicode has a word on this. But it occurs to me that ignoring the VS would lose semantics whereas another font might be able to retain it.

I wonder if @PeterCon knows about the Unicode question.

The Unicode Standard §23.4 “Variation Selectors” says: “Combinations of particular base characters plus particular variation selectors have no effect on display unless they occur in pre-defined lists maintained by the Unicode Consortium.” My concern is that the CSS definition of support seemingly only refers to cmap coverage, and I certainly would not like to see 'a' + VS1 result in fallback to a CJK font.

Unicode doesn't exactly specify how font selection should be handled with respect to VSes, but it does say something that can entail a font selection impact for implementations:

When emoji were first encoded, some emoji were unified with existing symbols in Unicode (in hindsight, the wrong decision, but it can't be undone). That resulted in ambiguity as to how those characters should be displayed: as a monochrome ("text") character, or polychrome emoji? To resolve this, Unicode later defined variation sequences for emoji, including sequences to select both of those options:

sequences using FE0E to select "text style" (non-emoji)

sequences using FE0F to select "emoji style"

(The data file with defined emoji sequences is emoji-variation-sequences.txt.)
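As a small illustration of the two emoji presentation selectors described above, the sketch below appends FE0E or FE0F to a base character and prints the resulting code points. The helper names `withPresentation` and `toCodePoints` are hypothetical; U+2764 HEAVY BLACK HEART is used as the base because both of its sequences are listed in emoji-variation-sequences.txt.

```javascript
const VS15 = "\u{FE0E}"; // text presentation selector
const VS16 = "\u{FE0F}"; // emoji presentation selector

// Append the selector for the requested style ("text" or "emoji").
function withPresentation(base, style) {
  return base + (style === "emoji" ? VS16 : VS15);
}

// Format each code point of a string as U+XXXX for inspection.
function toCodePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const heart = "\u{2764}";
console.log(toCodePoints(withPresentation(heart, "text")).join(" "));
console.log(toCodePoints(withPresentation(heart, "emoji")).join(" "));
```

Whether each string is actually drawn in monochrome or colour depends on the fonts chosen by the implementation, which is the question under discussion in this thread.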

In principle, both could affect how font selection is handled. An implementation might, however, only care about the "emoji_style" sequences since that is a case that will require a specific, colour font. (For "text style", the selected font or any general fallback font(s) might suffice.)

Unicode defines other variation sequences that also could be considered in font selection if a product had particular fonts for those sequences. This might be relevant more for some sequences and less for others. (See [StandardizedVariants.txt](https://www.unicode.org/Public/UCD/latest/ucd/StandardizedVariants.txt) for the relevant data file.)

For example, on the less-relevant side, there are variation sequences defined for Phags-pa script, but whatever default font is already used for Phags-pa is likely to already support those sequences.

But on the more-relevant side, there are sequences defined for various math characters. Some of those characters could be supported in fonts not specifically designed for math formulas; for instance, Segoe UI Symbol supports U+2269 GREATER-THAN BUT NOT EQUAL TO, but it doesn't support the variation sequence 2269 FE00 (i.e., both display the same glyph). However, the Cambria Math font does support the variation sequence and displays it differently from 2269 without the VS. So, an app could allow some default font selection for 2269 without the VS, but select a specific math font if the variation sequence is used.

Also on the more-relevant side, another class of cases to be aware of is CJK ideographic variation sequences. These are not defined in The Unicode Standard itself, but are recorded in registered sets according to UTS #37, Unicode Ideographic Variation Database. (See https://www.unicode.org/ivd/.) Sets are registered by parties interested in ideographs for a particular region. (Most of the registered sets are defined for Japanese usage.) So, suppose content contains U+3405 "㐅". Depending on language or region settings, a font selection algorithm could pick a Hans font, or Hant, or Japanese. But if the content contains the variation sequence 3405 E0100, then it could pick a Japanese font that supports the variation sequence, regardless of language or region settings.
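To make the code points in these examples concrete, here is a minimal sketch that splits a two-code-point sequence and reports which numbered variation selector it uses. It relies only on the encoded ranges U+FE00..U+FE0F (VS1 to VS16) and U+E0100..U+E01EF (VS17 to VS256); the function names are hypothetical, and other selectors such as the Mongolian free variation selectors are deliberately left out.

```javascript
// Map a code point to its variation selector number (1..256), or null.
function variationSelectorNumber(codePoint) {
  if (codePoint >= 0xfe00 && codePoint <= 0xfe0f) {
    return codePoint - 0xfe00 + 1; // VS1..VS16
  }
  if (codePoint >= 0xe0100 && codePoint <= 0xe01ef) {
    return codePoint - 0xe0100 + 17; // VS17..VS256
  }
  return null;
}

// Split "base + selector" by code point, not by UTF-16 code unit:
// U+E0100 occupies two code units in a JavaScript string.
function splitSequence(str) {
  const [base, selector] = Array.from(str, (ch) => ch.codePointAt(0));
  return {
    base,
    selector,
    vs: selector === undefined ? null : variationSelectorNumber(selector),
  };
}

const ivs = splitSequence("\u{3405}\u{E0100}"); // ideographic example above
const math = splitSequence("\u{2269}\u{FE00}"); // math example above
console.log(ivs.vs, math.vs);
```

The ideographic sequence 3405 E0100 uses VS17 and the math sequence 2269 FE00 uses VS1, so a test helper could group sequences by selector range without hard-coding each one.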

There's no one right answer in general. For your purposes, I'm not sure what would be most appropriate. For "emoji style" sequences you probably want to pick a colour emoji font, and for the "text style" sequences, you should pick a monochrome/non-emoji font. Should you handle the math sequences specially when in a MathML context, or should you have special handling for the ideographic variation sequences? That's less clear to me, since I'm not familiar enough with the goals and requirements for your scenario.

Hope that helps.

I certainly would not like to see 'a' + VS1 result in fallback to a CJK font.

If the sequence is nonsensical, do we care?

If the sequence is nonsensical, do we care?

I think it’s important to make a conscious decision because it can lead to inappropriate user expectations. Apple platforms don’t consider any variation selectors besides VS15 and VS16 for fallback because a VS is a default ignorable code point and letting an invisible character drive fallback is usually not a good experience. It’s not clear to me that a Chinese user encountering a Japanese IVS really wants to see a different font anyway.
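The two positions in this thread differ only in which selectors are allowed to influence fallback. A rough sketch of that difference, with hypothetical policy names ("all-selectors" for the reading of cluster matching that the tests follow, "vs15-vs16-only" for the behavior described for Apple platforms), might look like this:

```javascript
// Selectors that drive fallback under the restricted policy:
// VS15 (U+FE0E) and VS16 (U+FE0F).
const PRESENTATION_SELECTORS = new Set([0xfe0e, 0xfe0f]);

// Decide whether a selector may influence which font is chosen.
// Both policy names are invented for this illustration.
function selectorDrivesFallback(selectorCodePoint, policy) {
  if (policy === "all-selectors") {
    return true;
  }
  if (policy === "vs15-vs16-only") {
    return PRESENTATION_SELECTORS.has(selectorCodePoint);
  }
  throw new Error("unknown policy: " + policy);
}

console.log(selectorDrivesFallback(0xe0100, "all-selectors"));
console.log(selectorDrivesFallback(0xe0100, "vs15-vs16-only"));
```

Under the restricted policy, an ideographic selector such as U+E0100 would be applied to the already selected font if that font supports it, and ignored otherwise.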

drott · 2024-04-20T00:50:46Z

Thanks @nedley, @behdad and @PeterConstable for your input. From my point of view, we're okay to land this set of tests. They conform to the spec. Should there be concerns about where cluster matching applies as it is defined now, or a desire to restrict cluster matching and fallback rules to specific use cases or variation selectors, I suggest opening new issues on the CSS Fonts spec.

Add test for variation sequences cluster matching

17d103c

tursunova requested a review from drott April 12, 2024 14:10

wpt-pr-bot added the css-fonts label Apr 12, 2024

wpt-pr-bot assigned litherum Apr 12, 2024

wpt-pr-bot requested review from jfkthame, litherum and svgeesus April 12, 2024 14:10

drott approved these changes Apr 12, 2024

litherum reviewed Apr 12, 2024

tursunova merged commit 076e7c1 into web-platform-tests:master Apr 22, 2024
19 checks passed

litherum Apr 12, 2024

tursunova Apr 15, 2024

litherum Apr 12, 2024

litherum Apr 12, 2024

tursunova Apr 15, 2024

litherum Apr 15, 2024

behdad Apr 18, 2024

behdad Apr 18, 2024

nedley Apr 18, 2024

PeterConstable Apr 19, 2024

behdad Apr 19, 2024

nedley Apr 19, 2024

drott commented Apr 20, 2024
