Add test for variation sequences cluster matching #45681

tursunova
Contributor

There is no test coverage for cluster matching of variation sequences. According to CSS Fonts Module Level 4, the cluster matching algorithm should be used for variation sequences during font matching.

This PR tests font matching for emoji variation sequences (Unicode spec), standardized variation sequences (Unicode spec), and ideographic variation sequences (Unicode spec).
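To make the three categories concrete, here is a rough JavaScript sketch (not the PR's test code) that classifies a two-code-point cluster by its selector. VS1 to VS16 occupy U+FE00 to U+FE0F and VS17 to VS256 occupy U+E0100 to U+E01EF. Treating FE0E/FE0F as the emoji selectors and FE00 to FE0D as standardized selectors is a simplifying assumption; the authoritative assignments live in emoji-variation-sequences.txt, StandardizedVariants.txt and the Ideographic Variation Database. The function name is hypothetical.

```javascript
// Hypothetical helper: classify a base + selector cluster.
// Assumption: FE0E/FE0F => emoji, FE00..FE0D => standardized,
// E0100..E01EF => ideographic. Real data files are authoritative.
function classifyVariationSequence(cluster) {
  // Array.from iterates by code point, so astral selectors count once.
  const cps = Array.from(cluster, (ch) => ch.codePointAt(0));
  if (cps.length !== 2) return null; // expect exactly base + selector
  const vs = cps[1];
  if (vs === 0xfe0e || vs === 0xfe0f) return "emoji";
  if (vs >= 0xfe00 && vs <= 0xfe0d) return "standardized";
  if (vs >= 0xe0100 && vs <= 0xe01ef) return "ideographic";
  return null;
}

console.log(classifyVariationSequence("\u{2764}\u{FE0F}"));  // emoji
console.log(classifyVariationSequence("\u{2269}\u{FE00}"));  // standardized
console.log(classifyVariationSequence("\u{3405}\u{E0100}")); // ideographic
```

Classifying by selector alone is only approximate: whether a given base + selector pair is actually defined still has to be checked against the lists above.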

Contributor

@drott drott left a comment

This looks good to me. I find this a thorough set of tests to cover cluster matching with variation selectors without relying on a particular environment of installed fonts.

I'd be curious to hear feedback on these tests @jfkthame, @nt1m, @fantasai, @svgeesus.

FYI @shivamidow

```javascript
return matchedFamily;
}
}
// If failed, try to match only the base character from the
```
Contributor

Is this behavior actually specified in the spec?

Contributor Author

That's how I interpret this statement:

> If no font on the system supports the full sequence, match the single character b using the normal procedure for matching single characters and ignore the variation selector.
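A minimal JavaScript sketch of that two-pass reading of the spec text, assuming an ordered list of candidate families and a hypothetical `supports(family, text)` predicate standing in for "this font covers the whole string". It is a simplification of the spec's procedure, not the helper in the PR.

```javascript
// Hypothetical sketch of cluster matching with base-character fallback.
// `families` is an ordered fallback list; `supports` is an assumed predicate.
function matchClusterFamily(cluster, families, supports) {
  // Pass 1: prefer a family that supports the full variation sequence.
  for (const family of families) {
    if (supports(family, cluster)) return family;
  }
  // Pass 2: match only the base character and ignore the selector.
  const base = String.fromCodePoint(cluster.codePointAt(0));
  for (const family of families) {
    if (supports(family, base)) return family;
  }
  return null; // no family covers even the base character
}

// Toy coverage model: each family lists the strings it can render.
const coverage = {
  "Text Font": new Set(["\u2764"]),
  "Emoji Font": new Set(["\u2764", "\u2764\uFE0F"]),
};
const supports = (family, text) => coverage[family].has(text);
console.log(matchClusterFamily("\u2764\uFE0F", ["Text Font", "Emoji Font"], supports)); // Emoji Font
```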

```javascript
};

var variationSelectors = {
"emoji": ["\u{fe0e}", "\u{fe0f}"],
```
Contributor

Last I checked, on Cocoa platforms, only FE0F and FE0E are supposed to affect font selection. All the other variation selectors are supposed to be applied, if they exist, on the pre-selected font.

Contributor

(Last time I checked, if the source text includes one of the other variation selectors and the selected font doesn’t react to it, the variation selector is just ignored rather than drawn as an additional visible character.)
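A rough JavaScript sketch of the shaping behaviour described in the two comments above, using a toy font model: `font.cmap` maps a code point to a glyph name and `font.variants` maps hypothetical `"base,selector"` hex keys to alternate glyphs. Neither structure is a real platform API. A selector the already-selected font understands picks the variant glyph; any other selector is dropped instead of being drawn.

```javascript
// True for any code point with the Unicode Variation_Selector property.
const isVariationSelector = (cp) =>
  /^\p{Variation_Selector}$/u.test(String.fromCodePoint(cp));

// Hypothetical shaping pass over text with an already-selected font.
function shapeWithSelectedFont(font, text) {
  const cps = Array.from(text, (ch) => ch.codePointAt(0));
  const glyphs = [];
  for (let i = 0; i < cps.length; i++) {
    const cp = cps[i];
    // A selector not consumed by the previous base is ignored, not drawn.
    if (isVariationSelector(cp)) continue;
    const next = cps[i + 1];
    if (next !== undefined && isVariationSelector(next)) {
      const variant = font.variants.get(`${cp.toString(16)},${next.toString(16)}`);
      if (variant !== undefined) {
        glyphs.push(variant);
        i++; // the selector was consumed by the variant glyph
        continue;
      }
    }
    glyphs.push(font.cmap.get(cp) ?? ".notdef");
  }
  return glyphs;
}
```

With a font whose `variants` contains `"2269,fe00"`, the text `"\u2269\uFE00"` yields the variant glyph, while `"a\uFE00"` yields only the glyph for `a`.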

Contributor Author

Could you please send a link where you saw it? I was referring to https://www.w3.org/TR/css-fonts-4/#cluster-matching, and it doesn't seem to limit this to FE0F and FE0E.

Sorry, I'm not sure I understood your second statement correctly. The test only checks whether the correct family was applied to the variation sequence. Do you want to remove the variation selectors from the test expectations where they should be ignored?

Contributor

No, my comment means that I think that, on Cocoa platforms, families are only supposed to be affected by FE0F and FE0E, and not the other VSes.

If the spec indicates that all VSes should affect font selection, then the spec is wrong and should be changed.

@nedley is the expert here; I will bow out now and let other people (@vitorroriz? @fantasai?) continue this discussion.

> If the spec indicates that all VSes should affect font selection, then the spec is wrong and should be changed.

I don't know if Unicode has a word on this. But it occurs to me that ignoring the VS would lose semantics whereas another font might be able to retain it.

I wonder if @PeterCon knows about the Unicode question.

@nedley nedley Apr 18, 2024

The Unicode Standard §23.4 “Variation Selectors” says: “Combinations of particular base characters plus particular variation selectors have no effect on display unless they occur in pre-defined lists maintained by the Unicode Consortium.” My concern is that the CSS definition of support seemingly only refers to cmap coverage, and I certainly would not like to see 'a' + VS1 result in fallback to a CJK font.
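One way to avoid the 'a' + VS1 case is to let a sequence influence fallback only if it appears in one of the defined lists. A minimal JavaScript sketch, assuming the defined sequences were already loaded into a `Set` of strings; the two entries below are an illustrative excerpt, not the full data, and the names are hypothetical.

```javascript
// Illustrative excerpt only; real code would load the full defined lists
// (StandardizedVariants.txt, emoji-variation-sequences.txt, the IVD).
const DEFINED_SEQUENCES = new Set([
  "\u2269\uFE00", // GREATER-THAN BUT NOT EQUAL TO + VS1
  "\u2764\uFE0F", // HEAVY BLACK HEART + VS16 (emoji style)
]);

// Returns the text that font fallback should try to cover: the full
// sequence if it is defined, otherwise just the base character.
function fallbackKey(cluster) {
  return DEFINED_SEQUENCES.has(cluster)
    ? cluster
    : String.fromCodePoint(cluster.codePointAt(0));
}
```

Under this policy, `fallbackKey("a\uFE00")` is just `"a"`, so an undefined sequence can never pull in a CJK font merely because that font's cmap happens to cover VS1.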

Unicode doesn't exactly specify how font selection should be handled with respect to VSes, but it does say something that can entail a font selection impact for implementations:

When emoji were first encoded, some emoji were unified with existing symbols in Unicode (in hindsight, the wrong decision, but it can't be undone). That resulted in ambiguity as to how those characters should be displayed: as a monochrome ("text") character, or polychrome emoji? To resolve this, Unicode later defined variation sequences for emoji, including sequences to select both of those options:

  • sequences using FE0E to select "text style" (non-emoji)
  • sequences using FE0F to select "emoji style"

(The data file with defined emoji sequences is emoji-variation-sequences.txt.)

In principle, both could affect how font selection is handled. An implementation might, however, only care about the "emoji style" sequences, since that case will require a specific, colour font. (For "text style", the selected font or any general fallback font(s) might suffice.)
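A minimal JavaScript sketch of that distinction, assuming the caller already has hypothetical lists of colour and monochrome fonts plus a default fallback order. A trailing U+FE0F moves colour fonts first, a trailing U+FE0E moves monochrome fonts first, and a cluster without either keeps the default order.

```javascript
// "emoji" for a trailing VS16, "text" for a trailing VS15, else null.
function presentationOf(cluster) {
  const cps = Array.from(cluster, (ch) => ch.codePointAt(0));
  const last = cps[cps.length - 1];
  if (last === 0xfe0f) return "emoji";
  if (last === 0xfe0e) return "text";
  return null;
}

// Hypothetical fallback ordering; the font lists are assumptions.
function orderFallback(cluster, colourFonts, monochromeFonts, defaultOrder) {
  switch (presentationOf(cluster)) {
    case "emoji":
      return [...colourFonts, ...monochromeFonts];
    case "text":
      return [...monochromeFonts, ...colourFonts];
    default:
      return defaultOrder; // no presentation selector: keep the usual order
  }
}
```

Keeping the other list as a tail rather than dropping it means the base character still renders even when no font of the requested style is installed.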

Unicode defines other variation sequences that could also be considered in font selection if a product had particular fonts for those sequences. This might be relevant more for some sequences and less for others. (See [StandardizedVariants.txt](https://www.unicode.org/Public/UCD/latest/ucd/StandardizedVariants.txt) for the relevant data file.)
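Both data files share a simple semicolon-separated layout: a field of space-separated hex code points, a description field, and `#` comments. A rough JavaScript reader for that layout is sketched below. The function name is hypothetical, and the sample lines are written in the file format for illustration, not copied from the files. Sequences that appear more than once (for example with different shaping contexts) simply overwrite each other here.

```javascript
// Hypothetical reader: maps each sequence string to its description field.
function parseVariationSequences(text) {
  const sequences = new Map();
  for (const rawLine of text.split("\n")) {
    const line = rawLine.split("#")[0].trim(); // strip comments and CR
    if (line === "") continue;
    const [codePoints, description = ""] = line.split(";").map((f) => f.trim());
    const seq = String.fromCodePoint(
      ...codePoints.split(/\s+/).map((hex) => parseInt(hex, 16)),
    );
    sequences.set(seq, description);
  }
  return sequences;
}

const sample = [
  "# illustrative lines in the data-file format",
  "0023 FE0F  ; emoji style; # NUMBER SIGN",
  "2269 FE00; with vertical stroke; # GREATER-THAN BUT NOT EQUAL TO",
].join("\n");
console.log(parseVariationSequences(sample).size); // 2
```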

For example, on the less-relevant side, there are variation sequences defined for Phags-pa script, but whatever default font is used for Phags-pa is likely to support those sequences already.

But on the more-relevant side, there are sequences defined for various math characters. Some of those characters could be supported in fonts not specifically designed for math formulas; for instance, Segoe UI Symbol supports U+2269 GREATER-THAN BUT NOT EQUAL TO, but it doesn't support the variation sequence 2269 FE00 (i.e., both display the same glyph). However, the Cambria Math font does support the variation sequence and displays it differently from 2269 without the VS. So, an app could allow some default font selection for 2269 without the VS, but select a specific math font if the variation sequence is used.
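Fonts typically declare this kind of support through the OpenType cmap subtable format 14. For each selector, it lists bases whose sequence uses the default glyph ("default UVS") and bases mapped to a different glyph ("non-default UVS"). The JavaScript sketch below models that lookup loosely. The object layout, glyph names and font models are hypothetical, loosely shaped after the Segoe UI Symbol / Cambria Math example above rather than taken from those fonts.

```javascript
// Hypothetical, simplified model of cmap format 14 lookup.
// font.cmap: Map<codePoint, glyph>
// font.uvs: Map<selector, { defaultBases: Set<cp>, nonDefault: Map<cp, glyph> }>
function lookupSequence(font, base, selector) {
  const uvs = font.uvs.get(selector);
  if (uvs) {
    if (uvs.nonDefault.has(base)) {
      return { supported: true, glyph: uvs.nonDefault.get(base) };
    }
    if (uvs.defaultBases.has(base)) {
      return { supported: true, glyph: font.cmap.get(base) };
    }
  }
  // Not declared: the sequence is unsupported, though the base may still render.
  return { supported: false, glyph: font.cmap.get(base) };
}

// A symbol-like font covers U+2269 but declares nothing for FE00;
// a math-like font maps 2269 FE00 to a distinct glyph.
const symbolLike = { cmap: new Map([[0x2269, "gneqq"]]), uvs: new Map() };
const mathLike = {
  cmap: new Map([[0x2269, "gneqq"]]),
  uvs: new Map([[0xfe00, { defaultBases: new Set(), nonDefault: new Map([[0x2269, "gvertneqq"]]) }]]),
};
console.log(lookupSequence(symbolLike, 0x2269, 0xfe00).supported); // false
console.log(lookupSequence(mathLike, 0x2269, 0xfe00).glyph); // gvertneqq
```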

Also on the more-relevant side, another class of cases to be aware of is CJK ideographic variation sequences. These are not defined in The Unicode Standard itself, but are recorded in registered sets according to UTS #37, Unicode Ideographic Variation Database. (See https://www.unicode.org/ivd/.) Sets tend to be registered by parties interested in ideographs for a particular region. (Most of the registered sets are defined for Japanese usage.) So, suppose content contains U+3405 "㐅". Depending on language or region settings, a font selection algorithm could pick a Hans font, a Hant font, or a Japanese font. But if the content contains the variation sequence 3405 E0100, then it could pick a Japanese font that supports the variation sequence, regardless of language or region settings.
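A rough JavaScript sketch of that idea, assuming a hypothetical `fontsByLang` table of preferred CJK families and a hypothetical `supports(family, text)` coverage check. Without an ideographic variation selector (U+E0100 to U+E01EF), the language preference decides. With one, any family that supports the full sequence may win. If none does, the sketch falls back to the language default.

```javascript
// Hypothetical: pick a CJK family, letting a supported IVS override language.
function pickCjkFamily(cluster, lang, fontsByLang, supports) {
  const cps = Array.from(cluster, (ch) => ch.codePointAt(0));
  const hasIVS = cps.some((cp) => cp >= 0xe0100 && cp <= 0xe01ef);
  const preferred = fontsByLang[lang];
  if (hasIVS) {
    // Keep the language's font if it already supports the sequence.
    if (preferred && supports(preferred, cluster)) return preferred;
    for (const family of Object.values(fontsByLang)) {
      if (supports(family, cluster)) return family;
    }
  }
  return preferred ?? null;
}

const fonts = { "zh-Hans": "SC", "zh-Hant": "TC", ja: "JP" };
const cov = {
  SC: new Set(["\u3405"]),
  TC: new Set(["\u3405"]),
  JP: new Set(["\u3405", "\u3405\u{E0100}"]),
};
const supports = (family, text) => cov[family].has(text);
console.log(pickCjkFamily("\u3405\u{E0100}", "zh-Hans", fonts, supports)); // JP
```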

There's no one right answer in general. For your purposes, I'm not sure what would be most appropriate. For "emoji style" sequences you probably want to pick a colour emoji font, and for the "text style" sequences you should pick a monochrome/non-emoji font. Should you handle the math sequences specially in a MathML context, or should you have special handling for the ideographic variation sequences? That's less clear to me, since I'm not familiar enough with the goals and requirements for your scenario.

Hope that helps.

> I certainly would not like to see 'a' + VS1 result in fallback to a CJK font.

If the sequence is nonsensical, do we care?

> If the sequence is nonsensical, do we care?

I think it’s important to make a conscious decision because it can lead to inappropriate user expectations. Apple platforms don’t consider any variation selectors besides VS15 and VS16 for fallback because a VS is a default ignorable code point and letting an invisible character drive fallback is usually not a good experience. It’s not clear to me that a Chinese user encountering a Japanese IVS really wants to see a different font anyway.
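A minimal JavaScript sketch of the policy described here, not Apple's implementation: before font fallback, keep VS15/VS16 and strip every other variation selector so it cannot drive fallback. It relies on the `Variation_Selector` Unicode property escape in JavaScript regular expressions. All variation selectors are also `Default_Ignorable_Code_Point`, which is the reason given above. The function name is hypothetical.

```javascript
const isVariationSelector = (ch) => /^\p{Variation_Selector}$/u.test(ch);
const FALLBACK_SELECTORS = new Set(["\uFE0E", "\uFE0F"]); // VS15, VS16

// Text used for font fallback: only VS15/VS16 survive; other selectors are
// left for the already-selected font to apply or ignore during shaping.
function textForFallback(cluster) {
  return Array.from(cluster)
    .filter((ch) => !isVariationSelector(ch) || FALLBACK_SELECTORS.has(ch))
    .join("");
}

console.log(/^\p{Default_Ignorable_Code_Point}$/u.test("\uFE00")); // true
console.log(textForFallback("a\uFE00") === "a"); // true
```

Filtering on `Variation_Selector` rather than on all default-ignorable characters avoids stripping characters such as ZERO WIDTH JOINER, which emoji sequences depend on.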

@drott
Contributor

drott commented Apr 20, 2024

Thanks @nedley, @behdad and @PeterConstable for your inputs. From my point of view, we're okay to land this set of tests. They conform to the spec. If there are concerns about where cluster matching applies as currently defined, or a desire to restrict cluster matching and fallback rules to specific use cases or variation selectors, I suggest opening new issues on the CSS Fonts spec.

@tursunova tursunova merged commit 076e7c1 into web-platform-tests:master Apr 22, 2024
19 checks passed