Add test for variation sequences cluster matching #45681

Merged
38 changes: 38 additions & 0 deletions css/css-fonts/support/css/variation-sequences.css
@@ -0,0 +1,38 @@
@font-face {
  font-family: "MonoEmojiFont";
  src: url(../../resources/vs/NotoEmoji-Regular_subset.ttf);
}

@font-face {
  font-family: "ColorEmojiFont";
  src: url(../../resources/vs/NotoColorEmoji-Regular_subset.ttf);
}

@font-face {
  font-family: "EmojiFontWithBaseCharOnly";
  src: url(../../resources/vs/NotoEmoji-Regular_without-cmap14-subset.ttf);
}

@font-face {
  font-family: "CJKFontWithVS";
  src: url(../../resources/vs/NotoSansJP-Regular_with-cmap14-subset.ttf);
}

@font-face {
  font-family: "CJKFontWithBaseCharOnly";
  src: url(../../resources/vs/MPLUS1-Regular_without-cmap14-subset.ttf);
}

@font-face {
  font-family: "MathFontWithVS";
  src: url(../../resources/vs/STIXTwoMath-Regular_with-cmap14-subset.ttf);
}

@font-face {
  font-family: "MathFontWithBaseCharOnly";
  src: url(../../resources/vs/NotoSansMath-Regular_without-cmap14-subset.ttf);
}

body {
  font-size: 24px;
}
125 changes: 125 additions & 0 deletions css/css-fonts/support/js/variation-sequences.js
@@ -0,0 +1,125 @@
var baseChars = {
  "emoji": "\u{1fae8}",
  "cjk": "\u{8279}",
  "math": "\u{2205}"
};

var variationSelectors = {
  "emoji": ["\u{fe0e}", "\u{fe0f}"],
Contributor

Last I checked, on Cocoa platforms, only FE0F and FE0E are supposed to affect font selection. All the other variation selectors are supposed to be applied, if they exist, to the pre-selected font.

Contributor

(Last time I checked, if the source text includes one of the other variation selectors and the selected font doesn’t react to it, the variation selector is simply ignored rather than drawn as an additional visible character.)

Contributor Author

Could you please send a link to where you saw that? I was referring to https://www.w3.org/TR/css-fonts-4/#cluster-matching, which doesn't seem to limit this to FE0F and FE0E.

Sorry, I'm not sure I understood your second statement correctly. The test only checks whether the correct family was applied to the variation sequence. Do you want to remove the variation selectors from the test expectations where they should be ignored?

Contributor

No, my comment means that I think that, on Cocoa platforms, families are only supposed to be affected by FE0F and FE0E, and not the other VSes.

If the spec indicates that all VSes should affect font selection, then the spec is wrong and should be changed.

@nedley is the expert here; I will bow out now and let other people (@vitorroriz? @fantasai?) continue this discussion.

> If the spec indicates that all VSes should affect font selection, then the spec is wrong and should be changed.

I don't know whether Unicode has anything to say about this. But it occurs to me that ignoring the VS would lose semantics, whereas another font might be able to retain it.

I wonder if @PeterCon knows about the Unicode question.

@nedley, Apr 18, 2024

The Unicode Standard §23.4 “Variation Selectors” says: “Combinations of particular base characters plus particular variation selectors have no effect on display unless they occur in pre-defined lists maintained by the Unicode Consortium.” My concern is that the CSS definition of support seemingly only refers to cmap coverage, and I certainly would not like to see 'a' + VS1 result in fallback to a CJK font.
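
To make the concern concrete, a matcher could consult a list of registered sequences before letting a variation selector influence fallback. The following is only a sketch under assumptions: `REGISTERED_SEQUENCES` is a tiny hand-written sample rather than the real Unicode data, and `mayAffectFontSelection` is a hypothetical helper name, not an API of any engine or of the CSS spec.

```javascript
// Hypothetical sample, NOT the real Unicode data files: it lists only
// U+2269 + VS1, a math sequence mentioned elsewhere in this thread.
const REGISTERED_SEQUENCES = new Set([
  "\u{2269}\u{FE00}",
]);

// Returns true only when base + selector appears in the registered list,
// so an arbitrary pair such as 'a' + VS1 never drives fallback by itself.
function mayAffectFontSelection(baseChar, variationSelector) {
  return REGISTERED_SEQUENCES.has(baseChar + variationSelector);
}

console.log(mayAffectFontSelection("\u{2269}", "\u{FE00}")); // true
console.log(mayAffectFontSelection("a", "\u{FE00}"));        // false
```

With a gate like this, cmap coverage alone would not be enough for a sequence to change the selected font.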

Unicode doesn't exactly specify how font selection should be handled with respect to VSes, but it does say something that can entail a font selection impact for implementations:

When emoji were first encoded, some emoji were unified with existing symbols in Unicode (in hindsight, the wrong decision, but it can't be undone). That resulted in ambiguity as to how those characters should be displayed: as a monochrome ("text") character, or polychrome emoji? To resolve this, Unicode later defined variation sequences for emoji, including sequences to select both of those options:

  • sequences using FE0E to select "text style" (non-emoji)
  • sequences using FE0F to select "emoji style"

(The data file with defined emoji sequences is emoji-variation-sequences.txt.)

In principle, both could affect how font selection is handled. An implementation might, however, only care about the "emoji style" sequences since that is a case that will require a specific, colour font. (For "text style", the selected font or any general fallback font(s) might suffice.)

Unicode defines other variation sequences that also could be considered in font selection if a product had particular fonts for those sequences. This might be relevant more for some sequences and less for others. (See [StandardizedVariants.txt](https://www.unicode.org/Public/UCD/latest/ucd/StandardizedVariants.txt) for the relevant data file.)

For example, on the less-relevant side, there are variation sequences defined for Phags-pa script, but whatever default font is already used for Phags-pa is likely to already support those sequences.

But on the more-relevant side, there are sequences defined for various math characters. Some of those characters could be supported in fonts not specifically designed for math formulas; for instance, Segoe UI Symbol supports U+2269 GREATER-THAN BUT NOT EQUAL TO, but it doesn't support the variation sequence 2269 FE00 (i.e., both display the same glyph). However, the Cambria Math font does support the variation sequence and displays it differently from 2269 without the VS. So, an app could allow some default font selection for 2269 without the VS, but select a specific math font if the variation sequence is used.

Also on the more-relevant side, another class of cases to be aware of is CJK ideographic variation sequences. These are not defined in The Unicode Standard itself, but are recorded in registered sets according to UTS #37, Unicode Ideographic Variation Database. (See https://www.unicode.org/ivd/.) Sets are registered by parties interested in ideographs for a particular region. (Most of the registered sets are defined for Japanese usage.) So, suppose content contains U+3405 "㐅". Depending on language or region settings, a font selection algorithm could pick a Hans font, or Hant, or Japanese. But if the content contains the variation sequence 3405 E0100, then it could pick a Japanese font that supports the variation sequence, regardless of language or region settings.

There's no one right answer in general. For your purposes, I'm not sure what would be most appropriate. For "emoji style" sequences you probably want to pick a colour emoji font, and for the "text style" sequences, you should pick a monochrome/non-emoji font. Should you handle the math sequences specially when in a MathML context, or should you have special handling for the ideographic variation sequences? That's less clear to me since I'm not familiar enough with goals and requirements for your scenario.

Hope that helps.
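
The categories discussed in this comment can be told apart by code point alone. Here is a minimal sketch, assuming only the two Unicode variation selector blocks (U+FE00 to U+FE0F and U+E0100 to U+E01EF); the function name and the returned labels are made up for illustration and come from neither the spec nor the test.

```javascript
// Classify a variation selector by its code point.
// The labels are illustrative only.
function classifyVariationSelector(selector) {
  const cp = selector.codePointAt(0);
  if (cp === 0xFE0E) return "text-style";   // VS15
  if (cp === 0xFE0F) return "emoji-style";  // VS16
  // VS1 to VS14, e.g. the math sequence 2269 FE00
  if (cp >= 0xFE00 && cp <= 0xFE0D) return "vs1-vs14";
  // VS17 to VS256, the block used by ideographic variation sequences
  if (cp >= 0xE0100 && cp <= 0xE01EF) return "supplement";
  return null; // empty string or not a variation selector
}

console.log(classifyVariationSelector("\u{FE0F}"));  // "emoji-style"
console.log(classifyVariationSelector("\u{FE00}"));  // "vs1-vs14"
console.log(classifyVariationSelector("\u{E0100}")); // "supplement"
console.log(classifyVariationSelector(""));          // null
```

An implementation could then decide per category whether the selector should influence font selection at all.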

> I certainly would not like to see 'a' + VS1 result in fallback to a CJK font.

If the sequence is nonsensical, do we care?

> If the sequence is nonsensical, do we care?

I think it’s important to make a conscious decision because it can lead to inappropriate user expectations. Apple platforms don’t consider any variation selectors besides VS15 and VS16 for fallback because a VS is a default ignorable code point and letting an invisible character drive fallback is usually not a good experience. It’s not clear to me that a Chinese user encountering a Japanese IVS really wants to see a different font anyway.
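
The policy described here could be sketched roughly as follows. This illustrates the behavior as stated in the comment and is not Apple's implementation; `fallbackKey` is a hypothetical helper name.

```javascript
const VS15 = "\u{FE0E}"; // text presentation selector
const VS16 = "\u{FE0F}"; // emoji presentation selector

// Hypothetical helper: returns the text that font fallback would be asked
// to match. Only VS15 and VS16 are kept; any other selector is dropped so
// that an invisible character cannot change the chosen font.
function fallbackKey(baseChar, variationSelector) {
  const drivesFallback =
    variationSelector === VS15 || variationSelector === VS16;
  return drivesFallback ? baseChar + variationSelector : baseChar;
}

console.log(fallbackKey("\u{1FAE8}", VS16) === "\u{1FAE8}\u{FE0F}"); // true
console.log(fallbackKey("\u{8279}", "\u{E0100}") === "\u{8279}");    // true
```

Under this policy the `cjk` and `math` expectations in the test would differ from what the PR currently encodes, which is the crux of the disagreement in this thread.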

  "cjk": ["", "\u{FE00}", "\u{FE01}", "\u{e0100}", "\u{e0101}",
    "\u{e0102}"
  ],
  "math": ["", "\u{FE00}"]
};

var families = {
  "emoji": ["ColorEmojiFont", "MonoEmojiFont",
    "EmojiFontWithBaseCharOnly",
    "sans-serif"
  ],
  "cjk": ["CJKFontWithVS", "CJKFontWithBaseCharOnly",
    "sans-serif"
  ],
  "math": ["MathFontWithVS", "MathFontWithBaseCharOnly",
    "sans-serif"
  ]
};

var variationSequenceFamilies = new Map([
  ["\u{1fae8}\u{fe0e}", "MonoEmojiFont"],
  ["\u{1fae8}\u{fe0f}", "ColorEmojiFont"],
  ["\u{8279}\u{fe00}", "CJKFontWithVS"],
  ["\u{8279}\u{fe01}", "CJKFontWithVS"],
  ["\u{8279}\u{e0100}", "CJKFontWithVS"],
  ["\u{8279}\u{e0101}", "CJKFontWithVS"],
  ["\u{8279}\u{e0102}", "CJKFontWithVS"],
  ["\u{2205}\u{FE00}", "MathFontWithVS"]
]);

var baseCharFamilies = new Map([
  ["\u{1fae8}", new Set(["MonoEmojiFont", "ColorEmojiFont",
    "EmojiFontWithBaseCharOnly"
  ])],
  ["\u{8279}", new Set(["CJKFontWithVS",
    "CJKFontWithBaseCharOnly"
  ])],
  ["\u{2205}", new Set(["MathFontWithVS",
    "MathFontWithBaseCharOnly"
  ])]
]);

const range = function*(l) {
  for (let i = 0; i < l; i += 1) yield i;
}
const isEmpty = arr =>
  arr.length === 0;

const permutations =
  function*(a) {
    const r = arguments[1] || [];
    if (isEmpty(a))
      yield r;
    for (let i of range(a.length)) {
      const aa = [...a];
      const rr = [...r, ...aa.splice(i, 1)];
      yield* permutations(aa, rr);
    }
  }

function getMatchedFamilyForVariationSequence(
  familyList, baseCharacter, variationSelector) {
  const variationSequence = baseCharacter + variationSelector;
  // First try to find a match for the whole variation sequence.
  if (variationSequenceFamilies.has(variationSequence)) {
    const matchedFamily = variationSequenceFamilies.get(variationSequence);
    if (familyList.includes(matchedFamily)) {
      return matchedFamily;
    }
  }
  // If failed, try to match only the base character from the
Contributor

Is this behavior actually specified in the spec?

Contributor Author

That's how I interpret this statement:

> If no font on the system supports the full sequence, match the single character b using the normal procedure for matching single characters and ignore the variation selector.
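
One reading of the quoted sentence can be sketched as a two-pass lookup. This is an illustration under assumptions, not the spec's normative algorithm or any engine's code: the `fonts` objects and their coverage data are invented for the example (a real engine would read coverage from each font's cmap tables), and `matchCluster` is a hypothetical name. Only the family names come from this PR.

```javascript
// Each hypothetical font lists the sequences and single characters it covers.
function matchCluster(fonts, baseChar, variationSelector) {
  const sequence = baseChar + variationSelector;
  // Pass 1: first font, in list order, that supports the full sequence.
  if (variationSelector !== "") {
    const full = fonts.find(f => f.sequences.has(sequence));
    if (full) return full.family;
  }
  // Pass 2: ignore the selector and match the base character alone.
  const base = fonts.find(f => f.chars.has(baseChar));
  return base ? base.family : null;
}

const fonts = [
  { family: "CJKFontWithBaseCharOnly",
    chars: new Set(["\u{8279}"]), sequences: new Set() },
  { family: "CJKFontWithVS",
    chars: new Set(["\u{8279}"]), sequences: new Set(["\u{8279}\u{E0100}"]) },
];

console.log(matchCluster(fonts, "\u{8279}", "\u{E0100}")); // "CJKFontWithVS"
console.log(matchCluster(fonts, "\u{8279}", ""));          // "CJKFontWithBaseCharOnly"
```

The first call skips the earlier family because it lacks the full sequence; the second has no selector, so plain list order decides.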

  // variation sequence.
  if (baseCharFamilies.has(baseCharacter)) {
    const eligibleFamilies = baseCharFamilies.get(baseCharacter);
    const matchedFamilies =
      familyList.filter(value => eligibleFamilies.has(value));
    if (matchedFamilies.length) {
      return matchedFamilies[0];
    }
  }
  // We should not reach here, we should always match one of the
  // specified web fonts in the tests.
  return "";
}

function generateContent(
  families, baseChar, variationSelectors, getFontFamilyValue) {
  var rootElem = document.createElement('div');
  // We want to test all possible combinations of variation
  // selectors and font-family list values. For the refs,
  // we explicitly specify the font that we expect to be
  // matched from the maps at the beginning of the files.
  const allFamiliesLists = permutations(families);
  for (const familyList of allFamiliesLists) {
    for (const variationSelector of variationSelectors) {
      const contentSpan = document.createElement("span");
      contentSpan.textContent = baseChar + variationSelector;
      contentSpan.style.fontFamily =
        getFontFamilyValue(familyList, baseChar, variationSelector);
      rootElem.appendChild(contentSpan);
    }
  }
  document.body.appendChild(rootElem);
}

function generateVariationSequenceTests(type) {
  var getFontFamilyValue = (familyList, baseChar, variationSelector) => {
    return familyList.join(', ');
  }
  generateContent(families[type], baseChars[type], variationSelectors[type], getFontFamilyValue);
}

function generateVariationSequenceRefs(type) {
  generateContent(
    families[type], baseChars[type], variationSelectors[type],
    getMatchedFamilyForVariationSequence);
}
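
As a rough sanity check on how much content `generateContent` produces, the number of spans per type is the number of family-list permutations times the number of variation selectors. The sketch below only does that arithmetic; `sizes` and `spanCount` are hypothetical names, with the counts copied by hand from the `families` and `variationSelectors` tables in this file.

```javascript
// n! is the number of orderings of an n-item family list.
const factorial = n => (n <= 1 ? 1 : n * factorial(n - 1));

// Hand-copied list lengths from the tables in variation-sequences.js.
const sizes = {
  emoji: { families: 4, selectors: 2 },
  cjk:   { families: 3, selectors: 6 },
  math:  { families: 3, selectors: 2 },
};

// One span is generated per (family ordering, selector) pair.
function spanCount(type) {
  const { families, selectors } = sizes[type];
  return factorial(families) * selectors;
}

console.log(spanCount("emoji")); // 48
console.log(spanCount("cjk"));   // 36
console.log(spanCount("math"));  // 12
```

Adding a family to a list therefore grows the test factorially, which is worth keeping in mind before extending the tables.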
11 changes: 11 additions & 0 deletions css/css-fonts/variation-sequences-ref.html
@@ -0,0 +1,11 @@
<!DOCTYPE html>
<meta charset="UTF-8" />
<title>CSS Test: Cluster Matching Variation Sequences</title>
<link rel="stylesheet" type="text/css" href="support/css/variation-sequences.css" />
<script type="text/javascript" src="support/js/variation-sequences.js"></script>
<body></body>
<script>
generateVariationSequenceRefs("emoji");
generateVariationSequenceRefs("cjk");
generateVariationSequenceRefs("math");
</script>
18 changes: 18 additions & 0 deletions css/css-fonts/variation-sequences.html
@@ -0,0 +1,18 @@
<!DOCTYPE html>
<meta charset="UTF-8" />
<title>CSS Test: Cluster Matching Variation Sequences</title>
<link rel="help" href="https://www.w3.org/TR/css-fonts-4/#cluster-matching" />
<link rel="help" href="https://unicode.org/reports/tr51/" />
<link rel="help" href="https://unicode.org/reports/tr37/" />
<link rel="help" href="https://www.unicode.org/Public/UNIDATA/StandardizedVariants.txt" />
<link rel="help" href="https://www.unicode.org/versions/Unicode15.1.0/ch23.pdf#G19053" />
<link rel="match" href="variation-sequences-ref.html">
<meta name="assert" content="Variation sequences should be taken into account during cluster matching.">
<link rel="stylesheet" type="text/css" href="support/css/variation-sequences.css" />
<script type="text/javascript" src="support/js/variation-sequences.js"></script>
<body></body>
<script>
generateVariationSequenceTests("emoji");
generateVariationSequenceTests("cjk");
generateVariationSequenceTests("math");
</script>