Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax <select> parser #10557

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

Relax <select> parser #10557

wants to merge 17 commits into from

Conversation

josepharhar
Copy link
Contributor

@josepharhar josepharhar commented Aug 13, 2024

This patch makes the parser allow additional tags in <select> besides <option>, <optgroup>, and <hr>, mostly by removing the "in select" and "in select in table" parser modes.

In order to replicate the behavior where opening a <select> tag within another open <select> tag inserts a </select> close tag, a traversal through the stack of open elements was added which I borrowed from the <button> part of the parser.

This patch also changes the processing model to make <select> look through all its descendants in the DOM tree for <option> elements, rather than just children and optgroup children which conform to the content model. This is needed for compat reasons because there are websites which put other tags in between their <select> and <option>s which would break with this parser change unless we also update this processing model. More context here and here.

Fixes #10310

(See WHATWG Working Mode: Changes for more details.)


/form-elements.html ( diff )
/index.html ( diff )
/infrastructure.html ( diff )
/parsing.html ( diff )

@annevk
Copy link
Member

annevk commented Aug 14, 2024

Why does this also make a bunch of changes to the processing model of select elements? If that needs to be part of this change that should really be better motivated in the commit message.

@josepharhar
Copy link
Contributor Author

Why does this also make a bunch of changes to the processing model of select elements? If that needs to be part of this change that should really be better motivated in the commit message.

The changes in the processing model do support the new proposed content model for select, but they also mitigate compat issues for websites which are already putting tags in between <select> and <option> in their HTML.

For example, without these changes to the processing model, the following <select> would not register as having any options at all:

<select>
  <div>
    <option>...</option>
    ...
  </div>
</select>

In my compat analysis, I found a lot of websites which are doing this, so in order to ship the parser changes separately from customizable select in chrome, I plan to ship the parser changes and these processing model changes together, because otherwise there would be too much breakage.

I'm happy to put them in a separate PR if you want, or keep them here and update the commit message (sorry for not putting it in there). Which would you prefer?

@annevk
Copy link
Member

annevk commented Aug 16, 2024

Thanks! Given that rationale I think it's good to couple the changes, but that should be in the commit message as well.

@zcorpan
Copy link
Member

zcorpan commented Aug 19, 2024

This doesn't define optional tags for </option> and </optgroup> correctly.

The definition for "have a particular element in select scope" may be needed for that, but should be changed to be similar to "have a particular element in button scope" (but for select).

In particular, allow these without a parse error:

<select>
<optgroup>
<option>
<optgroup>
</select>

The second <optgroup> should pop the option and optgroup

<select>
<option><p>
<option>
</select>

This should generate implied end tags and pop the option.

<select>
<optgroup>
<hr>
<option>
<hr>
</select>

The hrs should pop the optgroup and option in a select.

See how the parser deals with <ruby><rtc><rt>, I think that can be used as a model for select.

source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
@josepharhar
Copy link
Contributor Author

Thanks! Given that rationale I think it's good to couple the changes, but that should be in the commit message as well.

Done.

This doesn't define optional tags for </option> and </optgroup> correctly.

@zcorpan thanks for the feedback! I did some experimenting and added some logic to the start tags for option, optgroup, and hr. What do you think?

@zcorpan
Copy link
Member

zcorpan commented Aug 22, 2024

  • Need to check that a select element is in scope so that parsing of option/optgroup tags outside of select doesn't change. Example <option><p>1</option>2<option>3
  • Need to add select to the "in scope" list so that <div><select></div><option> doesn't close the select.
  • Should report parse errors when unexpected elements are popped. Compare with what the spec does for rt/rtc in ruby. Example <select><option><span>1</option>2<option>3

@josepharhar
Copy link
Contributor Author

  • Need to check that a select element is in scope so that parsing of option/optgroup tags outside of select doesn't change. Example <option><p>1</option>2<option>3

I thought that this is covered by "While the stack of open elements has an option element in select scope". What exactly should I change?

  • Need to add select to the "in scope" list so that <div><select></div><option> doesn't close the select.

Done

  • Should report parse errors when unexpected elements are popped. Compare with what the spec does for rt/rtc in ruby. Example <select><option><span>1</option>2<option>3

I'm guessing this is from "If the current node is not now a rtc element or a ruby element, this is a parse error," right?

Should I add "If the current node is not now an option element, this is a parse error" after "While the stack of open elements has an option element in select scope, pop an element from the stack of open elements"?

@zcorpan
Copy link
Member

zcorpan commented Aug 23, 2024

The "in select scope" I think should be removed altogether since it assumes the stack will not have other elements when in a select, which is no longer the case. Use the normal "in scope" instead.

High-level of what I think should happen: when parsing option or optgroup start tag: check that select is in scope, check that option or optgroup is in scope, generate implied end tags (except for optgroup when handling <option>), check the current node or the stack of open elements again, report a parse error if appropriate, then pop elements off the stack until the option or optgroup has been popped, then insert the new element.

I can look into this more next week and suggest more specific changes.

@josepharhar
Copy link
Contributor Author

High-level of what I think should happen: when parsing option or optgroup start tag: check that select is in scope, check that option or optgroup is in scope, generate implied end tags (except for optgroup when handling <option>), check the current node or the stack of open elements again, report a parse error if appropriate, then pop elements off the stack until the option or optgroup has been popped, then insert the new element.

Thanks! I gave this a try

@josepharhar
Copy link
Contributor Author

@zcorpan how does the latest text look?

Copy link
Member

@zcorpan zcorpan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the option/optgroup cases are right after these changes.

source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
@josepharhar
Copy link
Contributor Author

This doesn't define optional tags for </option> and </optgroup> correctly.

I was talking to @mfreed7 about the changes we've made in the PR so far, and I feel like I couldn't provide a good explanation for why we are making the parser not support cases like these:

  • <hr> inside an <option>
  • nested <optgroup>s

Is it just compat reasons? Is there a good justification?

This patch makes the parser allow additional tags in <select> besides
<option>, <optgroup>, and <hr>, mostly by removing the "in select" and
"in select in table" parser modes.

In order to replicate the behavior where opening a <select> tag within
another open <select> tag inserts a </select> close tag, a traversal
through the stack of open elements was added which I borrowed from the
<button> part of the parser.

This will need test changes to be implemented in html5lib.

Fixes whatwg#10310
@zcorpan
Copy link
Member

zcorpan commented Sep 11, 2024

It would be a breaking change from what is conforming HTML today, and break compat for sites that omit </option> and </optgroup> tags. It's the same as why we can't allow certain elements in p.

@zcorpan
Copy link
Member

zcorpan commented Dec 3, 2024

Per discussion in Matrix (1, 2), the web compat situation is not yet clear, see e.g. <select><input> issue: https://issues.chromium.org/issues/379612186

@josepharhar what is the rollout plan in Chromium? I think it makes sense to wait with merging this PR until it has been shipping in Stable for a bit.

@zcorpan zcorpan added the do not merge yet Pull request must not be merged per rationale in comment label Dec 3, 2024
@josepharhar
Copy link
Contributor Author

I disabled this new parser behavior in chrome 131 due to a bug with one of multiple code paths which collects options to render in the select's popup which doesn't exist in the spec. I am going to re-enable in chrome 133, and yes <input> in <select> is still an open question. I'd like to support this if possible and I will come back here to report on that after the next shipping attempt.

I think it makes sense to wait with merging this PR until it has been shipping in Stable for a bit.

That sounds reasonable to me especially since shipping this change in chrome is not blocked by merging this PR.

chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request Dec 4, 2024
This patch makes the HTML parser convert nested <select> tags into
</select> instead of </select><select> for backwards compatibility with
the old parser. This behavior will apply to any nested selects, such as
<select><div><select>.

This was recommended here:
whatwg/html#10557 (comment)

This will be tested here:
html5lib/html5lib-tests#178 (comment)

Change-Id: I7a39901582b96edc10deca8794b9ce117aa08229
aarongable pushed a commit to chromium/chromium that referenced this pull request Dec 4, 2024
This patch makes the HTML parser convert nested <select> tags into
</select> instead of </select><select> for backwards compatibility with
the old parser. This behavior will apply to any nested selects, such as
<select><div><select>.

This was recommended here:
whatwg/html#10557 (comment)

This will be tested here:
html5lib/html5lib-tests#178 (comment)

Change-Id: I7a39901582b96edc10deca8794b9ce117aa08229
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6040687
Commit-Queue: Joey Arhar <jarhar@chromium.org>
Reviewed-by: David Baron <dbaron@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1391787}
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request Dec 4, 2024
This patch makes the HTML parser convert nested <select> tags into
</select> instead of </select><select> for backwards compatibility with
the old parser. This behavior will apply to any nested selects, such as
<select><div><select>.

This was recommended here:
whatwg/html#10557 (comment)

This will be tested here:
html5lib/html5lib-tests#178 (comment)

Change-Id: I7a39901582b96edc10deca8794b9ce117aa08229
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6040687
Commit-Queue: Joey Arhar <jarhar@chromium.org>
Reviewed-by: David Baron <dbaron@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1391787}
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request Dec 4, 2024
This patch makes the HTML parser convert nested <select> tags into
</select> instead of </select><select> for backwards compatibility with
the old parser. This behavior will apply to any nested selects, such as
<select><div><select>.

This was recommended here:
whatwg/html#10557 (comment)

This will be tested here:
html5lib/html5lib-tests#178 (comment)

Change-Id: I7a39901582b96edc10deca8794b9ce117aa08229
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6040687
Commit-Queue: Joey Arhar <jarhar@chromium.org>
Reviewed-by: David Baron <dbaron@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1391787}
lukewarlow added a commit to lukewarlow/ladybird that referenced this pull request Dec 4, 2024
- Allow general content within select element
- Update processing of options for select

See whatwg/html#10557
@lukewarlow
Copy link
Member

I think this is missing a required spec change: https://html.spec.whatwg.org/multipage/parsing.html#:~:text=An%20end%20tag%20whose%20tag%20name%20is%20one%20of%3A%20%22address%22 - select needs adding to this list of elements?

aarongable pushed a commit to chromium/chromium that referenced this pull request Dec 4, 2024
Context here:
whatwg/html#10557 (comment)

Tests will be added here:
html5lib/html5lib-tests#178 (comment)

Change-Id: I62cd28979abff3d5ea30b3502872f273a74bfde4
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6042598
Commit-Queue: Joey Arhar <jarhar@chromium.org>
Reviewed-by: David Baron <dbaron@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1391836}
@lukewarlow
Copy link
Member

Btw I think this test needs updating as it's currently failing in Chromium with this change

@zcorpan
Copy link
Member

zcorpan commented Dec 6, 2024

@lukewarlow good catch! Right now the end tag is handled by "Any other end tag", which will ignore the tag if the current node is not a select element. (Chromium seems to implement something else, testing with <select><div></select>x.)

@josepharhar
Copy link
Contributor Author

I think this is missing a required spec change: https://html.spec.whatwg.org/multipage/parsing.html#:~:text=An%20end%20tag%20whose%20tag%20name%20is%20one%20of%3A%20%22address%22 - select needs adding to this list of elements?

@lukewarlow good catch! Right now the end tag is handled by "Any other end tag", which will ignore the tag if the current node is not a select element. (Chromium seems to implement something else, testing with <select><div></select>x.)

Thanks, I added select there in this PR. It's already included there in chromium.

Btw I think this test needs updating as it's currently failing in Chromium with this change

Thanks, fixing that here.

@flavorjones
Copy link

flavorjones commented Dec 7, 2024

@josepharhar Are the test changes in html5lib/html5lib-tests#178 up to date with these proposed spec changes now? It's been hard to follow along across the multiple repositories.

@josepharhar
Copy link
Contributor Author

@josepharhar Are the test changes in html5lib/html5lib-tests#178 up to date with these proposed spec changes now? It's been hard to follow along across the multiple repositories.

I just updated all tests in the PR

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Dec 10, 2024
…t>, a=testonly

Automatic update from web-platform-tests
Change parser behavior of <select><select>

This patch makes the HTML parser convert nested <select> tags into
</select> instead of </select><select> for backwards compatibility with
the old parser. This behavior will apply to any nested selects, such as
<select><div><select>.

This was recommended here:
whatwg/html#10557 (comment)

This will be tested here:
html5lib/html5lib-tests#178 (comment)

Change-Id: I7a39901582b96edc10deca8794b9ce117aa08229
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6040687
Commit-Queue: Joey Arhar <jarhar@chromium.org>
Reviewed-by: David Baron <dbaron@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1391787}

--

wpt-commits: c4c67222e7b0677f92f43280837db48473603f1e
wpt-pr: 49525
i3roly pushed a commit to i3roly/firefox-dynasty that referenced this pull request Dec 12, 2024
…t>, a=testonly

Automatic update from web-platform-tests
Change parser behavior of <select><select>

This patch makes the HTML parser convert nested <select> tags into
</select> instead of </select><select> for backwards compatibility with
the old parser. This behavior will apply to any nested selects, such as
<select><div><select>.

This was recommended here:
whatwg/html#10557 (comment)

This will be tested here:
html5lib/html5lib-tests#178 (comment)

Change-Id: I7a39901582b96edc10deca8794b9ce117aa08229
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6040687
Commit-Queue: Joey Arhar <jarhar@chromium.org>
Reviewed-by: David Baron <dbaron@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1391787}

--

wpt-commits: c4c67222e7b0677f92f43280837db48473603f1e
wpt-pr: 49525
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this pull request Dec 13, 2024
…t>, a=testonly

Automatic update from web-platform-tests
Change parser behavior of <select><select>

This patch makes the HTML parser convert nested <select> tags into
</select> instead of </select><select> for backwards compatibility with
the old parser. This behavior will apply to any nested selects, such as
<select><div><select>.

This was recommended here:
whatwg/html#10557 (comment)

This will be tested here:
html5lib/html5lib-tests#178 (comment)

Change-Id: I7a39901582b96edc10deca8794b9ce117aa08229
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6040687
Commit-Queue: Joey Arhar <jarharchromium.org>
Reviewed-by: David Baron <dbaronchromium.org>
Cr-Commit-Position: refs/heads/main{#1391787}

--

wpt-commits: c4c67222e7b0677f92f43280837db48473603f1e
wpt-pr: 49525

UltraBlame original commit: 666bbfe1252e12af9394a702ad62918ae14f44c2
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this pull request Dec 13, 2024
…t>, a=testonly

Automatic update from web-platform-tests
Change parser behavior of <select><select>

This patch makes the HTML parser convert nested <select> tags into
</select> instead of </select><select> for backwards compatibility with
the old parser. This behavior will apply to any nested selects, such as
<select><div><select>.

This was recommended here:
whatwg/html#10557 (comment)

This will be tested here:
html5lib/html5lib-tests#178 (comment)

Change-Id: I7a39901582b96edc10deca8794b9ce117aa08229
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6040687
Commit-Queue: Joey Arhar <jarharchromium.org>
Reviewed-by: David Baron <dbaronchromium.org>
Cr-Commit-Position: refs/heads/main{#1391787}

--

wpt-commits: c4c67222e7b0677f92f43280837db48473603f1e
wpt-pr: 49525

UltraBlame original commit: 666bbfe1252e12af9394a702ad62918ae14f44c2
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Dec 13, 2024
…t>, a=testonly

Automatic update from web-platform-tests
Change parser behavior of <select><select>

This patch makes the HTML parser convert nested <select> tags into
</select> instead of </select><select> for backwards compatibility with
the old parser. This behavior will apply to any nested selects, such as
<select><div><select>.

This was recommended here:
whatwg/html#10557 (comment)

This will be tested here:
html5lib/html5lib-tests#178 (comment)

Change-Id: I7a39901582b96edc10deca8794b9ce117aa08229
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6040687
Commit-Queue: Joey Arhar <jarharchromium.org>
Reviewed-by: David Baron <dbaronchromium.org>
Cr-Commit-Position: refs/heads/main{#1391787}

--

wpt-commits: c4c67222e7b0677f92f43280837db48473603f1e
wpt-pr: 49525

UltraBlame original commit: 666bbfe1252e12af9394a702ad62918ae14f44c2
@flavorjones
Copy link

It doesn't look like the pr-preview bot updated the "diff" links after f8d1434 was added to this PR (the links still point to 08bdb25). Would it be possible for someone to nudge the bot to recreate those links? Thank you!

source Show resolved Hide resolved
flavorjones added a commit to sparklemotion/nokogiri that referenced this pull request Dec 19, 2024
whatwg/html#10557

This is green (modulo errors) up to whatwg/html@f8d14341 plus the
correction I suggested to the spec in
whatwg/html#10557 (comment)
flavorjones added a commit to sparklemotion/nokogiri that referenced this pull request Dec 20, 2024
whatwg/html#10557

This is green (modulo errors) up to whatwg/html@f8d14341 plus the
correction I suggested to the spec in
whatwg/html#10557 (comment)
@josepharhar
Copy link
Contributor Author

It doesn't look like the pr-preview bot updated the "diff" links after f8d1434 was added to this PR (the links still point to 08bdb25). Would it be possible for someone to nudge the bot to recreate those links? Thank you!

Looks to me like its up to date now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do not merge yet Pull request must not be merged per rationale in comment normative change topic: parser topic: select The <select> element
Development

Successfully merging this pull request may close these issues.

HTML parser changes for customizable <select>
9 participants