fix flaky consumption-sameorigin.html iframe loading #36304

marcoscaceres · 2022-10-06T03:34:03Z

The tests were flaky because the iframes could load in a different order on each run, leading to different results.

mustaqahmed · 2022-10-06T14:45:52Z

html/user-activation/consumption-sameorigin.html

        } else if (msg.type.endsWith("-report")) {
            if (--num_children_to_report == 0)
                finishReportPhase();
        }
    });
+    async function createIframes() {


Does this really help with the flakes? I am asking because the block in lines 46-55 already handles the case added here: the child frames ping the top frame when they see load events.

Yes. The problem is that the (HTML) iframes were loading in a random order.

In L85-87, the "load phase" ends after all 3 frames are loaded in any order.

mustaqahmed · 2022-10-06T14:52:56Z

html/user-activation/consumption-sameorigin.html

+        child1.src = "resources/child-one.html";
+        child1.id = "child1";
+        document.body.appendChild(child1);
+        await new Promise((resolve) => (child1.onload = resolve));


Question: Is the iframe.onload event really tied to the child frame's window.onload? The normative note at the end of this spec section seems to suggest we have to wait explicitly for the child frame's window.onload, no?

That would be more correct, yes. However, my testing over a few hundred runs seems to confirm it's fine (all that is needed here is that child1 loads first).

You're misreading that note @mustaqahmed. That's just about event propagation. iframe's load event will fire after its inner document load event. (In general I don't recommend reading the UI Events spec for this as it doesn't define dispatching. I'm somewhat surprised the load event is still there.)

Questions for @annevk: Does the dispatching algorithm enforce a "load" event order across a frame boundary (even with a possibly slow cross-origin frame)?

I am asking because a "yes" here means the test could be simpler because it would be sufficient to wait for only the top window.onload (the top window.onload would follow the iframe.onload event which would follow inner window.onload event). Is this correct?

Yeah, that's correct.

@mustaqahmed, can you help me understand what causes the consumption of the user activation here?

I'd still prefer not to rewrite the test if I can avoid it.

However, if I flatten the test to:

    async function runTest() {
      promise_test(async (t) => {
        const child1 = document.getElementById("child1");
        const child1Activation = child1.contentWindow.navigator.userActivation;
        const child2 = document.getElementById("child2");
        const child2Activation = child2.contentWindow.navigator.userActivation;
        const grandChild = child2.contentDocument.getElementById("grandchild");
        const grandChildActivation = grandChild.contentWindow.navigator.userActivation;

        assert_false(navigator.userActivation.isActive, "Parent frame isActive initial state");
        assert_false(navigator.userActivation.hasBeenActive, "Parent frame hasBeenActive initial state");
        assert_false(child1Activation.isActive, "Child1 frame isActive initial state");
        assert_false(child1Activation.hasBeenActive, "Child1 frame hasBeenActive initial state");
        assert_false(child2Activation.isActive, "Child2 frame isActive initial state");
        assert_false(child2Activation.hasBeenActive, "Child2 frame hasBeenActive initial state");
        assert_false(grandChildActivation.isActive, "Grandchild frame isActive initial state");
        assert_false(grandChildActivation.hasBeenActive, "Grandchild frame hasBeenActive initial state");

        await test_driver.bless("click", null, child1.contentWindow);

        assert_true(navigator.userActivation.isActive, "Parent frame isActive after click");
        assert_true(navigator.userActivation.hasBeenActive, "Parent frame hasBeenActive after click");
        assert_true(child1Activation.isActive, "Child1 frame isActive after click");
        assert_true(child1Activation.hasBeenActive, "Child1 frame hasBeenActive after click");
        assert_true(child2Activation.isActive, "Child2 frame isActive after click");
        assert_true(child2Activation.hasBeenActive, "Child2 frame hasBeenActive after click");
        assert_true(grandChildActivation.isActive, "Grandchild frame isActive after click");
        assert_true(grandChildActivation.hasBeenActive, "Grandchild frame hasBeenActive after click");
      });
    }

I get different results? I can't see anything in the iframes that consumes the user activation?

I get different results? I can't see anything in the iframes that consumes the user activation?

We shouldn't get a different result, because a consumption call affects the whole frame tree.
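To make the "whole frame tree" point concrete, here is a toy model loosely based on the HTML Standard's "consume user activation" steps. It is only a sketch: the `Frame` class, the helper names, and the `TRANSIENT_MS` value are assumptions for illustration and not browser APIs.

```javascript
// Toy model (not a browser API): each frame keeps a "last activation
// timestamp" that starts at +Infinity, meaning "never activated".
class Frame {
  constructor(name, parent = null) {
    this.name = name;
    this.parent = parent;
    this.children = [];
    this.lastActivation = Infinity;
    if (parent) parent.children.push(this);
  }
  // Root of the frame tree this frame belongs to.
  get top() {
    return this.parent ? this.parent.top : this;
  }
  // Yields this frame and all of its descendants.
  *tree() {
    yield this;
    for (const child of this.children) yield* child.tree();
  }
}

// Assumed value; the real transient activation duration is user-agent defined.
const TRANSIENT_MS = 5000;

function isActive(frame, now) {
  return now >= frame.lastActivation && now < frame.lastActivation + TRANSIENT_MS;
}

function hasBeenActive(frame, now) {
  return now >= frame.lastActivation;
}

// Consumption walks the whole tree from the top, whichever frame calls it.
function consume(frame) {
  for (const f of frame.top.tree()) {
    if (f.lastActivation !== Infinity) f.lastActivation = -Infinity;
  }
}

// Example tree mirroring the test: top -> child1, child2 -> grandchild.
const topFrame = new Frame("top");
const child1 = new Frame("child1", topFrame);
const child2 = new Frame("child2", topFrame);
const grandchild = new Frame("grandchild", child2);
```

In this model, if every frame is active and `consume(grandchild)` runs, `isActive` becomes false for all four frames while `hasBeenActive` stays true, regardless of which frame triggered the consumption.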

Could you please elaborate on which assertions differ between the flattened and un-flattened frame-loading code?

Is there a chance the test_driver implementation is causing the flake here? This same-origin test is not flaky in Chrome's internal bots, but the cross-origin one is flaky and we can't fix that because of a test_driver implementation limitation.

In any case, I think a virtual meeting would be more efficient here to share our thoughts. Let's plan one.

We shouldn't get a different result, because a consumption call affects the whole frame tree.

Oh, wait! But the test again does something non-standard:

window.open().close();

That shouldn't consume the user activation: at least, there is nothing in HTML saying that either window.open() or window.close() consumes it.

The flattened version should be this (all false, click(), all true):

    async function runTest() {
      promise_test(async (t) => {
        const child1 = document.getElementById("child1");
        const child1Activation = child1.contentWindow.navigator.userActivation;
        const child2 = document.getElementById("child2");
        const child2Activation = child2.contentWindow.navigator.userActivation;
        const grandChild = child2.contentDocument.getElementById("grandchild");
        const grandChildActivation = grandChild.contentWindow.navigator.userActivation;

        assert_false(navigator.userActivation.isActive, "Parent frame isActive initial state");
        assert_false(navigator.userActivation.hasBeenActive, "Parent frame hasBeenActive initial state");
        assert_false(child1Activation.isActive, "Child1 frame isActive initial state");
        assert_false(child1Activation.hasBeenActive, "Child1 frame hasBeenActive initial state");
        assert_false(child2Activation.isActive, "Child2 frame isActive initial state");
        assert_false(child2Activation.hasBeenActive, "Child2 frame hasBeenActive initial state");
        assert_false(grandChildActivation.isActive, "Grandchild frame isActive initial state");
        assert_false(grandChildActivation.hasBeenActive, "Grandchild frame hasBeenActive initial state");

        await test_driver.bless("click", null, child1.contentWindow);

        assert_true(navigator.userActivation.isActive, "Parent frame isActive after click");
        assert_true(navigator.userActivation.hasBeenActive, "Parent frame hasBeenActive after click");
        assert_true(child1Activation.isActive, "Child1 frame isActive after click");
        assert_true(child1Activation.hasBeenActive, "Child1 frame hasBeenActive after click");
        assert_true(child2Activation.isActive, "Child2 frame isActive after click");
        assert_true(child2Activation.hasBeenActive, "Child2 frame hasBeenActive after click");
        assert_true(grandChildActivation.isActive, "Grandchild frame isActive after click");
        assert_true(grandChildActivation.hasBeenActive, "Grandchild frame hasBeenActive after click");
      });
    }

In any case, I think a virtual meeting would be more efficient here to share our thoughts. Let's plan one.

Happy to chat, but I think the biggest point of contention is that we don't have a consistent way of consuming the user activation.

So far, the tests have used two non-standard ways to try to consume it:

requestFullscreen() + exitFullscreen()

window.open() + window.close()

We should also avoid window.open() tests. They don't work well on mobile devices.

I'm thinking we should remove such tests, or mark them "-tentative", until we have an actual solution, to avoid codifying non-standard behavior.

https://dontcallmedom.github.io/webdex/c.html#consume%20user%20activation%40%40html%25%25dfn might help

marcoscaceres · 2022-10-27T02:16:03Z

I kindly ask that we just deal with the flakiness issue, as this is blocking me from landing the API implementation in WebKit.

If the tests do need to be refactored more, let's please do that as a followup as I'm spending more cycles on this than I have available.

I'd also appreciate more help resolving these issues with direct JavaScript code contributions/suggestions as it's really delaying things 🙏

domenic · 2022-10-27T02:30:37Z

I don't think it's good to rush these things. If a test is flaky in one browser and not another, that could indicate a problem with the test, but it could also indicate a problem with the browser in which it's flaky. Quickly changing the test to eliminate the flakiness without investigating the situation more holistically is not good, as it may hide relevant bugs.

For example, it's possible Safari has an issue here with iframes and load events, or with user activation propagation when multiple iframes are loaded, which we would not want to hide.

domenic · 2022-10-27T02:37:28Z

That said, after looking further it seems like maybe this test is flaky in Chrome too, in which case everything I said above is inapplicable. (Or maybe it's only flaky after the changes in this PR?)

marcoscaceres · 2022-10-27T05:44:12Z

I don't think it's good to rush these things. If a test is flaky in one browser and not another, that could indicate a problem with the test, but it could also indicate a problem with the browser in which it's flaky. Quickly changing the test to eliminate the flakiness without investigating the situation more holistically is not good, as it may hide relevant bugs.

I agree. I did investigate this one and it's why I proposed the change.

This test was always reporting one iframe winning over the other, which made it flaky.

In WebKit, at least, my change fixes the problem by loading the iframes sequentially. I ran this test a lot (which is also why I don't want to go poking at the test's internals).

For example, it's possible Safari has an issue here with iframes and load events, or with user activation propagation when multiple iframes are loaded, which we would not want to hide.

I honestly don't think so. My testing is showing that the two frames can load at different times.

If you take a look, you will see what I mean: there are literally two <iframe>s competing to load in parallel, so they are bound to get out of sync.

marcoscaceres · 2022-10-27T05:46:34Z

Having said that, if we can get rid of the message passing from the same origin tests, I really do think we should do that. These tests are extremely difficult to debug on mobile.

mustaqahmed · 2022-10-27T15:59:49Z

Having said that, if we can get rid of the message passing from the same origin tests, I really do think we should do that. These tests are extremely difficult to debug on mobile.

If that helps you, please go ahead. But I still believe frame loading order is not the cause of the flakiness you are seeing, because the test waits for all three iframes to load. So I am more interested in which assertions change for you when you flatten; see my comment above.

In any case, I would love to discuss over a VC.

marcoscaceres · 2022-10-28T01:42:40Z

But I still believe frame loading order is not the cause of the flakiness you are seeing because the test waits for all three iframes to load.

Yes, but those report in a different order. I'm literally seeing the following:

"Child1 frame initial state"
"Child2 frame initial state"

And then occasionally:

"Child2 frame initial state"
"Child1 frame initial state"

In the output. I don't know how much more evidence you need? It's the iframes loading out of order.

This is why I'm adding the iframes sequentially - so the above doesn't happen.
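As a rough illustration of the sequential approach, the sketch below attaches one frame at a time and waits for its load event before attaching the next. `attachSequentially`, `fakeFrame`, and `fakeAttach` are hypothetical names; the fake frames are plain objects standing in for real `<iframe>` elements so the ordering can be shown outside a browser.

```javascript
// Hypothetical helper: attach frames one at a time, waiting for each
// frame's load event before attaching the next one.
async function attachSequentially(frames, attach) {
  const loadOrder = [];
  for (const frame of frames) {
    const loaded = new Promise((resolve) => {
      frame.onload = resolve;
    });
    attach(frame); // in a real test: document.body.appendChild(frame)
    await loaded;
    loadOrder.push(frame.id);
  }
  return loadOrder;
}

// Stand-in for an <iframe>: a plain object with an id and an onload slot.
function fakeFrame(id, delayMs) {
  return { id, delayMs, onload: null };
}

// Stand-in for appendChild: fires the frame's onload after its delay.
function fakeAttach(frame) {
  setTimeout(() => {
    if (frame.onload) frame.onload();
  }, frame.delayMs);
}
```

With `attachSequentially([fakeFrame("child1", 30), fakeFrame("child2", 1)], fakeAttach)`, the recorded order is child1 then child2 even though child2 "loads" faster, because child2 is not attached until child1 has reported its load.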

domenic · 2022-10-28T02:24:50Z

Oh! This is the WebKit testrunner bug! Where they can't deal with tests that succeed in a different order!

The WPT project contract does not require that tests all be run in the same order each time. That's a WebKit-specific thing, because they have -expected.txt files which they check against (even for successes!).

This is also why it's so confusing that Marcos keeps using the word "flaky"! "Flaky" generally doesn't mean "tests run in a different order". It means "tests sometimes succeed and sometimes fail". But sometimes people from WebKit get confused and call tests that run in a different order, with the same results each time, "flaky".
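The distinction can be sketched as two ways of comparing a run's output against an expectation. The helper names and the "PASS ..." line format below are assumptions for illustration; this is not WebKit's or WPT's actual runner code.

```javascript
// Order-sensitive comparison: every line must match at the same position,
// similar in spirit to diffing against a stored expectation file.
function matchesExactly(expected, actual) {
  return (
    expected.length === actual.length &&
    expected.every((line, i) => line === actual[i])
  );
}

// Order-insensitive comparison: only the set of results matters.
function matchesIgnoringOrder(expected, actual) {
  return matchesExactly([...expected].sort(), [...actual].sort());
}

const expectedLines = [
  "PASS Child1 frame initial state",
  "PASS Child2 frame initial state",
];

// Same results, reported in the opposite order.
const swappedLines = [
  "PASS Child2 frame initial state",
  "PASS Child1 frame initial state",
];
```

Here `matchesExactly(expectedLines, swappedLines)` is false while `matchesIgnoringOrder(expectedLines, swappedLines)` is true: every subtest passes in both runs, yet an order-sensitive check still reports a mismatch.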

In the past we've accepted changes to accommodate this WebKit-ism, as long as they don't change the test very much. (Similar to how WebKit keeps changing tests to not use the WPT server features like .sub.html, instead replacing them with get-host-info.js.)

I haven't looked at whether the change here is minimal or problematic, but at least now we know what's going on...

marcoscaceres · 2022-10-28T03:29:58Z

Oh! This is the WebKit testrunner bug! Where they can't deal with tests that succeed in a different order!

Yes, sorry that was not clear: I was under the mistaken assumption that Chromium also used -expected.txt files.

The WPT project contract does not require that tests all be run in the same order each time. That's a WebKit-specific thing, because they have -expected.txt files which they check against (even for successes!).

Correct. But it's not harming the test to change it in the manner I proposed.

This is also why it's so confusing that Marcos keeps using the word "flaky"! "Flaky" generally doesn't mean "tests run in a different order".

Sorry, yes. That's marked as flaky on our infrastructure: a test that randomly fails is "flaky" to WebKit folks.

It means "tests sometimes succeed and sometimes fail". But sometimes people from WebKit get confused and call tests that run in a different order, with the same results each time, "flaky".

Correct 😊

In the past we've accepted changes to accommodate this WebKit-ism, as long as they don't change the test very much. (Similar to how WebKit keeps changing tests to not use the WPT server features like .sub.html, instead replacing them with get-host-info.js.)

Yes, this is why I'm asking for very small changes. Same with the body not being available yet.

I haven't looked at whether the change here is minimal or problematic, but at least now we know what's going on...

Right, but see also the use of window.open().close() as a way to consume the user activation. That's non-standard from my reading of HTML?

(same with requestFullscreen() + exitFullscreen()... if we want to use that as a means of consuming user activation, we should have that standardized - or come up with something better that consumes it explicitly and directly)

mustaqahmed · 2022-10-28T17:34:35Z

Oh! This is the WebKit testrunner bug! Where they can't deal with tests that succeed in a different order!

Yes, sorry that was not clear: I was under the mistaken assumption that Chromium also used -expected.txt files.

The WPT project contract does not require that tests all be run in the same order each time. That's a WebKit-specific thing, because they have -expected.txt files which they check against (even for successes!).

Correct. But it's not harming the test to change it in the manner I proposed.

We are on the same page on this, which is why I commented to "go ahead". But the source of the flake was a mystery to me, and I wanted to see a solution that works for the cross-origin case too. I am seeing the full picture now from the two most recent comments: in WebKit the asserts are not failing but the order of the test() calls is assumed to be fixed---is this correct?

This is also why it's so confusing that Marcos keeps using the word "flaky"! "Flaky" generally doesn't mean "tests run in a different order".

Sorry, yes. That's marked as flaky on our infrastructure: a test that randomly fails is "flaky" to WebKit folks.

It means "tests sometimes succeed and sometimes fail". But sometimes people from WebKit get confused and call tests that run in a different order, with the same results each time, "flaky".

Correct 😊

In the past we've accepted changes to accommodate this WebKit-ism, as long as they don't change the test very much. (Similar to how WebKit keeps changing tests to not use the WPT server features like .sub.html, instead replacing them with get-host-info.js.)

Yes, this is why I'm asking for very small changes. Same with the body not being available yet.

Assuming my last comment here is correct, let me suggest an alternative change that would work even for the cross-origin case: move all "initial state" test() calls to finishLoadPhase(), which would now rely on the msg values saved in global states, like:

  if (msg.type == 'child-one-loaded') child1_initial_msg = msg

etc. Similarly, move all "final state" test() calls to finishReportPhase(). That should work, right?
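A minimal sketch of that idea: buffer each child's message as it arrives, and only once all have arrived hand them to the finish step in a fixed order. Only 'child-one-loaded' comes from the comment above; the other message type names and `createLoadPhase` are hypothetical.

```javascript
// Fixed reporting order. Only "child-one-loaded" appears in the thread;
// the other two type names are assumed for illustration.
const EXPECTED_TYPES = ["child-one-loaded", "child-two-loaded", "grandchild-loaded"];

// Returns a message handler that saves each expected message and calls
// onFinish exactly once, with the messages in EXPECTED_TYPES order.
function createLoadPhase(onFinish) {
  const saved = new Map();
  return function handleMessage(msg) {
    if (!EXPECTED_TYPES.includes(msg.type) || saved.has(msg.type)) return;
    saved.set(msg.type, msg);
    if (saved.size === EXPECTED_TYPES.length) {
      // Arrival order no longer matters: results are replayed in fixed order.
      onFinish(EXPECTED_TYPES.map((type) => saved.get(type)));
    }
  };
}
```

In the real test, `onFinish` would play the role of finishLoadPhase() and issue the "initial state" test() calls there, so the subtests are registered in the same order however the frames happen to report.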

I still hope (like @domenic) that WebKit's test order rigidness would be fixed one day, but let's move forward here.

I haven't looked at whether the change here is minimal or problematic, but at least now we know what's going on...

Right, but see also the use of window.open().close() as a way to consume the user activation. That's non-standard from my reading of HTML?

(same with requestFullscreen() + exitFullscreen()... if we want to use that as a means of consuming user activation, we should have that standardized - or come up with something better that consumes it explicitly and directly)

Without a standardized solution here, I had to add stop-gap measures like these 😢 because we can't possibly standardize any user-activation gated APIs (including their consumption behavior) before standardizing the core user activation model itself! We needed to start standardizing the chicken or the egg. If you see a better way (including making it conditional based on the user agent), let's discuss it in a separate issue.

We (in Chrome) know how frustrating and time-consuming every minor failure in user activation tests might feel---we have gone through this! Let's continue our frank discussion here and elsewhere so that we can avoid revisiting the same pain points!

marcoscaceres · 2022-10-29T01:05:52Z

in WebKit the asserts are not failing but the order of the test() calls is assumed to be fixed---is this correct?

That is correct. Sorry again I didn't communicate what was happening clearly. My bad.

I still hope (like @domenic) that WebKit's test order rigidness would be fixed one day,

Me too. Though, at the same time, I can appreciate the reproducibility that WebKit's CI requires (i.e., if you squint just right: "this is a feature, not a bug").

Without a standardized solution here, I had to add stop-gap measures like these 😢

No, I totally get it and I'm not blaming anyone (I've done some questionable things myself in the Payment Request tests to consume user activation 🙈). At the same time, as someone coming at this fresh while working on the implementation, I was surprised to find the stopgaps because they don't match anything in the specs.

But we probably shouldn't do that, as a lot of engineers "code to the tests". This is a bit dangerous, because the tests are codifying non-standard behavior.

Where we need stopgaps, at a minimum, we should document them in the code... or, better, we should stop and do as you suggest below (work together to standardize something).

because we can't possibly standardize any user-activation gated APIs (including their consumption behavior) before standardizing the core user activation model itself!

Absolutely. But we have the model, we just need a way of accessing it.

We needed to start standardizing the chicken or the egg. If you see a better way (including making it conditional based on the user agent), let's discuss it in a separate issue.

Ok, but can we acknowledge that these tests should be -tentative?

Filed: #36727

Let's chat!

marcoscaceres · 2022-10-31T01:14:46Z

@mustaqahmed or @domenic, would you mind approving, as this addresses the (WebKit) flakiness and also the missing awaits?

We can then continue the discussion about how to consume user activation in #36727

mustaqahmed · 2022-11-02T19:17:35Z

html/user-activation/consumption-sameorigin.html

+        childSO.id = "child-so";
+        childSO.src = "resources/consumption-sameorigin-child.html";
+        document.body.appendChild(childSO);
+        await new Promise((resolve) => (childSO.onload = resolve));


Looks like we are still relying on the load event behavior @annevk mentioned above...I mean the fact that childSO's load is ultimately blocked by "grandchild" frame's load, right? I find this mix of flattening-vs-not a bit too awkward!

Removed this line. There is no need to await its load. What matters is just that the first iframe loads in order.

I've made similar changes to all the other dual-iframe tests. I'll have them reviewed upstream as part of my WebKit patch and will just send them, already reviewed, as part of that.

marcoscaceres · 2022-11-02T20:52:36Z

Closing. Will send all the WebKit reviewed changes in one go.

fix flaky consumption-sameorigin.html iframe loading

40bd4c2

marcoscaceres requested a review from mustaqahmed October 6, 2022 03:34

wpt-pr-bot added the html label Oct 6, 2022

wpt-pr-bot assigned foolip Oct 6, 2022

wpt-pr-bot requested review from annevk, domenic, foolip, jdm, jgraham and zqzhang October 6, 2022 03:34

marcoscaceres commented Oct 6, 2022

Update html/user-activation/consumption-sameorigin.html

85ee464

marcoscaceres mentioned this pull request Oct 6, 2022

user activation tests might be racy #36221

Closed

mustaqahmed reviewed Oct 6, 2022

marcoscaceres requested a review from mustaqahmed October 26, 2022 05:38

marcoscaceres enabled auto-merge (squash) October 26, 2022 05:38

mustaqahmed reviewed Nov 2, 2022

marcoscaceres commented Nov 2, 2022

Update html/user-activation/consumption-sameorigin.html

a6845fa

marcoscaceres closed this Nov 2, 2022

auto-merge was automatically disabled November 2, 2022 20:52
Pull request was closed
