Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix flaky consumption-sameorigin.html iframe loading #36304

Closed
wants to merge 3 commits into from

Conversation

marcoscaceres
Copy link
Contributor

Tests were flaky in that the iframes could load in different order, leading to different results.

} else if (msg.type.endsWith("-report")) {
if (--num_children_to_report == 0)
finishReportPhase();
}
});
async function createIframes() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really improve the flakes? I am asking because the block in Line 46-55 already handles the case added here: the child frames pings the top frame when they see load events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. The problem is that the (html) iframes were randomly in different order.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In L85-87, the "load phase" ends after all 3 frames are loaded in any order.

child1.src = "resources/child-one.html";
child1.id = "child1";
document.body.appendChild(child1);
await new Promise((resolve) => (child1.onload = resolve));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Is iframe.onload event really tied to child frame's window.onload? The normative note at the end of this spec section seems to suggest we have to wait explicitly for child frame's window.onload, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be more correct, yes. However, my testing this on a few hundred runs seems to confirm it's fine (all that is needed here is that the child1 loads first).

Copy link
Member

@annevk annevk Oct 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're misreading that note @mustaqahmed. That's just about event propagation. iframe's load event will fire after its inner document load event. (In general I don't recommend reading the UI Events spec for this as it doesn't define dispatching. I'm somewhat surprised the load event is still there.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Questions for @annevk: Does the dispatching algorithm enforce a "load" event order across a frame boundary (even with a possibly slow cross-origin frame)?

I am asking because a "yes" here means the test could be simpler because it would be sufficient to wait for only the top window.onload (the top window.onload would follow the iframe.onload event which would follow inner window.onload event). Is this correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's correct.

Copy link
Contributor Author

@marcoscaceres marcoscaceres Oct 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mustaqahmed, can you help me understand what cause the consumption of the user activation here?

I'd still prefer not to rewrite the test if I can avoid it.

However, if I flatten the test to:

async function runTest() {
  promise_test(async (t) => {
      const child1 = document.getElementById("child1");
      const child1Activation = child1.contentWindow.navigator.userActivation;
      const child2 = document.getElementById("child2");
      const child2Activation = child2.contentWindow.navigator.userActivation;
      const grandChild = child2.contentDocument.getElementById("grandchild");
      const grandChildActivation = grandChild.contentWindow.navigator.userActivation;
      
      assert_false(navigator.userActivation.isActive, "Parent frame isActive initial state");
      assert_false(navigator.userActivation.hasBeenActive, "Parent frame hasBeenActive initial state");
      assert_false(child1Activation.isActive, "Child1 frame isActive initial state");
      assert_false(child1Activation.hasBeenActive, "Child1 frame hasBeenActive initial state");
      assert_false(child2Activation.isActive, "Child2 frame isActive initial state");
      assert_false(child2Activation.hasBeenActive, "Child2 frame hasBeenActive initial state");
      assert_false(grandChildActivation.isActive, "Grandchild frame isActive initial state");
      assert_false(grandChildActivation.hasBeenActive, "Grandchild frame hasBeenActive initial state");
      
      await test_driver.bless("click", null, child1.contentWindow);

      assert_true(navigator.userActivation.isActive, "Parent frame isActive after click");
      assert_true(navigator.userActivation.hasBeenActive, "Parent frame hasBeenActive after click");
      assert_true(child1Activation.isActive, "Child1 frame isActive after click");
      assert_true(child1Activation.hasBeenActive, "Child1 frame hasBeenActive after click");
      assert_true(child2Activation.isActive, "Child2 frame isActive after click");
      assert_true(child2Activation.hasBeenActive, "Child2 frame hasBeenActive after click");
      assert_true(grandChildActivation.isActive, "Grandchild frame isActive after click");
      assert_true(grandChildActivation.hasBeenActive, "Grandchild frame hasBeenActive after click");
  });
}

I get different results? I can't see anything in the iframes that consume the user activation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get different results? I can't see anything in the iframes that consume the user activation?

We shouldn't get a different result, because a consumption call affects the whole frame tree.

Could you please elaborate which assertions are different between flattened vs un-flattened frame load code?

Is there a chance test_driver implementation is causing the flake here? This same-origin test is not flaky in Chrome's internal bots, but the cross-origin one is flaky and we can't fix that because of a test_driver impl limitation.

In any case, I think a virtual meeting would be more efficient here to share our thoughts. Let's plan one.

Copy link
Contributor Author

@marcoscaceres marcoscaceres Oct 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't get a different result, because a consumption call affects the whole frame tree.

Oh, wait! But the test again does something non-standard:

window.open().close();

That shouldn't consume the user activation: at least, there is nothing in HTML saying that either window.open() or window.close() consumes it.

The flattened version should be this (all false, click(), all true):

async function runTest() {
  promise_test(async (t) => {
      const child1 = document.getElementById("child1");
      const child1Activation = child1.contentWindow.navigator.userActivation;
      const child2 = document.getElementById("child2");
      const child2Activation = child2.contentWindow.navigator.userActivation;
      const grandChild = child2.contentDocument.getElementById("grandchild");
      const grandChildActivation = grandChild.contentWindow.navigator.userActivation;
      
      assert_false(navigator.userActivation.isActive, "Parent frame isActive initial state");
      assert_false(navigator.userActivation.hasBeenActive, "Parent frame hasBeenActive initial state");
      assert_false(child1Activation.isActive, "Child1 frame isActive initial state");
      assert_false(child1Activation.hasBeenActive, "Child1 frame hasBeenActive initial state");
      assert_false(child2Activation.isActive, "Child2 frame isActive initial state");
      assert_false(child2Activation.hasBeenActive, "Child2 frame hasBeenActive initial state");
      assert_false(grandChildActivation.isActive, "Grandchild frame isActive initial state");
      assert_false(grandChildActivation.hasBeenActive, "Grandchild frame hasBeenActive initial state");
      
      await test_driver.bless("click", null, child1.contentWindow);

      assert_true(navigator.userActivation.isActive, "Parent frame isActive after click");
      assert_true(navigator.userActivation.hasBeenActive, "Parent frame hasBeenActive after click");
      assert_true(child1Activation.isActive, "Child1 frame isActive after click");
      assert_true(child1Activation.hasBeenActive, "Child1 frame hasBeenActive after click");
      assert_true(child2Activation.isActive, "Child2 frame isActive after click");
      assert_true(child2Activation.hasBeenActive, "Child2 frame hasBeenActive after click");
      assert_true(grandChildActivation.isActive, "Grandchild frame isActive after click");
      assert_true(grandChildActivation.hasBeenActive, "Grandchild frame hasBeenActive after click");
  });
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, I think a virtual meeting would be more efficient here to share our thoughts. Let's plan one.

Happy to chat, but I think the biggest points of contention are that we don't have a consistent way of consuming the user activation.

So far, the tests have used two non-standard ways to try to consume it:

  1. requestFullscreen() + exitFullscreen()
  2. window.open() + window.close()

We should also avoid window.open() tests. They don't work well on mobile devices.

I'm thinking we should remove or "-tentative" such tests until we have an actual solution to prevent codifying non-standard behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcoscaceres marcoscaceres enabled auto-merge (squash) October 26, 2022 05:38
@marcoscaceres
Copy link
Contributor Author

marcoscaceres commented Oct 27, 2022

I kindly request if we can just deal with the flakiness issue, as this is blocking me from landing the API implementation in WebKit.

If the tests do need to be refactored more, let's please do that as a followup as I'm spending more cycles on this than I have available.

I'd also appreciate more help resolving these issues with direct JavaScript code contributions/suggestions as it's really delaying things 🙏

@domenic
Copy link
Member

domenic commented Oct 27, 2022

I don't think it's good to rush these things. If a test is flaky in one browser and not another, that could indicate a problem with the test, but it could also indicate a problem with the browser in which it's flaky. Quickly changing the test to eliminate the flakiness without investigating the situation more holistically is not a good, as it may hide relevant bugs.

For example, it's possible Safari has an issue here with iframes and load events, or with user activation propagation when multiple iframes are loaded, which we would not want to hide.

@domenic
Copy link
Member

domenic commented Oct 27, 2022

That said, after looking further it seems like maybe this test is flaky in Chrome too, in which case everything I said above is inapplicable. (Or maybe it's only flaky after the changes in this PR?)

@marcoscaceres
Copy link
Contributor Author

I don't think it's good to rush these things. If a test is flaky in one browser and not another, that could indicate a problem with the test, but it could also indicate a problem with the browser in which it's flaky. Quickly changing the test to eliminate the flakiness without investigating the situation more holistically is not a good, as it may hide relevant bugs.

I agree. I did investigate this one and it's why I proposed the change.

This test was always reporting one iframe winning over the other, which made it flaky.

In WebKit, at least, my change fixes the problem by loading the iframes sequentially. I ran this test a lot (and also why I don't want to go poking at the test's internals).

For example, it's possible Safari has an issue here with iframes and load events, or with user activation propagation when multiple iframes are loaded, which we would not want to hide.

I honestly don't think so. My testing is showing that the two frames can load at different times.

If you take a look you will see that what I mean: there is literally two <iframe>s that are competing to load in parallel, so they are bound to get out of sync.

@marcoscaceres
Copy link
Contributor Author

Having said that, if we can get rid of the message passing from the same origin tests, I really do think we should do that. These tests are extremely difficult to debug on mobile.

@mustaqahmed
Copy link
Member

Having said that, if we can get rid of the message passing from the same origin tests, I really do think we should do that. These tests are extremely difficult to debug on mobile.

If that helps you, please go ahead. But I still believe frame loading order is not the cause of the flakiness you are seeing because the test waits for all three iframes to load. So I am more interested about which assertions change for you when you flatten, see my comment above.

In any case, I would love to discuss over a VC.

@marcoscaceres
Copy link
Contributor Author

But I still believe frame loading order is not the cause of the flakiness you are seeing because the test waits for all three iframes to load.

yes, but those report in different order. I'm literally seeing the following:

"Child1 frame initial state"
"Child2 frame initial state"

And then occasionally:

"Child2 frame initial state"
"Child1 frame initial state"

In the output. I don't know how much more evidence you need? It's the iframes loading out of order.

This is why I'm adding the iframes sequentially - so the above doesn't happen.

@domenic
Copy link
Member

domenic commented Oct 28, 2022

Oh! This is the WebKit testrunner bug! Where they can't deal with tests that succeed in a different order!

The WPT project contract does not require that tests all be run in the same order each time. That's a WebKit-specific thing, because they have -expected.txt files which they check against (even for successes!).

This is also why it's so confusing that Marcos keeps using the word "flaky"! "Flaky" generally doesn't mean "tests run in a different order". It means "tests sometimes succeed and sometimes fail". But sometimes people from WebKit get confused and call tests that run in a different order, with the same results each time, "flaky".

In the past we've accepted changes to accomodate this WebKit-ism, as long as they don't change the test very much. (Similar to how WebKit keeps changing tests to not use the WPT server featuers like .sub.html, instead replacing them with get-host-info.js.)

I haven't look at whether the change here is minimal or problematic, but at least now we know what's going on...

@marcoscaceres
Copy link
Contributor Author

marcoscaceres commented Oct 28, 2022

Oh! This is the WebKit testrunner bug! Where they can't deal with tests that succeed in a different order!

Yes, sorry that was not clear: I was under the mistaken assumption that Chromium also used -expected.txt files.

The WPT project contract does not require that tests all be run in the same order each time. That's a WebKit-specific thing, because they have -expected.txt files which they check against (even for successes!).

Correct. But it's not harming the test to change it in the manner I proposed.

This is also why it's so confusing that Marcos keeps using the word "flaky"! "Flaky" generally doesn't mean "tests run in a different order".

Sorry, yes. That's marked as flaky on our infrastructure: a test that randomly fails is "flaky" to WebKit folks.

It means "tests sometimes succeed and sometimes fail". But sometimes people from WebKit get confused and call tests that run in a different order, with the same results each time, "flaky".

Correct 😊

In the past we've accepted changes to accomodate this WebKit-ism, as long as they don't change the test very much. (Similar to how WebKit keeps changing tests to not use the WPT server featuers like .sub.html, instead replacing them with get-host-info.js.)

Yes, this is why I'm asking for very small changes. Same with the body not being available yet.

I haven't look at whether the change here is minimal or problematic, but at least now we know what's going on...

Right, but see also the use of window.open().close() as a way to consume the user activation. That's non-standard from my reading of HTML?

(same with requestFullscreen() + exitFullscreen()... if we want to use that as a means of consuming user activation, we should have that standardized - or come up with something better that consumes it explicitly and directly)

@mustaqahmed
Copy link
Member

Oh! This is the WebKit testrunner bug! Where they can't deal with tests that succeed in a different order!

Yes, sorry that was not clear: I was under the mistaken assumption that Chromium also used -expected.txt files.

The WPT project contract does not require that tests all be run in the same order each time. That's a WebKit-specific thing, because they have -expected.txt files which they check against (even for successes!).

Correct. But it's not harming the test to change it in the manner I proposed.

We are on the same page on this, that's why I commented to "go ahead". But the source of the flake was a mystery to me, and I wanted to see a solution that works for the cross-origin case too. I am seeing the full picture now from the two most recent comments: in WebKit the asserts are not failing but the order of the test() calls are assumed to be fixed---is this correct?

This is also why it's so confusing that Marcos keeps using the word "flaky"! "Flaky" generally doesn't mean "tests run in a different order".

Sorry, yes. That's marked as flaky on our infrastructure: a test that randomly fails is "flaky" to WebKit folks.

It means "tests sometimes succeed and sometimes fail". But sometimes people from WebKit get confused and call tests that run in a different order, with the same results each time, "flaky".

Correct blush

In the past we've accepted changes to accomodate this WebKit-ism, as long as they don't change the test very much. (Similar to how WebKit keeps changing tests to not use the WPT server featuers like .sub.html, instead replacing them with get-host-info.js.)

Yes, this is why I'm asking for very small changes. Same with the body not being available yet.

Assuming my last comment here is correct, let me suggest an alternative change that would work even for the cross-origin case: move all "initial state" test() calls to finishLoadPhase(), which would now rely on the msg values saved in global states, like:

  if (msg.type == 'child-one-loaded') child1_initial_msg = msg

etc. Similarly, move all "final state" test() calls to finishReportPhase(). That should work, right?

I still hope (like @domenic) that WebKit's test order rigidness would be fixed one day, but let's move forward here.

I haven't look at whether the change here is minimal or problematic, but at least now we know what's going on...

Right, but see also the use of window.open().close() as a way to consume the user activation. That's non-standard from my reading of HTML?

(same with requestFullscreen() + exitFullscreen()... if we want to use that as a means of consuming user activation, we should have that standardized - or come up with something better that consumes it explicitly and directly)

Without a standardized solution here, I had to add stop-gap measures like these 😢 because we can't possibly standardize any user-activation gated APIs (including their consumption behavior) before standardizing the core user activation model itself! We needed to start standardizing the chicken or the egg. If you see a better way (including making it conditional based on the user agent), let's discuss it in a separate issue.

We (in Chrome) know how frustrating and time-consuming every minor failure in user activation tests might feel---we have gone through this! Let's continue our frank discussion here and elsewhere so that we can avoid revisting the same pain points!

@marcoscaceres
Copy link
Contributor Author

in WebKit the asserts are not failing but the order of the test() calls are assumed to be fixed---is this correct?

That is correct. Sorry again I didn't communicate what was happening clearly. My bad.

I still hope (like @domenic) that WebKit's test order rigidness would be fixed one day,

Me too. Though, at the same time, I can appreciate the reproducibility that webkit's CI requires (i.e., if you squint just right: "this is a feature, not a bug").

Without a standardized solution here, I had to add stop-gap measures like these 😢

No, I totally get it and I'm not blaming anyone (I've done some questionable things myself in the Payment Request tests to consume user activation 🙈). At the same time, for someone coming at this fresh working on the implementation I was surprised to find the stop gaps because they don't match anything in the specs.

But we probably shouldn't do that as a lot of engineers "code to the tests". This is a bit dangerous, because the specs are codifying non-standard behavior.

Where we need stopgaps, at a minimum, we should document them in the code... or, better, we should stop and do as you suggest below (work together to standardize something).

because we can't possibly standardize any user-activation gated APIs (including their consumption behavior) before standardizing the core user activation model itself!

Absolutely. But we have the model, we just need a way of accessing it.

We needed to start standardizing the chicken or the egg. If you see a better way (including making it conditional based on the user agent), let's discuss it in a separate issue.

Ok, but can we acknowledge that these tests should be -tentative?

Filed: #36727

Let's chat!

@marcoscaceres
Copy link
Contributor Author

marcoscaceres commented Oct 31, 2022

@mustaqahmed or @domenic, would you mind approving as this addresses the (webkit) flakiness and also addresses missing awaits?

We can then continue the discussion about how to consume user activation in #36727

childSO.id = "child-so";
childSO.src = "resources/consumption-sameorigin-child.html";
document.body.appendChild(childSO);
await new Promise((resolve) => (childSO.onload = resolve));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we are still relying on the load event behavior @annevk mentioned above...I mean the fact that childSO's load is ultimately blocked by "grandchild" frame's load, right? I find this mix of flattening-vs-not a bit too awkward!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this line. There is no need to await for it to load. What matters is just that the first iframe loads in order.

I've made similar changes to all the other dual iframe tests. I'll have them re-viewed upstream as part of my WebKit patch and will just send them reviewed as part of that.

@marcoscaceres
Copy link
Contributor Author

Closing. Will send all the WebKit reviewed changes in one go.

auto-merge was automatically disabled November 2, 2022 20:52

Pull request was closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants