Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

user activation tests might be racy #36221

Closed
marcoscaceres opened this issue Oct 3, 2022 · 16 comments
Closed

user activation tests might be racy #36221

marcoscaceres opened this issue Oct 3, 2022 · 16 comments
Assignees

Comments

@marcoscaceres
Copy link
Contributor

@mustaqahmed, I noticed that some tests in #35836 are missing await on calls to test_driver.click(). I want to confirm to it won't be an issue?

I think we should add await to those tests regardless... if those tests get updated, that might be overlooked leading to problems.

@mustaqahmed
Copy link
Member

Most of these tests respond to test_driver.click() action through the click event received in JS, so they wait for the click to happen anyways. That said, just now I looked into the click flow in these tests and felt like using await would make the test more readable in some cases (e.g. chained-setTimeout.tentative.html).

As for racy behavior, in Chrome the tests with sub-frames (those containing "sameorigin" and "crossorigin" in the names) are marked as flaky because of possible delays in loading hit-test data for a subframe. I suspect Test Driver can't help here because it should dispatch the click to whichever element is "ready", but I could be wrong here.

@marcoscaceres
Copy link
Contributor Author

The chained-setTimeout test is also flaky on the WebKit side... so I'm wondering if we really need that test? or can we refactor it in some other way so it's not setTimeout dependent?

The transient activation duration is UA-determined, so I'm wondering if it's worth testing that at all?

@mustaqahmed
Copy link
Member

This test is about setTimeout chaining, and the total delay tested here is small (30ms = 3 call depths of 10ms each). Can we test if this delay is the root cause of the flake? I would check if the flakiness "score" gets better if the delay is reduced to, say, 3*1ms.

Also, please let me know if the flakiness happens at an assert before or after the click, to determine if event targeting could be a probable cause.

@marcoscaceres
Copy link
Contributor Author

The "consumption-sameorigin.html" appears to be super flaky too. Here are two test runs (repeat 20 times):

Running 1 WebKitTestRunner.

[2/20] imported/w3c/web-platform-tests/html/user-activation/consumption-sameorigin.html failed unexpectedly (text diff)
[7/20] imported/w3c/web-platform-tests/html/user-activation/consumption-sameorigin.html failed unexpectedly (text diff)
[10/20] imported/w3c/web-platform-tests/html/user-activation/consumption-sameorigin.html failed unexpectedly (text diff)
[14/20] imported/w3c/web-platform-tests/html/user-activation/consumption-sameorigin.html failed unexpectedly (text diff)
[16/20] imported/w3c/web-platform-tests/html/user-activation/consumption-sameorigin.html failed unexpectedly (text diff)

Running 1 WebKitTestRunner.
[1/20] imported/w3c/web-platform-tests/html/user-activation/consumption-sameorigin.html failed unexpectedly (text diff)
[7/20] imported/w3c/web-platform-tests/html/user-activation/consumption-sameorigin.html failed unexpectedly (text diff)
[10/20] imported/w3c/web-platform-tests/html/user-activation/consumption-sameorigin.html failed unexpectedly (text diff)
[13/20] imported/w3c/web-platform-tests/html/user-activation/consumption-sameorigin.html failed unexpectedly (text diff)
[16/20] imported/w3c/web-platform-tests/html/user-activation/consumption-sameorigin.html failed unexpectedly (text diff)
[18/20] imported/w3c/web-platform-tests/html/user-activation/consumption-sameorigin.html failed unexpectedly (text diff)
[19/20] imported/w3c/web-platform-tests/html/user-activation/consumption-sameorigin.html failed unexpectedly (text diff)

I think loading both iframes is going to lead to the above. They should be loaded in order or in some more predictable manner.

@marcoscaceres
Copy link
Contributor Author

Sent #36304 to fix "consumption-sameorigin.html"

@marcoscaceres
Copy link
Contributor Author

Also, please let me know if the flakiness happens at an assert before or after the click, to determine if event targeting could be a probable cause.

They are just timing out and not finishing to run. I'll need to investigate further where they are getting stuck.

@marcoscaceres
Copy link
Contributor Author

marcoscaceres commented Oct 6, 2022

This test is about setTimeout chaining, and the total delay tested here is small (30ms = 3 call depths of 10ms each). Can we test if this delay is the root cause of the flake? I would check if the flakiness "score" gets better if the delay is reduced to, say, 3*1ms.

I still don't think this test is correct or very useful, to be honest... setTimeout is for a at least N ms, but that's not guaranteed to be exactly N (i.e., it could be a lot more, depending on testing infrastructure, virtualization, etc.). So, while transient activation should last some amount of time, it's up to the user agent to decide how much time that is (for testing purposes, it could be 0ms or it could be seconds, but that's totally left up to the user agent).

These tests assume that it will be at least 30ms (it's not a bad assumption!), but it's not a correct assumption either as it's left to the user agent.

@mustaqahmed
Copy link
Member

The test is about chaining setTimeout calls (one setTimeout calls another recursively) because this particular situation didn't work in many browsers I tested few years ago (including Chrome before it shipped UAv2). This is a real case we need to test, so avoiding the chaining of setTimeout calls (as suggested in the PR) is not an option, sorry!

The exact timing is not important so I suggested reducing the delay to 1ms*3 (I would avoid 0ms*3 in case UAs have special handling for this case). I agree that the user agent can choose to arbitrarily extend any time-limit but it sounds like a very extreme case if main thread unresponsiveness causes the 30ms or 3ms delay getting extended to 5 seconds---I believe a lot of tests would fail in that case.

They are just timing out and not finishing to run. I'll need to investigate further where they are getting stuck.

Finding out where the test JS gets stuck would be helpful. Please keep me posted.

@marcoscaceres
Copy link
Contributor Author

This is a real case we need to test, so avoiding the chaining of setTimeout calls (as suggested in the PR) is not an option, sorry!

No problem. I'm just trying to understand.

I'm still not following how chaining/recursive calls to setTimeout() would affect transient activation? My understanding from the code (and the refactor) is that the tests are just checking:

  1. before click + some time N: does it have transient activation? (yes/no)
  2. .click()
  3. After click + some time N: does it have transient activation? (yes/no) (3 times)

What am I missing?

@marcoscaceres
Copy link
Contributor Author

marcoscaceres commented Oct 11, 2022

@mustaqahmed about the general issue with these tests, the problem (for webkit at least) is that the body doesn't appear to be ready before the tests run. This appears to be problem because the <body><script> pattern these tests are using.

A general mitigation strategy that's worked for me so far has been to run them onload. However, I just realized (but haven't tested) that adding defer on the scripts might also make them work.

For a bit more context, we are implementing UserActivation in WebKit, which is why were are interested in all these tests right now.

@mustaqahmed
Copy link
Member

I'm still not following how chaining/recursive calls to setTimeout() would affect transient activation?

You are perfectly on track: this shouldn't affect any activation states because UAv2 defines them simply as a state of the Window. Before UAv2 this was not the case, and the browsers had to pass a "token" (as per the old spec) across setTimeout calls or Promise chains in a limited manner to prevent possible token replication. This test will catch that particular limitation of the old model with setTimeout. (And we still need a similar test for Promise chains!)

... the body doesn't appear to be ready before the tests run.

In Chrome, we have the same challenge in subframes but the top-frame seems fine with just <body onload=...>. Does adding one or two requestAnimationFrame() calls after onload=... help?

For a bit more context, we are implementing UserActivation in WebKit, which is why were are interested in all these tests right now.

We sensed that from your first comment here, and we are looking forward to a solid interop story here 😃! Thanks for cc-ing me in discussions that may benefit from our past impl experience.

@marcoscaceres
Copy link
Contributor Author

Ah, ok. Now it makes sense! Thank you! 🙏

When implementing (and having not been part of the UAv1 discussions), I was really confused when I saw this test as it didn't match anything in the spec.

I totally get that we might want to have regression testing in WPT, but I wonder if we should move things like this to something like a legacy/ folder instead?

I know this test doesn't break anything, but it was quite confusing to try work out what it was doing and why.

@mustaqahmed
Copy link
Member

I am a bit reserved about a "legacy" folder testing current expectations. We need to avoid any misinterpretation that these are "old behavior" or "failures here are not that important".

The fact that this test (and similar ones based on Promises etc---which I am hoping to add one day 🤞🏼) catches a non-compliant implementation (like pre-UAv2 Chrome), does a folder named "impl-compliance" make more sense? I know this is not a great suggestion 😞.

@marcoscaceres
Copy link
Contributor Author

I've seen "legacy-" used a lot in various filenames. See:

find . -name "*legacy-*"

Maybe we can just prefix them with that?

@mustaqahmed
Copy link
Member

Thanks for the suggestion to check existing "legacy" tests. I just sampled uievents/legacy* folders, and their change-logs seem to suggest they are old tests (I checked this and this). Most likely because of this, these folders doesn't appear in wpt.fyi dashboard.

The tests we are discussing here are asserting behaviors that are unsupported by the old user activation spec. So "legacy" in the name would be misleading IMHO.

@marcoscaceres
Copy link
Contributor Author

marcoscaceres commented Nov 23, 2022

Closing, as we addressed the missing awaits. The other problem was WebKit's infrastructure not supporting hosts[www] and friends, but only hosts[alt] and just {host}.

Thanks again for your help and time @mustaqahmed! Really appreciate your patience while I (continue to) get up to speed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants