Use total tests expected for Interop score aggregation #242

Open · wants to merge 1 commit into main
Conversation

@DanielRyanSmith (Contributor) commented Feb 4, 2025

This change fixes an issue in which past Interop scores (displayed in the graph on the dashboard) could be inaccurate. The Interop scores were being calculated using the total number of labeled tests found within the set of runs, rather than the number of labeled tests that are expected to exist. As a result, each past Interop score reflected the score as of that run's date, rather than what the score should be today.

This bug only affects focus areas whose labeled tests changed during the Interop year. It also does not affect the calculations for past browser scores, which is why the Interop score was occasionally displayed as higher than the individual browser scores.

Within the scoring script is an explanation of why we want to aggregate the scores this way:

      // We always normalize against the number of tests we are looking for,
      // rather than the total number of tests we found. The trade-off is all
      // about new tests being added to the set.
      //
      // If a large chunk of tests are introduced at date X, and they fail in
      // some browser, then runs after date X look worse if you're only
      // counting total tests found - even though the tests would have failed
      // before date X as well.
      //
      // Conversely, if a large chunk of tests are introduced at date X, and
      // they pass in some browser, then runs after date X would get an
      // artificial boost in pass-rate due to this - even if the tests would
      // have passed before date X as well.
      //
      // We consider the former case worse than the latter, so optimize for it
      // by always comparing against the full test list. This does mean that
      // when tests are added to the set, previously generated data is no
      // longer valid and this script should be re-run for all dates.

Graph views after change

[Screenshots of the updated score graphs for the nesting and IndexedDB focus areas]
