Skip to content
This repository has been archived by the owner on Sep 26, 2023. It is now read-only.

Add validate-consistency script to list contradictions between results and BCD #875

Closed
foolip opened this issue Dec 15, 2020 · 3 comments
Closed
Assignees
Labels
Component: update-bcd BCD updater script Priority 4 (Low) Lowest priority, not super important

Comments

@foolip
Copy link
Owner

foolip commented Dec 15, 2020

Splitting out #571 (comment) into a new issue after discussion with @vinyldarkscratch:

One feature of an improved design that I'd like to see is that we have an explicit test for contradictions between our results for a specific browser release and a BCD-style support statement. Given that, we could apply that test before and after we've updated, and much more easily see what contradictions weren't resolved by the updates. Maybe we'll see that there aren't many and we'd be better off resolving those contradictions manually.

@queengooborg queengooborg added Component: update-bcd BCD updater script Priority 2 (High) High priority, should be tackled soon labels Dec 15, 2020
@foolip foolip added Priority 4 (Low) Lowest priority, not super important and removed Priority 2 (High) High priority, should be tackled soon labels Mar 19, 2021
foolip added a commit that referenced this issue Mar 29, 2021
The previous logic for finding a `simpleStatement` considered anything
with a `version_removed` to be non-simple, going down the code path of
adding entries unless an identical one already existed.

Instead, limit this case to when there were no existing statements, in
which case there's no merging of entries to worry about.

Defer dealing with multiple existing statements or multiple inferred
statements beyond the simple case of no existing statements.

The resulting edits suggest differ by quite a lot from the previous
state, overall updating fewer files.

Fixes #1068.

#875 is the plan for
catching differences in the data which cannot yet be fixed
automatically.
foolip added a commit that referenced this issue Mar 29, 2021
The previous logic for finding a `simpleStatement` considered anything
with a `version_removed` to be non-simple, going down the code path of
adding entries unless an identical one already existed.

Instead, limit this case to when there were no existing statements, in
which case there's no merging of entries to worry about.

Defer dealing with multiple existing statements or multiple inferred
statements beyond the simple case of no existing statements.

The resulting edits suggest differ by quite a lot from the previous
state, overall updating fewer files.

Fixes #1068.

#875 is the plan for
catching differences in the data which cannot yet be fixed
automatically.
foolip added a commit that referenced this issue Mar 29, 2021
The previous logic for finding a `simpleStatement` considered anything
with a `version_removed` to be non-simple, going down the code path of
adding entries unless an identical one already existed.

Instead, limit this case to when there were no existing statements, in
which case there's no merging of entries to worry about.

Defer dealing with multiple existing statements or multiple inferred
statements beyond the simple case of no existing statements.

The resulting edits suggest differ by quite a lot from the previous
state, overall updating fewer files.

Fixes #1068.

#875 is the plan for
catching differences in the data which cannot yet be fixed
automatically.
@jugglinmike
Copy link
Contributor

Hi @foolip. @mzgoddard and I have a few questions about this:

  • Do you want this consistency check to be built into the “update-bcd” script? If not, do you want it to invoke “update-bcd”, or would you prefer the operator to invoke it themself?
  • Should “validate-consistency” use a single results file from the collector? Or should it operate on all the results files in the “mdn-bcd-results” directory (just like “update-bcd” operates today)?
  • We believe the intent of the “validate-consistency” script is to allow maintainers to understand how changes to “update-bcd” impact its effect on BCD when given the same data. Is that right?

@foolip
Copy link
Owner Author

foolip commented Jun 15, 2023

Do you want this consistency check to be built into the “update-bcd” script? If not, do you want it to invoke “update-bcd”, or would you prefer the operator to invoke it themself?

I suspect things will turn out best if it's a mode of the update script, or at least if the two scripts use the same shared utility code. Otherwise, if they ever disagree we'll have one script suggesting changes that the other then complains about.

Should “validate-consistency” use a single results file from the collector? Or should it operate on all the results files in the “mdn-bcd-results” directory (just like “update-bcd” operates today)?

A single file is insufficient when there are OS differences, where using the macOS results would undo changes based on Windows results and vice versa. I think a BCD release is the right granularity, and batching beyond that is for ergonomics and convenience.

An important case of convenience is when an issue spans releases. A warning like "BCD says this thing is supported since Chrome 45, but collector results indicate support since Chrome 30" is better than 15 individual warnings from 30-44.

For an integrated approach with a single script, I'm imagining something like this:

  • Combine results for each browser release, resolving things like different results on macOS and Windows. The combined result could be null ("don't know") but it's one result. Maybe the strategy should be configurable.
  • For each per-browser-release result, look it up in BCD and see if there's a contradiction. Collect all confirmed and contradicted data points.
  • For each BCD entry with contradictions, determine the simplest possible edit that would resolve the contradiction without introducing new ones. I'm not 100% sure what the best strategy is, but it comes down to adding/merging/splitting ranges.
  • For any contradiction we don't know how to resolve, log a warning. Depending on the above, perhaps we can actually make all edits, but if not we should tell a human to intervene.

We believe the intent of the “validate-consistency” script is to allow maintainers to understand how changes to “update-bcd” impact its effect on BCD when given the same data. Is that right?

Not quite actually. I view it more as a way to form a burndown list of BCD errors, not limited to ones that we know how to resolve automatically. I hope my integrated approach above makes the intent more clear.

@foolip
Copy link
Owner Author

foolip commented Jun 15, 2023

In the approach I've sketched, I was particular about when to transform a set of results for individual releases into a support statement for ranges of releases.

When to make this transformation will affect the structure of the code, but the most important point is actually not when but whether it's a lossless transformation.

As long as it represents exactly the same thing, working with ranges would be fine.

What's important to avoid is filling in gaps (missing or null results) in the results when making the transformation and then warning about contractions in that gap, where we actually don't have any data.

I hope this might clarify the reason for suggesting per-release validation, but anything that is equivalent would be fine too.

@foolip foolip closed this as completed Sep 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Component: update-bcd BCD updater script Priority 4 (Low) Lowest priority, not super important
Projects
None yet
Development

No branches or pull requests

3 participants