
Enforce citgm for major changes #126

Closed
MylesBorins opened this issue May 11, 2017 · 37 comments

Comments

@MylesBorins

We should do this.

At the very least it will document in the thread that we care about these breakages

@jasnell
Member

jasnell commented May 11, 2017

I'm fine with this. Would need a PR adding the requirement to the collaborator docs. If it's not there already, we should make it a required step for all releases also... perhaps even listing the failures in the release notes as known issues.

@addaleax
Member

Sounds good. Two thoughts:

  • Reading CITGM output is hard because there are so many known/expected failures, and it’s not obvious where those come from. Can we have a tracking issue at nodejs/node with all the failures?
  • While error message changes are still semver-major, can we exclude them from this rule?

@jasnell
Member

jasnell commented May 11, 2017

It makes a ton of sense to exclude certain types of semver-majors from this rule:

  1. error message changes
  2. semver-major changes to Node.js' own tooling, build, test framework or benchmarks

Generally, only semver-major changes in src/* and lib/* would be subject to the rule.
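[Editor's note: for illustration only, and not part of the proposed policy, a collaborator could check whether a semver-major PR even falls under this rule by filtering the diff for src/ and lib/. The branch names here are just an example.]

```sh
# Hypothetical pre-check: list files the branch changes under src/ or lib/.
# Any output means the change would be subject to the CITGM requirement.
git fetch origin master
git diff --name-only origin/master...HEAD | grep -E '^(src|lib)/'
```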

@sam-github

Don't we want to know when an error message change breaks a package on npmjs.org?

@jasnell
Member

jasnell commented May 11, 2017

Sure, but those do not necessarily need to be identified before landing the PR... we're also not likely to use such a failure to determine whether or not to land such PRs. If we're changing the error message, we're likely doing so for a very good reason.

@cjihrig

cjihrig commented May 11, 2017

SGTM. Like @addaleax, I'd like easier-to-understand CITGM output.

What happens when a breaking change actually breaks something in CITGM? Would we just never break anything ever again? Is that something we'd decide on a case by case basis?

@jasnell
Member

jasnell commented May 11, 2017

We would need to decide on a case-by-case basis. CITGM is not a gate, it's an early warning system. It helps us understand the potential impact a change may have, but it should never stop us from making any particular change.

@MylesBorins
Author

MylesBorins commented May 12, 2017 via email

@jasnell
Member

jasnell commented May 12, 2017

Oh, I don't doubt that at all @MylesBorins. I just don't believe that failures we happen to see in citgm should block changes that we feel should be made. For example, we will never get out of the cycle of treating error message changes as semver-major unless we continue working towards assigning error codes and making the error messages more consistent -- that work could very well break existing code but the benefits far outweigh the potential disruption to existing modules.

@mhdawson
Member

I think we need to get CITGM to the point where it normally just runs/passes (at least one variant that we require for semver-majors). One approach may be to fix the module versions.

Provided that the jobs run/pass reliably and take less than 20 minutes, I think requiring it for all semver-majors should not be too much to ask. The issue of whether it blocks landing the PR or not can be handled on a case-by-case basis. Requiring the run would mean that we know where we stand earlier on, as opposed to having to figure that out at release time.
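[Editor's note: as a sketch of the kind of pinned run mentioned above, assuming the citgm CLI accepts an npm-style name@version specifier (worth verifying against the current citgm docs), one could test a fixed module version instead of whatever `latest` resolves to. Module and version here are only examples.]

```sh
# Install the tool and run a single module at a pinned version.
# (Assumes citgm accepts npm-style name@version specifiers.)
npm install -g citgm
citgm lodash@4.17.4
```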

@MylesBorins
Author

MylesBorins commented May 12, 2017 via email

@mcollina
Member

Currently there are a bunch of failures in citgm against master. I guess this is to be expected, as we do land breaking changes in master. How can we hold changes to an all-green standard? How do I know my change did not break them further?

I have a very specific case for this, my stream.destroy() PR: nodejs/node#12925.

It would be good to have the "timeouts" reported differently than actual failures.

@MylesBorins
Author

@mcollina unfortunately there is no way for us to introspect the reason for a module's failure, so there is no way to know whether failures are caused by timeouts. I would argue that master would not have a bunch of failures if we gated against citgm...

@mhdawson
Member

It's a bit of a chicken-and-egg problem:

  • we need to gate against citgm to keep citgm green
  • we need citgm to be green to gate on citgm

I think we have to plot a path to getting citgm consistently green, otherwise it's too hard to get the maximum value from it. Once we managed to get the core regression builds passing most of the time, it became much easier to keep them green.

Changes in the modules themselves do add a complication, and if they continuously break citgm independently of changes people are making in core, we'll need a way to decouple from that. We may need both pinned modules for the regression runs for core, and the latest for nightlies/releases. If we can keep citgm green on the latest, all the better, and we only need the one. I just think we have to get to the point where anybody can run it and it pretty much passes unless there is a problem with the change they made.

@refack

refack commented May 16, 2017

my 4¢

  1. I actually think that CITGM should primarily be a gate for semver-minors; those should be breakage-free (unless that's already the case and I just didn't RTFM).

It's a bit of a chicken-and-egg problem:

  • we need to gate against citgm to keep citgm green
  • we need citgm to be green to gate on citgm

  2. If it's possible for the CITGM team to estimate the amount of time needed to get the CI job green, we could call a moratorium on landing breaking changes for that time window.

@addaleax
Member

I actually think that CITGM should primarily be a gate for semver-minors; those should be breakage-free (unless that's already the case and I just didn't RTFM).

Everything but semver-major changes should be breakage-free (by definition).

@refack

refack commented May 17, 2017

Everything but semver-major changes should be breakage-free (by definition).

[asking] Do we test that it's actually breakage-free (i.e. via CITGM), or do we deduce it from code reviews and our regression testing?

@MylesBorins
Author

MylesBorins commented May 17, 2017 via email

@MylesBorins
Author

The TLDR on making CITGM useful is that someone needs to audit the results on a weekly basis and keep the lookup up to date for every release line.

This could potentially be automated... but I'm not sure of a clean / intuitive way to do so.
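[Editor's note: for context, the lookup referred to here is citgm's lookup.json, which records per-module expectations. Below is a minimal sketch of the kind of entries a weekly audit would keep current; the module names are made up and the exact option keys should be checked against the citgm docs.]

```json
{
  "lodash": {
    "npm": true
  },
  "some-flaky-module": {
    "flaky": true
  },
  "some-known-broken-module": {
    "skip": true
  }
}
```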

@addaleax
Member

@nodejs/build We could have a citgm-daily (or citgm-weekly) like node-daily-master, right? That might help.

@mhdawson
Member

@gibfahn or myself can easily setup the required CI jobs. Once we agree on what's needed one of us can volunteer to set them up.

@jasnell
Member

jasnell commented May 17, 2017

Setting up the job is the easy part, yes? The real task will be ensuring that there is someone auditing the results. It would be interesting to see if we could have a page similar in nature to benchmarking.nodejs.org but that summarizes the results of the nightly citgm run.

@refack

refack commented May 17, 2017

I'm volunteering to audit the results, but can't commit to every day. IMHO it should be at least two people...
(Hopefully by doing it a couple of times I will figure out how to automate it)

@Trott
Member

Trott commented May 19, 2017

Setting up the job is the easy part, yes? The real task will be ensuring that there is someone auditing the results. It would be interesting to see if we could have a page similar in nature to benchmarking.nodejs.org but that summarizes the results of the nightly citgm run.

For node-daily-master, I just have it as part of my daily morning routine to check https://ci.nodejs.org/job/node-daily-master/. No special page needed. It might be perfectly usable to check the equivalent page for a daily citgm run.

If @refack is volunteering to stay sufficiently knowledgeable about what a clean citgm run looks like and what a problematic one looks like, then that may be all we need for minimum requirements to be effective. (More people would be better of course, but one is all you need.)

@gibfahn
Member

gibfahn commented May 19, 2017

(More people would be better of course, but one is all you need.)

I think one is most definitely not all you need. You need one person available at any time, which translates to at least five people (in my experience).

@refack

refack commented May 19, 2017

You need one person available at any time, which translates to at least five people (in my experience).

I'm hoping that will be a temporary sitch, and that we'll figure out how to automate it.
FWIW we can start with a weekly (or semi-weekly) job.

@refack

refack commented May 21, 2017

Until we find a better format: https://github.com/nodejs/node/wiki/CITGM-Status

@refack

refack commented May 21, 2017

And a bit more use of the wiki:
last night to today diff

@refack

refack commented May 21, 2017

Question: should we keep a lookup.json just for the CI, so we could choose a baseline and turn it green?
It could be considered like a floating patch, since CITGM's lookup.json takes a wider scope into consideration and should be more parsimonious.
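[Editor's note: one possible shape for this, sketched under the assumption that citgm-all can be pointed at an alternate lookup file (e.g. via a --lookup option; verify with `citgm-all --help`), would be to keep a CI-only lookup in core's tooling and pass it explicitly. The path below is illustrative.]

```sh
# Run the full suite against a CI-only lookup kept alongside core,
# leaving citgm's upstream lookup.json untouched.
# (Path and flag are illustrative; check the current citgm-all CLI options.)
citgm-all --lookup ./tools/citgm-ci-lookup.json
```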

@gibfahn
Member

gibfahn commented May 22, 2017

Question: should we keep a lookup.json just for the CI, so we could choose a baseline and turn it green?

Discussed in nodejs/citgm#407

@refack

refack commented May 22, 2017

Yay, my insomnia pays off...
We have our first hit: nodejs/node#13098 (comment)
(well, obviously not yay for the breakage, but for this validation process)
/cc @aqrln

@refack

refack commented May 22, 2017

Question: when encountering a regression, should I inform the module owners (i.e. open an issue)?
Follow-up: what are the criteria for a verified regression? For example, 2 platforms in 1 run, or 1 platform in 2 runs?

@aqrln

aqrln commented May 22, 2017

@refack

when encountering a regression, should I inform the module owners (i.e. open an issue)?

I think that makes sense and is generally nice.

@mcollina
Member

@refack IMHO before pinging the author we should check whether it was our fault or not. If the original PR was not semver-major and was not expected to break that module, then maybe no; we should try to fix things on our side.

@refack

refack commented May 22, 2017

If the original PR was not semver-major and was not expected to break that module, then maybe no; we should try to fix things on our side.

It's definitely an open question. When there is a range of suspected commits, the module author is probably the best resource for pinning down the "culprit"...
For example, in websockets/ws#1118 there was a range nodejs/node@171a43a...ad4765a. Admittedly, in this case it was fairly easy for me to pin it down (modulo an insomnia-induced mixup).

@MylesBorins
Author

Generally I will either ping people on our repo when we add something to flaky, or alternatively send a fix (the latter is preferable).

@Trott
Member

Trott commented Sep 8, 2017

This issue has been inactive for a while and this repository is now obsolete. I'm going to close this, but feel free to open another issue in a relevant active repository (TSC perhaps?) and include a link back to this issue if this is a subject that should receive continued attention.

@Trott closed this as completed Sep 8, 2017