
Different set of blocking jobs on stable1, 2 and 3 releases #9363

Closed
MaciekPytel opened this issue Sep 12, 2018 · 29 comments
Labels
area/jobs, area/release-eng, kind/bug, sig/release

Comments

@MaciekPytel
Contributor

In particular, the soak-gci-gce-1.X job is blocking for stable2, but not for stable1 and stable3. As a result, the soak job will be blocking for only part of each release's lifecycle. We should have a consistent set of blocking jobs for all patch releases of a single branch. I think it would be more correct to define a specific set of blocking jobs for each release, rather than having rolling stable1-3.
If that's not possible, we should at least make sure the set of jobs is the same for all three (the downside being that it would then be hard to change the set of blocking jobs).

cc: @krzyzacy @BenTheElder @kubernetes/sig-release-bugs

@k8s-ci-robot added the sig/release and kind/bug labels on Sep 12, 2018
@krzyzacy
Member

cc @spiffxp

They are mostly the same, except that, for example, 1.12 doesn't have kubeadm upgrade jobs set up.

We can probably bring in a presubmit to enforce this.

I'll remove the soak-stable2 job from the release-blocking dashboard.

@spiffxp
Member

spiffxp commented Sep 17, 2018

/area jobs
We absolutely should have the same jobs. The only time I can see them differing is if we add new blocking jobs in the current release that couldn't work against older releases.

A presubmit sounds good.

I haven't done a thorough audit, but last I checked we were also missing a scalability-related stable3 job.
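
A minimal sketch of the kind of presubmit consistency check proposed above (hypothetical; this is not the actual test-infra presubmit, it assumes each release-blocking dashboard has already been reduced to a plain set of job names, and the job names shown are examples only):

```go
// Hypothetical consistency check, not the real test-infra presubmit.
// A real check would read the job lists from the testgrid configuration;
// here they are hardcoded stand-ins to show the comparison itself.
package main

import (
	"fmt"
	"os"
	"sort"
)

// missing returns the jobs present in want but absent from have.
func missing(want, have map[string]bool) []string {
	var out []string
	for job := range want {
		if !have[job] {
			out = append(out, job)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// release-1.12-blocking as the "authoritative" source (example job names).
	authoritative := map[string]bool{
		"ci-kubernetes-e2e-gci-gce":  true,
		"ci-kubernetes-soak-gci-gce": true,
	}
	// Another release-1.y-blocking dashboard being reconciled against it.
	candidate := map[string]bool{
		"ci-kubernetes-e2e-gci-gce": true,
	}

	failed := false
	if m := missing(authoritative, candidate); len(m) > 0 {
		fmt.Println("jobs missing from the candidate dashboard:", m)
		failed = true
	}
	if m := missing(candidate, authoritative); len(m) > 0 {
		fmt.Println("jobs not present in the authoritative dashboard:", m)
		failed = true
	}
	if failed {
		os.Exit(1) // a presubmit would fail the PR here
	}
}
```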

@krzyzacy
Member

/assign

I'll create a doc under sig-release to define a list of release-blocking jobs, and we can use that as a source of truth in the future.

@spiffxp
Member

spiffxp commented Oct 4, 2018

/milestone v1.13
FYI @cjwagner @jberkus

@spiffxp
Member

spiffxp commented Oct 5, 2018

The plan is to accomplish this before 2018-10-23, aka the Enhancements Freeze of the v1.13 release cycle:

  • get release-1.y-blocking reconciled using release-1.12-blocking as the "authoritative" source
  • get sig-release-master-blocking reconciled against release-1.12-blocking
  • reconcile means: same set of jobs, same naming scheme for testgrid tabs

@jberkus
Contributor

jberkus commented Oct 6, 2018

I've done some comparison of master-blocking and 1.12-blocking; here's where they don't match. Note that I'm using the actual job names below instead of the labels you see in testgrid, because it's hard to figure out which job it is from the label:

Tests that are in 1.12-blocking with no equivalent in master-blocking:

  • ci-kubernetes-e2e-kubeadm-gce-1-11-on-1-12
  • periodic-kubernetes-bazel-build-1-12
  • periodic-kubernetes-bazel-test-1-12

Tests that are in master-blocking with no equivalent in 1.12-blocking:

  • ci-kubernetes-e2e-gce-scale-correctness
  • ci-kubernetes-e2e-gce-scale-performance
  • ci-periodic-cloud-provider-openstack-acceptance-test-e2e-conformance
  • ci-periodic-cloud-provider-vsphere-test-e2e-conformance
  • ci-periodic-vsphere-test-e2e-conformance
  • periodic-kubernetes-e2e-packages-pushed

Also, note that several of the test jobs for 1.12-blocking are named "beta" instead of "1.12", which suggests that those may not be version-specific.

@krzyzacy
Member

I've unified the naming of the master-blocking dashboard with 1.12 a bit.

For the jobs with no equivalent: if we take 1.12 as the source of truth, we have both postsubmit and periodic bazel jobs, so we are probably fine keeping only one of them. @neolit123 might want to add a latest-release-on-master kubeadm job for consistency?

Also, do we really want all the conformance tests from cloud providers to block the release?

Also, note that several of the test jobs for 1.12-blocking are named "beta" instead of "1.12", which suggests that those may not be version-specific.

the release channels are defined at https://github.com/kubernetes/test-infra#release-branch-jobs--image-validation-jobs, so for each new release, we can rename the testgrid dashboard without remaking all the jobs
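
As a hedged illustration of how those rolling channels behave (not taken from the test-infra code; it assumes the usual convention where beta tracks the newest release branch and stable1-3 trail it by one, two, and three minor versions), the following sketch also shows why a job attached only to stable2 blocks any given branch for just part of its life:

```go
// Illustrative approximation of the rolling release channels; the
// authoritative mapping is the one defined in kubernetes/test-infra.
package main

import "fmt"

// channels maps channel names to release branches, assuming beta is the
// newest release branch and stable1-3 trail it by 1, 2, and 3 minors.
func channels(latestMinor int) map[string]string {
	return map[string]string{
		"beta":    fmt.Sprintf("1.%d", latestMinor),
		"stable1": fmt.Sprintf("1.%d", latestMinor-1),
		"stable2": fmt.Sprintf("1.%d", latestMinor-2),
		"stable3": fmt.Sprintf("1.%d", latestMinor-3),
	}
}

func main() {
	// When 1.12 is the newest branch, stable2 is 1.10; once 1.13 branches,
	// stable2 becomes 1.11, so a job wired only to stable2 silently changes
	// which release branch it blocks as the channels roll forward.
	for _, latest := range []int{12, 13} {
		fmt.Printf("newest=1.%d -> %v\n", latest, channels(latest))
	}
}
```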

@neolit123
Member

@neolit123 might want to add a latest-release-on-master kubeadm job for consistency?

I can add kubeadm-gce-stable-on-master to sig-release-master-blocking to make it consistent with -1.12-blocking.

also do we really want all conformance tests from cloud providers to block release?

That was something raised as a question last week with @spiffxp and @BenTheElder.

@spiffxp
Member

spiffxp commented Jan 3, 2019

/milestone v1.14

The jobs aren't yet identical. I think we can close this out as we split up the dashboards into -blocking/-informing etc. (ref: kubernetes/sig-release#347)

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Apr 3, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot removed the lifecycle/stale label on May 3, 2019
@k8s-ci-robot added this to the v1.16 milestone on Jul 9, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Nov 19, 2019
@jberkus
Contributor

jberkus commented Nov 20, 2019

AFAIK, this issue has not been resolved.

@Katharine @BenTheElder ?

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Nov 20, 2019
@BenTheElder
Member

I'm pretty over capacity and don't remember what we wanted here.

It's entirely reasonable imo to have different sets of jobs per release.
The config forker pretty much ensures that at each release we copy over jobs from master to that release.
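
To make that concrete, here is a rough sketch of the forking idea only (this is not the real config-forker, which operates on Prow job YAML with its own annotations and rules; the job name, fields, and testgrid tab below are made up for illustration):

```go
// Rough sketch of "forking" a master job for a new release branch.
// Not the actual config-forker; field and job names are illustrative.
package main

import (
	"fmt"
	"strings"
)

type job struct {
	Name        string
	Branch      string
	TestgridTab string
}

// forkForRelease copies a master job and rewrites its "master" references
// so the copy tracks the given release branch instead.
func forkForRelease(j job, version string) job {
	return job{
		Name:        strings.ReplaceAll(j.Name, "master", version),
		Branch:      "release-" + version,
		TestgridTab: strings.ReplaceAll(j.TestgridTab, "master", version),
	}
}

func main() {
	masterJob := job{
		Name:        "ci-kubernetes-e2e-gci-gce-master",
		Branch:      "master",
		TestgridTab: "sig-release-master-blocking#gce-master-default",
	}
	// Forking for 1.16 yields ci-kubernetes-e2e-gci-gce-1.16 on release-1.16.
	fmt.Printf("%+v\n", forkForRelease(masterJob, "1.16"))
}
```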

@justaugustus
Member

/assign
I was the last to touch config-forker, so I'll try and close the loop on this.

@justaugustus
Member

/milestone v1.18
/area release-eng

@k8s-ci-robot added the area/release-eng label on Nov 28, 2019
@k8s-ci-robot changed the milestone from v1.16 to v1.18 on Nov 28, 2019
@jberkus
Contributor

jberkus commented Dec 4, 2019

@BenTheElder @justaugustus

For the 1.16 release, the config-forker did not copy from master to the release; it copied a different, templated set of tests based on the 1.13 test set. That may have been fixed since; if it has, that's one solution to this issue.

The core problem was that the config-forker was copying a set of jobs that was not based on any set determined by SIG Release, but was instead historical and impossible to update.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Mar 3, 2020
@jberkus
Contributor

jberkus commented Mar 3, 2020

/remove-lifecycle stale

Where are we on this? @Katharine ?

@k8s-ci-robot removed the lifecycle/stale label on Mar 3, 2020
@Katharine
Member

This should've been resolved long ago, and the last set of inconsistent jobs expired out.

As for this comment:

For the 1.16 release, the config-forker did not copy from master to the release; it copied a different, template set of tests based on the 1.13 test set. That may have been fixed since; if it has, that's one solution to this issue.

This is (or should be, to my knowledge) impossible, unless some awful misconfiguration happened. Why do you think that happened?

@jberkus
Contributor

jberkus commented Mar 7, 2020

Because the set of jobs when the 1.16 branch was created was different from the set of jobs in master. And when I asked about it, that's what the test-infra folks said was why it happened.

In particular, the slow performance tests had been moved from blocking to informing before the branch, but were back in blocking in the 1.16 set.

@jberkus
Contributor

jberkus commented Mar 7, 2020

If it's copying from master now, then that's all good. I just wanted to check that it was.

@spiffxp
Member

spiffxp commented Mar 11, 2020

I scanned through the release-blocking dashboards:

Once these three are closed, I'm going to call this closed unless there are any objections

@spiffxp
Member

spiffxp commented Mar 12, 2020

/close
Please re-open if you think there's anything left to do here

@k8s-ci-robot
Contributor

@spiffxp: Closing this issue.

In response to this:

/close
Please re-open if you think there's anything left to do here

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
