
SIG Network test group for blocking jobs #19160

Closed · cmluciano opened this issue Sep 9, 2020 · 17 comments

Labels: kind/cleanup · lifecycle/rotten · sig/network

Comments

@cmluciano

What should be cleaned up or changed:
A colleague of mine created a public collection of sig-node test jobs that are critical for determining sig-node test health. SIG Network would benefit from a similar collection of network-related jobs that may be blocking a release or consistently failing. We currently have the sig-network-test-failures mailing list, but a testgrid collection should also be created for looking at overall job health.

I think we want a setup similar to SIG Node's, but tuned to sig-network-related tests only.

Provide any links for context:
Mailing list
sig-node informing/blocking jobs
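
As a purely illustrative sketch (hypothetical dashboard names; the real definitions would live in kubernetes/test-infra under config/testgrids), a SIG Network group mirroring the sig-node informing/blocking split might look roughly like this:

```yaml
# Hypothetical sketch only: a testgrid dashboard group for SIG Network,
# modeled on the sig-node informing/blocking layout. The dashboard names
# below are placeholders, not existing dashboards.
dashboard_groups:
- name: sig-network
  dashboard_names:
  - sig-network-release-blocking
  - sig-network-informing
dashboards:
- name: sig-network-release-blocking
- name: sig-network-informing
```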

@cmluciano added the kind/cleanup label Sep 9, 2020
@cmluciano (author)

cc @aojea @danwinship for any further ideas on what we should add or remove to signal overall sig-net test health

@cmluciano (author)

/assign

@aojea (member) commented Sep 10, 2020

I think we should start by listing the "apis" we own and the components, i.e.:

apis:

  • endpoints/endpoints slices
  • ingress/service API
  • services
  • network policy
  • dns?
  • ...

components:

  • kube-proxy (iptables, ipvs, ...)
  • some parts of the kubelet?? (node addresses/kubelet network)
  • ...

and then investigate how well we are covering those and report by area. Ideally we should be able to report that:

  • sig-network APIs are OK / need attention / FAIL (a three-level status), and drill down to say, e.g., "the Services API has X% coverage" (there is an apisnoop tool that reports that; I don't know if we can leverage it)
  • sig-network components are OK (same three-level report); I don't think we need coverage for components, just an aggregation of the success rate of the jobs, as testgrid does today

and block on those reports.
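
To make the per-area reporting idea concrete, here is a purely hypothetical data sketch, not an existing file or format in test-infra, of the kind of inventory such a report could be generated from; the area names, job names, field names, and status values are all illustrative:

```yaml
# Illustrative only: a hand-maintained inventory mapping sig-network areas
# to a coverage source and an aggregated three-level status.
areas:
  apis:
  - name: services
    coverage_source: https://apisnoop.cncf.io/1.20.0/stable/core        # apisnoop, as discussed below
    status: needs-attention        # example value; one of: ok, needs-attention, fail
  - name: network-policy
    coverage_source: https://apisnoop.cncf.io/1.20.0/stable/networking
    status: fail                   # example value
  components:
  - name: kube-proxy-iptables
    jobs:                          # status = aggregated job success rate, as testgrid does today
    - ci-kubernetes-e2e-example    # placeholder job name
    status: ok                     # example value
```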

I've added kind jobs based on the [sig-network] regex: https://testgrid.k8s.io/sig-network-kind,
but honestly I don't think we cover the areas/components listed above well enough.
The most important tests are already covered in:

blocking presubmits: https://testgrid.k8s.io/presubmits-kubernetes-blocking
pr presubmits: https://testgrid.k8s.io/presubmits-kubernetes-blocking

my 2 cents

/cc @thockin

@spiffxp (member) commented Sep 11, 2020

/sig network

@k8s-ci-robot added the sig/network label Sep 11, 2020
@aojea (member) commented Sep 11, 2020

i.e. for API coverage:

networking API coverage: https://apisnoop.cncf.io/1.20.0/stable/networking
I don't know if it is possible to get the Endpoints and Services data from here:
https://apisnoop.cncf.io/1.20.0/stable/core

For the components it's trickier; for example, for Ingress and NetworkPolicy only the API is defined, and the implementation is left to third-party components.
The only option I see is to add a testgrid job using a particular implementation, but this is a big can of worms and can bias the community into believing that implementation is the standard 🤷

@cmluciano (author)

Yes, it looks like we can get the Endpoints tested state from core.

[screenshot: apisnoop coverage for the core API group]

Is it necessary to have a dashboard for the coverage if it is already presented and tracked by the CNCF through apisnoop?

@cmluciano (author)

> for ingress and network policy only the API is defined, and the implementation is left to third-party components.
> The only option I see is to add a testgrid job using a particular implementation, but this is a big can of worms and can bias the community into believing that implementation is the standard 🤷

I tried something like the implementation-specific example when we were still working on kubernetes-anywhere, but it didn't go very far because we did not want to choose one implementation over another.

Testing the APIs themselves with conformance-type tests is probably good enough, IMO.

> I've added kind jobs based on the [sig-network] regex: https://testgrid.k8s.io/sig-network-kind,
> but honestly I don't think we cover the areas/components listed above well enough.
> The most important tests are already covered in:
>
> blocking presubmits: https://testgrid.k8s.io/presubmits-kubernetes-blocking
> pr presubmits: https://testgrid.k8s.io/presubmits-kubernetes-blocking

I agree that filtering on the SIG-Net labels in the presubmit dashboards and putting them under our grouping would be very useful. I will probably start with that.
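
For context on the mechanics, a minimal sketch (not a real job definition) assuming the usual kubernetes/test-infra conventions: prow jobs are attached to testgrid dashboards through the testgrid-* annotations on the job config, so grouping presubmits under a sig-network dashboard would mostly mean adding annotations along these lines. The job name, image, command, and alert address below are placeholders.

```yaml
# Minimal sketch, assuming the usual test-infra conventions; everything
# except the annotation keys is a placeholder.
presubmits:
  kubernetes/kubernetes:
  - name: pull-kubernetes-e2e-sig-network-example     # placeholder job name
    always_run: false
    annotations:
      testgrid-dashboards: sig-network-kind           # existing dashboard linked above
      testgrid-tab-name: pull-e2e-sig-network-example
      testgrid-alert-email: sig-network-test-failures@example.com   # placeholder; use the real list address
      testgrid-num-failures-to-alert: "3"
    spec:
      containers:
      - image: gcr.io/k8s-testimages/kubekins-e2e:latest   # placeholder image
        command: ["runner.sh"]                             # placeholder command
```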

@aojea (member) commented Sep 11, 2020

> Is it necessary to have a dashboard for the coverage if it is already presented and tracked by the CNCF through apisnoop?

Nah, just brainstorming. We are currently blocking on testgrid jobs, but it may be interesting to block on coverage as well, at least per release, i.e. release x+1 can't have less coverage than release x. cc @spiffxp

@aojea (member) commented Sep 12, 2020

Related job to gate on coverage #19173

@cmluciano (author)

> Is it necessary to have a dashboard for the coverage if it is already presented and tracked by the CNCF through apisnoop?
>
> Nah, just brainstorming. We are currently blocking on testgrid jobs, but it may be interesting to block on coverage as well, at least per release, i.e. release x+1 can't have less coverage than release x. cc @spiffxp

Cool. For what it's worth, I agree that we should have conformance coverage tests for the NetworkPolicy API in stable. I think NetworkPolicy went stable before the idea of API conformance testing existed; I will open an issue and add those tests.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Dec 13, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jan 12, 2021
@cmluciano (author)

/remove-lifecycle rotten

@k8s-ci-robot removed the lifecycle/rotten label Jan 13, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Apr 13, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label May 13, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot (contributor)

@fejta-bot: Closing this issue.

In response to this:

> Rotten issues close after 30d of inactivity.
> Reopen the issue with /reopen.
> Mark the issue as fresh with /remove-lifecycle rotten.
>
> Send feedback to sig-contributor-experience at kubernetes/community.
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
