Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MON-3775: add tests for the MetricsCollectionProfiles feature-gate #28889

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rexagod
Copy link
Member

@rexagod rexagod commented Jun 18, 2024

@openshift-ci-robot
Copy link

openshift-ci-robot commented Jun 18, 2024

@rexagod: This pull request references MON-3775 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.17.0" version, but no target version was set.

In response to this:

OEP: openshift/enhancements#1298

/cc @simonpasquier

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 18, 2024
@rexagod
Copy link
Member Author

rexagod commented Jun 18, 2024

/test e2e-gcp-ovn-techpreview

@openshift-ci openshift-ci bot added the vendor-update Touching vendor dir or related files label Jun 18, 2024
@rexagod
Copy link
Member Author

rexagod commented Jun 18, 2024

@deads2k Following up on #28768 (comment), I dropped g.Context but /verify still seems to fail. I believe the root cause is make update being non-idempotent for local and CI environments.

I'm able to include my tests in the generated manifests using update, but doing so makes verify fail. Similarly, if I leave them out, and don't push the generated manifests that include the tests, verify passes but the tests don't run (as there's nothing to run). For some reason, CI's verify run doesn't seem to "discover" the added tests (as you mentioned), and generates artifacts that leave those tests out, making the tree dirty, causing the job to fail.

test/extended/prometheus/collection_profiles.go Outdated Show resolved Hide resolved
test/extended/prometheus/collection_profiles.go Outdated Show resolved Hide resolved
test/extended/prometheus/collection_profiles.go Outdated Show resolved Hide resolved
test/extended/prometheus/collection_profiles.go Outdated Show resolved Hide resolved
test/extended/prometheus/collection_profiles.go Outdated Show resolved Hide resolved
}
})
g.It("should hide a default-only metric when minimal collection profile is enabled", func() {
defaultOnlyMetric := "kube_deployment_status_replicas"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we pick a Prometheus metric which we know for sure will (likely) never be present in the minimal profile (for instance prometheus_engine_query_log_enabled).

test/extended/prometheus/collection_profiles.go Outdated Show resolved Hide resolved
@rexagod
Copy link
Member Author

rexagod commented Jul 9, 2024

/test e2e-gcp-ovn-techpreview

@openshift-trt-bot
Copy link

Job Failure Risk Analysis for sha: 9c556df

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade High
[sig-arch] events should not repeat pathologically for ns/openshift-kube-apiserver-operator
This test has passed 99.92% of 5096 runs on release 4.17 [Overall] in the last week.
pull-ci-openshift-origin-master-e2e-aws-ovn-ipsec-serial High
operator conditions monitoring
This test has passed 100.00% of 34 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-serial'] in the last 14 days.

@rexagod
Copy link
Member Author

rexagod commented Jul 9, 2024

/test e2e-gcp-ovn-techpreview

Updated manifests.

@rexagod
Copy link
Member Author

rexagod commented Jul 9, 2024

/test e2e-gcp-ovn-techpreview

@openshift-trt-bot
Copy link

Job Failure Risk Analysis for sha: 49dd188

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-ipsec-serial IncompleteTests
Tests for this run (27) are below the historical average (423): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

})
})

var _ = g.Context("in an environment that has the minimal profile pre-set", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to again point out I'm not a ginko expert, but shouldn't this context have a BeforeAll node to apply the minimal profile? Iiuc the intention is that ginko can reorder context nodes as it sees fit, so I'm not sure this will succeed in all cases. This currently relies on the previous context to apply the minimal profile.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation mentions that Context (alias for Describe) supports decorators (i.e., the Ordered parameter being passed here) while nesting more containers internally. The containers seem to execute as they appear (among other "weights" that work towards the same affect, and IDs, in case of Its).

I believe that as long as the structure is maintained, things should happen as expected. But LMK if this seems prone to manual oversight and I'll add those in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh I missed that. Should we maybe have a comment preceding the context so future readers don't miss this like I did?

@rexagod
Copy link
Member Author

rexagod commented Jul 30, 2024

/test e2e-gcp-ovn-techpreview

@rexagod
Copy link
Member Author

rexagod commented Jul 30, 2024

/retest-required

pkg/clioptions/suiteselection/feature_filter.go Outdated Show resolved Hide resolved
test/extended/prometheus/collection_profiles.go Outdated Show resolved Hide resolved
})
})

var _ = g.Context("in an environment that has the minimal profile pre-set", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh I missed that. Should we maybe have a comment preceding the context so future readers don't miss this like I did?

@rexagod
Copy link
Member Author

rexagod commented Jul 30, 2024

Testcases breaking after rebase, in addition to the Ginkgo's earlier Interrupted by user for two of them. I'll take a look.

@openshift-trt-bot
Copy link

Job Failure Risk Analysis for sha: a86de08

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-ipsec-serial IncompleteTests
Tests for this run (27) are below the historical average (511): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@rexagod
Copy link
Member Author

rexagod commented Jul 30, 2024

/test e2e-gcp-ovn-techpreview

@openshift-trt-bot
Copy link

Job Failure Risk Analysis for sha: 889ca0f

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-ipsec-serial IncompleteTests
Tests for this run (27) are below the historical average (523): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@rexagod
Copy link
Member Author

rexagod commented Jul 30, 2024

/test e2e-gcp-ovn-techpreview

@rexagod
Copy link
Member Author

rexagod commented Sep 18, 2024

/test e2e-aws-ovn-ipsec-serial

None of the tests affect e2e-aws-ovn-ipsec-serial, which is also a non-required job. I believe the job is flaky and may be skipped.

cc @jan--f for lgtm, will post this in the API review forum once we have that on the PR.

@openshift-trt-bot
Copy link

Job Failure Risk Analysis for sha: da1d01a

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-ipsec-serial High
[sig-architecture] platform pods should not exit more than once with a non-zero exit code
This test has passed 100.00% of 30 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days.

Open Bugs
Excessive Restarts on ingress operator
pull-ci-openshift-origin-master-e2e-gcp-ovn-techpreview-serial IncompleteTests
Tests for this run (106) are below the historical average (743): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@jan--f
Copy link
Contributor

jan--f commented Sep 23, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 23, 2024
@neisw
Copy link
Contributor

neisw commented Sep 25, 2024

Anyone reviewing the failures in e2e-gcp-ovn-techpreview-serial?

  • [FAILED] [902.571 seconds]
  [sig-instrumentation][OCPFeatureGate:MetricsCollectionProfiles][Serial][Conformance] The collection profiles feature-set in an environment that has the minimal profile pre-set [It] should hide all default-only metrics when minimal collection profile is enabled [Suite:openshift/conformance/serial/minimal]
  github.com/openshift/origin/test/extended/prometheus/collection_profiles.go:148

    [FAILED] Timed out after 900.001s.
    Expected
        <*errors.errorString | 0xc001fcfc90>: 
        no monitors found with collection profile: "minimal" and "app.kubernetes.io/name"="kube-state-metrics-minimal"
        {
            s: "no monitors found with collection profile: \"minimal\" and \"app.kubernetes.io/name\"=\"kube-state-metrics-minimal\"",
        }
    to be nil
    In [It] at: github.com/openshift/origin/test/extended/prometheus/collection_profiles.go:172 @ 08/01/24 15:27:20.654

@neisw
Copy link
Contributor

neisw commented Sep 25, 2024

/test e2e-gcp-ovn-techpreview-serial
/test e2e-gcp-ovn-techpreview

@neisw
Copy link
Contributor

neisw commented Sep 25, 2024

/test e2e-aws-ovn-single-node-techpreview
/test e2e-aws-ovn-techpreview

@rexagod
Copy link
Member Author

rexagod commented Sep 25, 2024

@neisw The test quoted here ran on Thu Aug 1 13:52:31 UTC 2024. The patch has seen multiple changes since then, in addition to the test not being serialized anymore, so any -serial jobs should exclude them.

AFAICT looking at the JUnit output, none of the logs in the currently failing jobs seem to be related to any of the tests introduced here. IIRC e2e-aws-ovn-ipsec-serial was the only failing job here most recently, but I'll trigger them all again and check for flakes stemming from here, just to be sure.

@rexagod
Copy link
Member Author

rexagod commented Sep 25, 2024

/retest

@neisw
Copy link
Contributor

neisw commented Sep 25, 2024

Thanks @rexagod - Just seemed worth double checking since this is targeting techpreview. Did note that run was from a few commits back. Likely not related but appreciate the recheck.

@openshift-trt-bot
Copy link

Job Failure Risk Analysis for sha: da1d01a

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-ipsec-serial High
[sig-architecture] platform pods should not exit more than once with a non-zero exit code
This test has passed 100.00% of 3 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days.

Open Bugs
Excessive Restarts on ingress operator

@neisw
Copy link
Contributor

neisw commented Sep 25, 2024

/approve

Copy link
Contributor

openshift-ci bot commented Sep 25, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jan--f, neisw, rexagod

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 25, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 26, 2024
@openshift-trt-bot
Copy link

Job Failure Risk Analysis for sha: da1d01a

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-ipsec-serial High
[sig-architecture] platform pods should not exit more than once with a non-zero exit code
This test has passed 100.00% of 6 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days.

Open Bugs
Excessive Restarts on ingress operator

@openshift-trt-bot
Copy link

Job Failure Risk Analysis for sha: da1d01a

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-ipsec-serial High
[sig-architecture] platform pods should not exit more than once with a non-zero exit code
This test has passed 100.00% of 10 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days.

Open Bugs
Excessive Restarts on ingress operator
pull-ci-openshift-origin-master-e2e-gcp-ovn-builds IncompleteTests
Tests for this run (2) are below the historical average (829): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Refer: openshift/enhancements#1298
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 9, 2024
Copy link
Contributor

openshift-ci bot commented Oct 9, 2024

New changes are detected. LGTM label has been removed.

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 9, 2024
r := &runner{}

g.BeforeAll(func() {
if !exutil.IsTechPreviewNoUpgrade(tctx, oc.AdminConfigClient()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signature changed during rebase, so modified this (from exutil.IsTechPreviewNoUpgrade(oc).

@openshift-trt-bot
Copy link

Job Failure Risk Analysis for sha: 0bd6a90

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-ipsec-serial High
pull-ci-openshift-origin-master-e2e-gcp-ovn-builds IncompleteTests
Tests for this run (2) are below the historical average (1219): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@openshift-trt-bot
Copy link

Job Failure Risk Analysis for sha: 0bd6a90

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-gcp-ovn-builds IncompleteTests
Tests for this run (2) are below the historical average (1219): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@rexagod
Copy link
Member Author

rexagod commented Oct 10, 2024

/retest-required

Copy link
Contributor

openshift-ci bot commented Oct 10, 2024

@rexagod: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-ovn-techpreview-serial da1d01a link false /test e2e-gcp-ovn-techpreview-serial
ci/prow/e2e-aws-ovn-ipsec-serial 0bd6a90 link false /test e2e-aws-ovn-ipsec-serial
ci/prow/e2e-openstack-ovn 0bd6a90 link false /test e2e-openstack-ovn
ci/prow/e2e-aws-ovn-single-node-upgrade 0bd6a90 link false /test e2e-aws-ovn-single-node-upgrade
ci/prow/e2e-gcp-ovn-builds 0bd6a90 link true /test e2e-gcp-ovn-builds

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-trt-bot
Copy link

Job Failure Risk Analysis for sha: 0bd6a90

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-gcp-ovn-builds IncompleteTests
Tests for this run (15) are below the historical average (1242): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

1 similar comment
@openshift-trt-bot
Copy link

Job Failure Risk Analysis for sha: 0bd6a90

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-gcp-ovn-builds IncompleteTests
Tests for this run (15) are below the historical average (1242): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. vendor-update Touching vendor dir or related files
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants