
k8s regression test (istio) flake #8554

Closed · sheidkamp opened this issue Aug 8, 2023 · 6 comments
Labels: Area: Istio (Issues related to Gloo Edge integration with Istio), Type: Bug (Something isn't working), Type: CI Test Flake (A non-deterministic test that slows down development)

@sheidkamp (Contributor)

Which tests failed?

The failed test was [It] with matching port and target port in /home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:113

Initial Investigation

State never moves from "Pending" to "Accepted":

  [FAILED] Timed out after 30.001s.
  Expected
      <string>: Status
  to match fields: {
  .State:
  	Expected
  	    <core.Status_State>: 0
  	to equal
  	    <core.Status_State>: 1
  }
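
For reference, the assertion that times out is a status poll of roughly this shape. This is a minimal sketch, not the actual test code: getVirtualServiceStatus is a hypothetical helper standing in for however the suite reads the resource status, and the import paths are the usual Gomega/solo-kit ones.

import (
	. "github.com/onsi/gomega"
	"github.com/onsi/gomega/gstruct"
	core "github.com/solo-io/solo-kit/pkg/api/v1/resources/core"
)

// Inside the spec: poll until the status State reaches Accepted (1) within 30s.
// In the flake it stays Pending (0), which produces the timeout above.
// getVirtualServiceStatus is a hypothetical helper, not the suite's real API.
Eventually(func() (*core.Status, error) {
	return getVirtualServiceStatus("gloo-system", "testserver")
}, "30s", "1s").Should(gstruct.PointTo(gstruct.MatchFields(gstruct.IgnoreExtras, gstruct.Fields{
	"State": Equal(core.Status_Accepted),
})))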

Additional Information

Failure: https://github.com/solo-io/gloo/actions/runs/5798585599/attempts/1

sheidkamp added the Type: Bug and Type: CI Test Flake labels on Aug 8, 2023
@inFocus7 (Contributor)

Seen in this run

sam-heilbron added the Area: Istio label on Nov 21, 2023
@sheidkamp (Contributor, Author) commented Jan 4, 2024

Hit this one again: https://github.com/solo-io/gloo/actions/runs/7410687254/job/20163550838?pr=9030

Passed on the 4th try. It seems to be failing on a different line, though:

[FAILED] [30.019 seconds]
Gloo + Istio integration tests Istio mTLS [BeforeEach] strict peer auth when mtls is enabled for the upstream should make a request with the expected cert header
  [BeforeEach] /home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:315
  [It] /home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:479

  [FAILED] Timed out after 30.000s.
  Expected
      <string>: Status
  to match fields: {
  .State:
  	Expected
  	    <core.Status_State>: 0
  	to equal
  	    <core.Status_State>: 1
  }
  
  In [BeforeEach] at: /home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:323 @ 01/04/24 14:29:52.806

  Full Stack Trace
    github.com/solo-io/gloo/test/kube2e/istio_test.glob..func1.3.1()
    	/home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:323 +0x245
------------------------------
SSS

Summarizing 1 Failure:
  [FAIL] Gloo + Istio integration tests Istio mTLS [BeforeEach] strict peer auth when mtls is enabled for the upstream should make a request with the expected cert header
  /home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:323

Ran 7 of 10 Specs in 139.834 seconds
FAIL! -- 6 Passed | 1 Failed | 0 Pending | 3 Skipped
--- FAIL: TestIstio (139.85s)
FAIL

@npolshakova (Contributor) commented Mar 12, 2024

The only error in the logs comes from the eds watcher. In the Istio integration test, the status check fails when this error occurs on the control plane, because the upstream is not picked up when the eds plugin runs:

{"level":"error","ts":"2024-03-11T23:07:00.229Z","logger":"gloo.v1.event_loop.setup.v1.event_loop.syncer.kubernetes_eds","caller":"kubernetes/eds.go:215","msg":"upstream gloo-system.gloo-system-testserver-1234: port 1234 not found for service testserver","version":"1.0.0-ci","stacktrace":"github.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.(*edsWatcher).List\n\t/Users/ninapolshakova/solo/gloo/projects/gloo/pkg/plugins/kubernetes/eds.go:215\ngithub.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.(*edsWatcher).watch.func1\n\t/Users/ninapolshakova/solo/gloo/projects/gloo/pkg/plugins/kubernetes/eds.go:236\ngithub.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.(*edsWatcher).watch.func2\n\t/Users/ninapolshakova/solo/gloo/projects/gloo/pkg/plugins/kubernetes/eds.go:263"}
{"level":"error","ts":"2024-03-11T23:07:00.229Z","logger":"gloo.v1.event_loop.setup.v1.event_loop.syncer.kubernetes_eds","caller":"kubernetes/eds.go:215","msg":"upstream gloo-system.kube-svc:gloo-system-testserver-1234: port 1234 not found for service testserver","version":"1.0.0-ci","stacktrace":"github.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.(*edsWatcher).List\n\t/Users/ninapolshakova/solo/gloo/projects/gloo/pkg/plugins/kubernetes/eds.go:215\ngithub.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.(*edsWatcher).watch.func1\n\t/Users/ninapolshakova/solo/gloo/projects/gloo/pkg/plugins/kubernetes/eds.go:236\ngithub.com/solo-io/gloo/projects/gloo/pkg/plugins/kubernetes.(*edsWatcher).watch.func2\n\t/Users/ninapolshakova/solo/gloo/projects/gloo/pkg/plugins/kubernetes/eds.go:263"}

You can reproduce this by focusing the port settings Istio regression tests, running with --until-it-fails, and skipping cleanup in the AfterEach() when CurrentSpecReport().Failed() reports a failure, as sketched below.
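
A sketch of that setup (cleanUpTestResources is a placeholder for whatever the suite already does in its AfterEach):

// Focus the port-settings specs with the F-prefix, then run the suite with
// `ginkgo --until-it-fails` until the flake reproduces.
FDescribe("port settings", func() {
	AfterEach(func() {
		if CurrentSpecReport().Failed() {
			// Skip cleanup so the cluster state can be inspected after the failure.
			return
		}
		cleanUpTestResources() // placeholder for the suite's existing cleanup
	})
	// ... existing specs ...
})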

This results in the VirtualService being stuck with an empty status even though the other resources are valid:

apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  creationTimestamp: "2024-03-11T23:07:01Z"
  generation: 1
  name: testserver
  namespace: gloo-system
  resourceVersion: "20441"
  uid: 5627916c-2aea-4681-8f91-9dc372552de2
spec:
  virtualHost:
    domains:
    - testserver
    routes:
    - matchers:
      - prefix: /
      routeAction:
        single:
          upstream:
            name: gloo-system-testserver-1234
            namespace: gloo-system
status:
  statuses: {}
---
apiVersion: gloo.solo.io/v1
kind: Upstream
metadata:
  creationTimestamp: "2024-03-11T23:07:00Z"
  generation: 2
  labels:
    discovered_by: kubernetesplugin
  name: gloo-system-testserver-1234
  namespace: gloo-system
  resourceVersion: "20440"
  uid: a3df734a-1b6a-40c2-8530-40b22b612a0f
spec:
  discoveryMetadata:
    labels:
      gloo: testserver
  kube:
    selector:
      gloo: testserver
    serviceName: testserver
    serviceNamespace: gloo-system
    servicePort: 1234
status:
  statuses:
    gloo-system:
      reportedBy: gloo
      state: Accepted

---
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    endpoints.kubernetes.io/last-change-trigger-time: "2024-03-11T23:07:00Z"
  creationTimestamp: "2024-03-11T23:07:00Z"
  labels:
    gloo: testserver
  name: testserver
  namespace: gloo-system
  resourceVersion: "20436"
  uid: 67c66447-a918-413b-987c-db9c2d5d27ed
subsets:
- addresses:
  - ip: 10.244.0.12
    nodeName: solo-test-cluster-control-plane
    targetRef:
      kind: Pod
      name: testserver
      namespace: gloo-system
      uid: 4fa124e7-50ed-42f9-b1c2-add228af06f5
  ports:
  - name: http
    port: 1234
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2024-03-11T23:07:00Z"
  labels:
    gloo: testserver
  name: testserver
  namespace: gloo-system
  resourceVersion: "20435"
  uid: 276417f0-3811-45c5-9882-caaef3389c18
spec:
  clusterIP: 10.96.244.47
  clusterIPs:
  - 10.96.244.47
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 1234
    protocol: TCP
    targetPort: 1234
  selector:
    gloo: testserver
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-03-11T20:41:05Z"
  labels:
    gloo: testserver
  name: testserver
  namespace: gloo-system
  resourceVersion: "1200"
  uid: 4fa124e7-50ed-42f9-b1c2-add228af06f5
spec:
  containers:
  - image: quay.io/solo-io/testrunner:v1.7.0-beta17
    imagePullPolicy: IfNotPresent
    name: testserver
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-9sktq
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: solo-test-cluster-control-plane
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 0
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-9sktq
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-03-11T20:41:05Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-03-11T20:41:09Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-03-11T20:41:09Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-03-11T20:41:05Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://ed8459f4f66adc416565c43073f0a3db795657fa83dc815a418c442c14545c59
    image: quay.io/solo-io/testrunner:v1.7.0-beta17
    imageID: quay.io/solo-io/testrunner@sha256:8dbf8d9a4c499d4f54cf009a0862d9f62eb40429b731958bd0f644f18fed1d4b
    lastState: {}
    name: testserver
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-03-11T20:41:09Z"
  hostIP: 172.18.0.2
  phase: Running
  podIP: 10.244.0.12
  podIPs:
  - ip: 10.244.0.12
  qosClass: BestEffort
  startTime: "2024-03-11T20:41:05Z"

@sam-heilbron (Contributor)

These test suites were migrated to our new format, and the legacy tests have been or will be removed by Nina. Marking as closed.

@bewebi (Contributor) commented Jul 23, 2024

Observed here on #9805 targeting 1.16

• [FAILED] [34.247 seconds]
Gloo + Istio integration tests port settings should act as expected with varied ports [It] without target port, and port matching pod's port
/home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:181

  [FAILED] Timed out after 30.001s.
  Expected
      <string>: Status
  to match fields: {
  .State:
  	Expected
  	    <core.Status_State>: 0
  	to equal
  	    <core.Status_State>: 1
  }
  
  In [It] at: /home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:123 @ 07/23/24 15:01:56.029

  Full Stack Trace
    github.com/solo-io/gloo/test/kube2e/istio_test.glob..func1.1.3(0x4d2, 0xffffffffffffffff)
    	/home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:123 +0x886
    github.com/solo-io/gloo/test/kube2e/istio_test.glob..func1.1.4(0x1e9c6d9?, 0x1ea0f51?, 0x1ea0f89?)
    	/home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:164 +0x69
    reflect.Value.call({0x67c5860?, 0xc0004e53e0?, 0x13?}, {0x6f32655, 0x4}, {0xc000964b90, 0x3, 0x3?})
    	/opt/hostedtoolcache/go/1.21.11/x64/src/reflect/value.go:596 +0x14ce
    reflect.Value.Call({0x67c5860?, 0xc0004e53e0?, 0x7e5fcd0?}, {0xc000964b90, 0x3, 0x3})
    	/opt/hostedtoolcache/go/1.21.11/x64/src/reflect/value.go:380 +0xb6
------------------------------
SSSSSSSSS

Summarizing 1 Failure:
  [FAIL] Gloo + Istio integration tests port settings should act as expected with varied ports [It] without target port, and port matching pod's port
  /home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:123

Ran 1 of 10 Specs in 112.084 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 9 Skipped

@bewebi (Contributor) commented Oct 1, 2024

Observed here on #10138 targeting 1.14

  [FAILED] Timed out after 30.001s.
  Expected
      <string>: Status
  to match fields: {
  .State:
  	Expected
  	    <core.Status_State>: 0
  	to equal
  	    <core.Status_State>: 1
  }
  
  In [It] at: /home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:157 @ 10/01/24 16:11:24.888

  Full Stack Trace
    github.com/solo-io/gloo/test/kube2e/istio_test.glob..func1.1.3(0x4d3, 0xffffffffffffffff)
    	/home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:157 +0x13eb
    github.com/solo-io/gloo/test/kube2e/istio_test.glob..func1.1.4(0x1c9df79?, 0x1?, 0x0?)
    	/home/runner/work/gloo/gloo/test/kube2e/istio/istio_integration_test.go:163 +0x75
    reflect.Value.call({0x627dda0?, 0xc000976660?, 0x13?}, {0x6980049, 0x4}, {0xc001e38410, 0x3, 0x3?})
    	/opt/hostedtoolcache/go/1.20.14/x64/src/reflect/value.go:586 +0x13aa
    reflect.Value.Call({0x627dda0?, 0xc000976660?, 0x76ded80?}, {0xc001e38410, 0x3, 0x3})
    	/opt/hostedtoolcache/go/1.20.14/x64/src/reflect/value.go:370 +0xc8
