"translating proxies error" after upgrading from 1.16.9 to 1.17.1 when "gloo.gateway.persistProxySpec=true" #9968

Closed
htech7x opened this issue Aug 28, 2024 · 7 comments
Labels: Prioritized, release/1.17, release/1.18, release-blocker, Type: Bug, zendesk

htech7x commented Aug 28, 2024

Gloo Edge Product

Enterprise

Gloo Edge Version

1.17.1

Kubernetes Version

1.28.5

Describe the bug

After upgrading Gloo EE from 1.16.9 to 1.17.1 with "gloo.gateway.persistProxySpec=true" set:
- Newly created VirtualServices no longer get a status.
- The Proxy object stops updating its configuration.

Expected Behavior

The Proxy updates its configuration when a new VS is created.

Steps to reproduce the bug

  1. Install Gloo EE 1.16.9 with the option "gloo.gateway.persistProxySpec=true"
helm install gloo glooe/gloo-ee --version $GLOO_EE_VERSION --namespace gloo-system --create-namespace --set-string license_key=$GLOO_LICENSE_KEY --set gloo.gateway.persistProxySpec=true --set gloo-fed.enabled=false
  2. Create a VS, for example by following https://docs.solo.io/gloo-edge/latest/guides/traffic_management/hello_world/
  3. Upgrade Gloo EE to 1.17.1:
export NEW_VERSION=1.17.1
helm pull glooe/gloo-ee --version $NEW_VERSION --untar
kubectl apply -f gloo-ee/charts/gloo/crds
helm get values gloo -n gloo-system > values.yaml
helm upgrade -n gloo-system gloo glooe/gloo-ee \
  -f values.yaml \
  --version=$NEW_VERSION \
  --set license_key=$GLOO_LICENSE_KEY
  4. Check the gloo logs:
kubectl logs deploy/gloo -n gloo-system
...
{"level":"error","ts":"2024-08-28T18:41:09.630Z","logger":"gloo-ee.v1.event_loop.setup","caller":"setup/setup_syncer.go:1066","msg":"gloo main event loop","version":"1.17.1","error":"event_loop.gloo: 1 error occurred:\n\t* translating proxies: 1 error occurred:\n\t* reconciling resource gateway-proxy: updating kube resource gateway-proxy: (want 43162817): proxies.gloo.solo.io \"gateway-proxy\" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update\n\n\n\n","errorVerbose":"1 error occurred:\n\t* translating proxies: 1 error occurred:\n\t* reconciling resource gateway-proxy: updating kube resource gateway-proxy: (want 43162817): proxies.gloo.solo.io \"gateway-proxy\" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update\n\n\n\n\nevent_loop.gloo\ngithub.com/solo-io/go-utils/errutils.AggregateErrs\n\t/go/pkg/mod/github.com/solo-io/go-utils@v0.25.1/errutils/aggregate_errs.go:19\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695","stacktrace":"github.com/solo-io/gloo/projects/gloo/pkg/syncer/setup.RunGlooWithExtensions.func11\n\t/go/pkg/mod/github.com/solo-io/gloo@v1.17.4/projects/gloo/pkg/syncer/setup/setup_syncer.go:1066"}
  5. Create a new pod, Service, and VS:
--- web.yaml
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: web
  namespace: gloo-system
spec:
  virtualHost:
    domains:
    - "web.com"
    routes:
    - matchers:
      - prefix: /
      routeAction:
        single:
          upstream:
            name: default-web-80
            namespace: gloo-system
kubectl run web --image nginx --expose --port 80
kubectl apply -f web.yaml
  6. Check the VS status (the newly created VS has no status):
glooctl get vs
+-----------------+--------------+---------+------+----------+-----------------+-----------------------------------+
| VIRTUAL SERVICE | DISPLAY NAME | DOMAINS | SSL  |  STATUS  | LISTENERPLUGINS |              ROUTES               |
+-----------------+--------------+---------+------+----------+-----------------+-----------------------------------+
| pet             |              | pet.com | none | Accepted |                 | / ->                              |
|                 |              |         |      |          |                 | gloo-system.default-petstore-8080 |
|                 |              |         |      |          |                 | (upstream)                        |
| web             |              | web.com | none |          |                 | / ->                              |
|                 |              |         |      |          |                 | gloo-system.default-web-80        |
|                 |              |         |      |          |                 | (upstream)                        |
+-----------------+--------------+---------+------+----------+-----------------+-----------------------------------+
  7. Check the Proxy config (it contains nothing about the newly created VS "web"):
kubectl get proxies gateway-proxy -n gloo-system -o yaml
apiVersion: gloo.solo.io/v1
kind: Proxy
metadata:
  creationTimestamp: "2024-08-28T18:38:17Z"
  generation: 4
  labels:
    created_by: gloo-gateway-translator
  name: gateway-proxy
  namespace: gloo-system
  resourceVersion: "43162817"
  uid: e98940cf-e833-4a48-a424-a1d476601fab
spec:
  listeners:
  - bindAddress: '::'
    bindPort: 8080
    httpListener:
      virtualHosts:
      - domains:
        - pet.com
        metadataStatic:
          sources:
          - observedGeneration: "3"
            resourceKind: '*v1.VirtualService'
            resourceRef:
              name: pet
              namespace: gloo-system
        name: gloo-system.pet
        routes:
        - matchers:
          - prefix: /
          metadataStatic:
            sources:
            - observedGeneration: "3"
              resourceKind: '*v1.VirtualService'
              resourceRef:
                name: pet
                namespace: gloo-system
          options:
            prefixRewrite: /api/pets
          routeAction:
            single:
              upstream:
                name: default-petstore-8080
                namespace: gloo-system
    metadataStatic:
      sources:
      - observedGeneration: "3"
        resourceKind: '*v1.Gateway'
        resourceRef:
          name: gateway-proxy
          namespace: gloo-system
    name: listener-::-8080
    useProxyProto: false
  - bindAddress: '::'
    bindPort: 8443
    httpListener: {}
    metadataStatic:
      sources:
      - observedGeneration: "3"
        resourceKind: '*v1.Gateway'
        resourceRef:
          name: gateway-proxy-ssl
          namespace: gloo-system
    name: listener-::-8443
    useProxyProto: false
status:
  statuses:
    gloo-system:
      reportedBy: gloo
      state: 1

htech7x added the Type: Bug label Aug 28, 2024
sam-heilbron (Contributor) commented:

Internal Slack thread: https://solo-io-corp.slack.com/archives/CEDCS8TAP/p1724785852085279
Potentially relevant PR: #9310 (comment)

soloio-bot commented:

Zendesk ticket #4392 has been linked to this issue.

DuncanDoyle added the Prioritized and release-blocker labels Sep 4, 2024
sam-heilbron (Contributor) commented:

This is another instance of #6406. I think we got bitten by this partly because we do not recommend persistProxySpec=true and migrated all of our tests to the recommended setting. We need a single test that verifies that, when proxies are persisted, you can upgrade without error and Gloo continues to process resources.

soloio-bot commented:

Zendesk ticket #4499 has been linked to this issue.

sam-heilbron (Contributor) commented:

I documented reproduction steps here as well: https://github.com/solo-io/gloo-gateway-shared-resources/tree/main/issues/gloo/9968

nfuden (Contributor) commented Sep 12, 2024

The workaround is to delete the Proxy resource (for example, kubectl delete proxies gateway-proxy -n gloo-system); Gloo then self-heals.
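Deleting the Proxy plausibly works because creating an object, unlike updating a custom resource, does not require the caller to send a resourceVersion: the API server assigns one. The hypothetical Go toy that follows (store, write and remove are illustrative names, not Gloo code) assumes a writer that, like the logged update, sends no resourceVersion, and shows its first write failing while the persisted object exists and succeeding once the object is removed.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a hypothetical toy API server that tracks only a
// resourceVersion per object name.
type store struct {
	rv   map[string]int
	next int
}

func newStore() *store { return &store{rv: map[string]int{}} }

// create needs no resourceVersion from the caller; the store assigns one.
func (s *store) create(name string) error {
	if _, ok := s.rv[name]; ok {
		return errors.New("already exists")
	}
	s.next++
	s.rv[name] = s.next
	return nil
}

// update rejects a missing (zero) or stale resourceVersion.
func (s *store) update(name string, rv int) error {
	cur, ok := s.rv[name]
	switch {
	case !ok:
		return errors.New("not found")
	case rv == 0:
		return errors.New("metadata.resourceVersion: must be specified for an update")
	case rv != cur:
		return errors.New("conflict: stale resourceVersion")
	}
	s.next++
	s.rv[name] = s.next
	return nil
}

// remove deletes the object, like kubectl delete.
func (s *store) remove(name string) { delete(s.rv, name) }

// write models a writer that never sends a resourceVersion: it updates
// when the object exists and creates it otherwise.
func (s *store) write(name string) error {
	if _, ok := s.rv[name]; ok {
		return s.update(name, 0)
	}
	return s.create(name)
}

func main() {
	s := newStore()
	_ = s.create("gateway-proxy") // Proxy persisted before the upgrade

	fmt.Println("before delete:", s.write("gateway-proxy"))
	s.remove("gateway-proxy") // the workaround: delete the persisted Proxy
	fmt.Println("after delete:", s.write("gateway-proxy"))
}
```

The sketch models only the first write after deletion; it does not model how Gloo tracks resourceVersions afterwards.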

nfuden (Contributor) commented Sep 17, 2024

Released in 1.17.2.
