InitContainer Ordering Issue with CloudSQL Operator 1.6.0 in Istio-Managed Environments #641

Open
utmaks opened this issue Nov 26, 2024 · 4 comments

utmaks commented Nov 26, 2024

Expected Behavior

The operator orders initContainers correctly, so that pods start normally when Istio sidecar injection is enabled.

Actual Behavior

Description:
During the upgrade of CloudSQL Operator to version 1.6.0, issues were encountered when deploying new instances of the
application. While the existing application instances continued functioning as expected, attempts to start new pods
failed due to critical container startup errors, which would be unacceptable in a production environment.

Details:
After upgrading to version 1.6.0 to utilize the minSigtermDelay parameter, the following problem was observed:

  • New instances of the application failed to initialize due to startup probe errors from the CloudSQL container:
    Startup probe failed: Get "http://10.210.0.89:15020/app-health/csql-apps-apps/startupz": dial tcp 10.210.0.89:15020: connect: connection refused.
  • Existing pods already in operation continued to function correctly, with no observed disruptions in their database
    interactions.

The issue seems tied to Istio sidecar injection. Specifically:

  • New pods attempted to start the istio-init container but failed before Istio was fully initialized.
  • The behavior differs from previous versions of the operator, which allowed greater compatibility in Istio-managed
    environments.
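
Editorial note: the failing probe URL above targets port 15020 with a path of the form /app-health/<container>/<probe>. As far as I understand Istio's probe rewriting, that is the shape produced when HTTP probes are redirected to the sidecar agent, so the probe can only succeed once that agent is listening. A minimal sketch that splits such a URL into its parts (the function name is hypothetical; the URL is copied from the error message):

```python
from urllib.parse import urlparse


def parse_rewritten_probe(url: str) -> dict:
    """Split a rewritten probe URL into host, port, container and probe type."""
    parsed = urlparse(url)
    # Assumed path shape: /app-health/<container>/<probe>
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "app-health":
        raise ValueError(f"not a rewritten probe path: {parsed.path}")
    return {
        "host": parsed.hostname,
        "port": parsed.port,
        "container": parts[1],
        "probe": parts[2],
    }


info = parse_rewritten_probe(
    "http://10.210.0.89:15020/app-health/csql-apps-apps/startupz"
)
print(info)
```

Read this way, the error concerns the startup probe of the container named csql-apps-apps, and the refused connection is to port 15020 rather than to the proxy's own health-check port.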

Resolution Attempt:
Disabling Istio temporarily resolved the problem, allowing the CloudSQL container to start correctly in the new pods.
However, this is not an acceptable solution for production workloads where Istio is required.

Proposed Fix for CloudSQL Operator team:
Reintroduce user-configurable options for sidecar compatibility (e.g., sidecarType), as seen in earlier commits, to
prevent such failures in Istio environments and ensure robust behavior during deployment in production scenarios.

Steps to Reproduce the Problem

  1. Install Istio (1.19.3)
  2. Install CloudSQL Operator 1.6.0
  3. Create a Deployment with CloudSQL annotation in namespace managed by Istio

Specifications

  • CloudSQL Operator version: 1.6.0
  • Platform: GKE v1.30.5-gke.1443001
  • Istio version: 1.19.3
@jackwotherspoon
Collaborator

Thanks for this @utmaks 👏

@hessjcg is OOO this week but will take a look next week when he is back

@hessjcg
Collaborator

hessjcg commented Dec 2, 2024

@utmaks,

This seems like a good reason to add a configuration parameter to the AuthProxyWorkload CRD to allow you and others to make the proxy run as a PodSpec.Container container instead of a PodSpec.InitContainer sidecar container.

I am curious to understand how Istio 1.19.3 and Operator 1.6.0 conspire to create pods that won't start. Would you mind posting the pod declaration for one of these failing pods (redacted, of course)?

Is Istio adding its sidecar container to PodSpec.Container or to PodSpec.InitContainer? What is the order of the containers, and does Istio start first? Are there issues with the health checks?
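
Editorial note: one way to answer the ordering question from a pod declaration is to list the init containers in their declared order, mark those with restartPolicy: Always (Kubernetes native sidecars), and then list the regular containers. The sketch below assumes the pod has been loaded into a Python dict (for example from the JSON output of kubectl get pod). The sample pod is hypothetical and is not taken from the failing cluster.

```python
def startup_order(pod: dict) -> list[str]:
    """Return containers in the order the kubelet starts them."""
    spec = pod.get("spec", {})
    order = []
    # Init containers start one after another, in declared order.
    for c in spec.get("initContainers", []):
        kind = "init-sidecar" if c.get("restartPolicy") == "Always" else "init"
        order.append(f"{kind}:{c['name']}")
    # Regular containers start only after all init containers have started.
    for c in spec.get("containers", []):
        order.append(f"container:{c['name']}")
    return order


# Hypothetical pod layout, for illustration only.
sample_pod = {
    "spec": {
        "initContainers": [
            {"name": "istio-validation"},
            {"name": "csql-apps-apps", "restartPolicy": "Always"},
        ],
        "containers": [{"name": "app"}, {"name": "istio-proxy"}],
    }
}
print(startup_order(sample_pod))
```

In a layout like this sample, a proxy running as an init sidecar would have its startup probe evaluated before any regular container, including an Istio sidecar in the containers list, has started.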

@azunna1

azunna1 commented Dec 23, 2024

Adding these two annotations to the pod template fixes the issue: sidecar.istio.io/rewriteAppHTTPProbers: 'false' and traffic.sidecar.istio.io/excludeInboundPorts: '9801'. The problem is that the istio-validation init container always starts first.
Note that if strict mTLS mode is enabled, health checks may fail once the startup probe rewrite is no longer in place.
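
Editorial note: a minimal sketch of the workaround above, expressed as a merge patch for a Deployment's pod template. The annotation keys and values are the ones quoted in this comment; the function name is hypothetical.

```python
import json

# Annotations quoted in the comment above.
ISTIO_WORKAROUND_ANNOTATIONS = {
    "sidecar.istio.io/rewriteAppHTTPProbers": "false",
    "traffic.sidecar.istio.io/excludeInboundPorts": "9801",
}


def pod_template_annotation_patch(annotations: dict) -> dict:
    """Build a merge patch that sets annotations on a Deployment's pod template."""
    return {"spec": {"template": {"metadata": {"annotations": dict(annotations)}}}}


patch = pod_template_annotation_patch(ISTIO_WORKAROUND_ANNOTATIONS)
print(json.dumps(patch, indent=2))
```

The printed JSON could then be applied with something like kubectl patch deployment <name> --type merge -p '<json>', where <name> is a placeholder. Changing the pod template triggers a rollout, so new pods pick up the annotations.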

@azunna1

azunna1 commented Dec 23, 2024

For a more permanent fix, the csql sidecar containers should run as user 1337, as stated here: https://cloud.google.com/knowledge/kb/pod-are-fail-to-start-due-to-init-containers-not-starting-with-istio-cni-enabled-000007358
I tried setting this through the container field of the AuthProxyWorkload's AuthProxyWorkloadSpec, but that replaces the default configuration instead of merging with it. It would be nice if it merged instead. @hessjcg
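
Editorial note: to illustrate the merge behavior requested above, here is a minimal sketch of a recursive merge of a partial container override onto a default container definition. This is not how the operator behaves today, and the default values are hypothetical placeholders, not the operator's real defaults.

```python
import copy


def deep_merge(defaults: dict, override: dict) -> dict:
    """Return defaults with override applied key by key, recursing into dicts."""
    merged = copy.deepcopy(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults, for illustration only.
default_container = {
    "name": "csql-apps-apps",
    "image": "example.invalid/cloud-sql-proxy:placeholder",
    "securityContext": {"runAsNonRoot": True},
}
# The user only wants to set the UID mentioned in the comment above.
user_override = {"securityContext": {"runAsUser": 1337}}

merged = deep_merge(default_container, user_override)
print(merged)
```

With merge semantics, the override would only need to carry securityContext.runAsUser, and the remaining default fields would be preserved.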
