@platform-delivery-tooling-support folks, I will describe the steps to reproduce an outstanding issue with the operator in air-gapped environments.
In short, if the Controller Manager can’t reach the W&B endpoint, the entire stack fails to deploy.
In other words, this is a blocker.
Steps to reproduce:
1. Cut the cluster nodes' access to the internet (this step was used only to reproduce the issue).
2. Apply the WeightsAndBiases custom resource:
   kubectl apply -f wandb.yaml
   At this point, nothing happens. Not a single container is created.
3. Check the Controller Manager logs:
{"level":"dpanic","ts":"2024-08-07T13:33:40Z","msg":"non-string key argument passed to logging, ignoring all later arguments","controller":"weightsandbiases","controllerGroup":"apps.wandb.com","controllerKind":"WeightsAndBiases","WeightsAndBiases":{"name":"wandb","namespace":"default"},"namespace":"default","name":"wandb","reconcileID":"08ea8f72-2387-4b9b-b445-0afc2b2caa58","invalid key":"Secret \"wandb-latest-cached-release\" not found","stacktrace":"github.com/wandb/operator/controllers.(*WeightsAndBiasesReconciler).Reconcile\n\t/workspace/controllers/weightsandbiases_controller.go:135\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.1/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.1/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.1/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.1/pkg/internal/controller/controller.go:235"}
{"level":"error","ts":"2024-08-07T13:33:40Z","msg":"No cached release found for deployer spec","controller":"weightsandbiases","controllerGroup":"apps.wandb.com","controllerKind":"WeightsAndBiases","WeightsAndBiases":{"name":"wandb","namespace":"default"},"namespace":"default","name":"wandb","reconcileID":"08ea8f72-2387-4b9b-b445-0afc2b2caa58","error":"Secret \"wandb-latest-cached-release\" not...
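Both log entries point at a missing Secret named wandb-latest-cached-release. For anyone trying to confirm the same state on their cluster, here is a minimal sketch of a check. The function name check_cached_release is hypothetical, and the namespace default is an assumption taken from the "namespace":"default" field in the logs; the operator may look for the Secret elsewhere, so the namespace is a parameter.

```shell
# Sketch only: report whether the cached-release Secret named in the logs exists.
# check_cached_release is a hypothetical helper, not part of the operator.
# The namespace defaults to "default" (assumed from the log output above).
check_cached_release() {
  ccr_namespace="${1:-default}"
  # kubectl exits non-zero when the Secret is not found. It also exits
  # non-zero when the API server is unreachable, so "missing" here means
  # "could not be read", not strictly "does not exist".
  if kubectl get secret wandb-latest-cached-release -n "$ccr_namespace" >/dev/null 2>&1; then
    echo "present"
  else
    echo "missing"
  fi
}
```

Calling check_cached_release default on the air-gapped cluster described above would be expected to print missing, which matches the "not found" message in the logs.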
Issue created in Slack from a [message](https://weightsandbiases.slack.com/archives/C06CKRTPKDF/p1723039059602709?thread_ts=1723039059.602709&cid=C06CKRTPKDF).
Note: the operator itself was installed before the steps above with:
helm upgrade --install -n wandb operator wandb/operator