AppWrapper gets stuck when wrapped resource is a CRD that is not installed #276

Closed
dgrove-oss opened this issue Dec 12, 2024 · 0 comments · Fixed by #277
Labels
bug Something isn't working

Describe the Bug

Create an AppWrapper around a resource whose CRD is not installed.
As expected, creation fails and the AppWrapper enters a terminal failed state.

Unfortunately, deleting the AppWrapper then gets stuck: it stays in the Terminating phase because the delete of the nonexistent wrapped resource fails with an unexpected error.

Tail of the kubectl get appwrapper -o yaml output:
                        cpu: 1
  status:
    componentStatus:
    - apiVersion: kubeflow.org/v1
      conditions:
      - lastTransitionTime: "2024-12-12T02:07:02Z"
        message: ""
        reason: ComponentCreationInitiated
        status: Unknown
        type: ResourcesDeployed
      kind: PyTorchJob
      name: pytorch-simple
      podSets:
      - path: template.spec.pytorchReplicaSpecs.Master.template
        replicas: 1
      - path: template.spec.pytorchReplicaSpecs.Worker.template
        replicas: 1
    conditions:
    - lastTransitionTime: "2024-12-12T02:07:02Z"
      message: Suspend is false
      reason: Resuming
      status: "True"
      type: QuotaReserved
    - lastTransitionTime: "2024-12-12T02:07:02Z"
      message: Suspend is false
      reason: Resuming
      status: "True"
      type: ResourcesDeployed
    - lastTransitionTime: "2024-12-12T02:07:02Z"
      message: Suspend is false
      reason: Resuming
      status: "False"
      type: PodsReady
    - lastTransitionTime: "2024-12-12T02:07:02Z"
      message: 'error creating components: no matches for kind "PyTorchJob" in version
        "kubeflow.org/v1"'
      reason: CreateFailed
      status: "True"
      type: Unhealthy
    - lastTransitionTime: "2024-12-12T02:07:02Z"
      message: ""
      reason: DeletionInitiated
      status: "True"
      type: DeletingResources
    phase: Terminating
kind: List
metadata:
  resourceVersion: ""
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.5/pkg/internal/controller/controller.go:227
2024-12-12T02:10:17.473541557Z	ERROR	logr@v1.4.2/logr.go:301	Deletion error	{"controller": "AppWrapper", "controllerGroup": "workload.codeflare.dev", "controllerKind": "AppWrapper", "AppWrapper": {"name":"sample-pytorch-job","namespace":"default"}, "namespace": "default", "name": "sample-pytorch-job", "reconcileID": "936970f7-f7db-4a0c-b561-bad27b1dd2fe", "error": "no matches for kind \"PyTorchJob\" in version \"kubeflow.org/v1\""}
github.com/go-logr/logr.Logger.Error
	/go/pkg/mod/github.com/go-logr/logr@v1.4.2/logr.go:301
github.com/project-codeflare/appwrapper/internal/controller/appwrapper.(*AppWrapperReconciler).deleteComponents.func1
	/workspace/internal/controller/appwrapper/resource_management.go:371
github.com/project-codeflare/appwrapper/internal/controller/appwrapper.(*AppWrapperReconciler).deleteComponents
	/workspace/internal/controller/appwrapper/resource_management.go:386
github.com/project-codeflare/appwrapper/internal/controller/appwrapper.(*AppWrapperReconciler).Reconcile
	/workspace/internal/controller/appwrapper/appwrapper_controller.go:120
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.5/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.5/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.5/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.5/pkg/internal/controller/controller.go:227
(base) dgrove@Dave's IBM Mac kueue % kubectl get appwrapper -o yaml 