This repository has been archived by the owner on May 6, 2022. It is now read-only.

Deleting ServiceInstance does not send a delete to the broker #1574

Closed
ash2k opened this issue Nov 16, 2017 · 12 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@ash2k
Contributor

ash2k commented Nov 16, 2017

How to reproduce:

  1. Create a namespace
  2. Create a ServiceInstance and wait until it is in Ready=true state
  3. Delete the namespace
    BUG 1: the instance is not deleted after the namespace is gone, and a list operation shows it is still there.
    Trying to delete it directly:
    # kubectl get serviceinstances alb1-instance -n app1
    NAME            AGE
    alb1-instance   11m
    # kubectl delete serviceinstances alb1-instance -n app1
    Error from server (NotFound): namespaces "app1" not found
  4. Re-create the namespace
  5. Delete the instance directly
    BUG 2: the instance is deleted instantly, but no REST call is made to the broker:
    # kubectl delete serviceinstances alb1-instance -n app1
    serviceinstance "alb1-instance" deleted
    
    # kubectl get serviceinstances -n app1
    No resources found.

Kubernetes and kubectl 1.8.3
Service Catalog v0.1.2

This is WITHOUT the aggregation layer (kubectl talks to Service Catalog directly) but with a proper RBAC setup. Does that matter?

/cc @staebler If I remember correctly you were asking about this issue on Slack.

@staebler
Contributor

I get different results when I run this through the aggregator. In Step 3, the ServiceInstance is deleted, even though the response from the kubectl command is the same as you received. The Deprovision request is also sent to the broker in Step 3. I will re-test without using the aggregator to see if that makes a difference.

@staebler
Contributor

staebler commented Nov 16, 2017

Should service catalog be adding a finalizer to namespaces in which its resources are created and watching for changes to those namespaces? Without aggregation, I don't think kube-apiserver can know that the service catalog resources need to be deleted when the namespace is deleted. That would explain Bug 1. Or maybe there is a race where the service-catalog resource may be deleted after the namespace, since service catalog is not adding a finalizer to the namespace. When the namespace does not exist, then the controller does not attempt to do any reconciliation for the resource.
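
A minimal sketch, assuming client-go, of the finalizer approach described above. This is not what Service Catalog currently does; the finalizer name and function are hypothetical:

// Sketch only: add a finalizer to a namespace so that kube-apiserver blocks
// full deletion of the namespace until service-catalog removes the finalizer
// after its resources are cleaned up.
package finalizer

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// Hypothetical finalizer name; Service Catalog does not register this today.
const instancesFinalizer = "servicecatalog.k8s.io/instances"

func addNamespaceFinalizer(ctx context.Context, client kubernetes.Interface, nsName string) error {
	ns, err := client.CoreV1().Namespaces().Get(ctx, nsName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	for _, f := range ns.Finalizers {
		if f == instancesFinalizer {
			return nil // already present, nothing to do
		}
	}
	ns.Finalizers = append(ns.Finalizers, instancesFinalizer)
	_, err = client.CoreV1().Namespaces().Update(ctx, ns, metav1.UpdateOptions{})
	return err
}

The controller would then remove the finalizer once every ServiceInstance in the namespace has been deprovisioned, so the namespace could not disappear before reconciliation runs.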

I still don't understand how Bug 2 would happen, though.

@nilebox added the kind/bug label Dec 13, 2017
@nilebox
Contributor

nilebox commented Dec 21, 2017

Successfully reproduced the issue with minikube (k8s 1.8.0) and the ups-broker, following the walkthrough doc: https://github.com/kubernetes-incubator/service-catalog/blob/master/docs/walkthrough.md

❯ kubectl create namespace test-ns
namespace "test-ns" created
❯ kubectl create -f contrib/examples/walkthrough/ups-instance.yaml
serviceinstance "ups-instance" created

then deleted the namespace:

❯ kubectl delete ns test-ns
namespace "test-ns" deleted

and checked the ups-broker logs (I customized the ups-broker Helm chart with an extra -v argument for verbose logging):

❯ kubectl logs ups-broker-ups-broker-7c6794fb9b-lk5v5 -n ups-broker
I1221 06:18:23.588404       1 server.go:94] Starting server on %!d(string=:8080)
I1221 06:19:10.236621       1 server.go:111] Get Service Broker Catalog...
I1221 06:19:10.236635       1 controller.go:57] Catalog()
I1221 06:19:41.251468       1 server.go:137] CreateServiceInstance 4fbd32f8-4ef4-49b1-98cb-9bafe0bac487...
I1221 06:19:41.251690       1 controller.go:87] CreateServiceInstance()
I1221 06:19:41.251734       1 controller.go:118] Created User Provided Service Instance:
&{4fbd32f8-4ef4-49b1-98cb-9bafe0bac487 0xc4200fc048}

As you can see, despite the deletion of the namespace, there was no OSB request to deprovision the instance.
I am going to analyze the Service Catalog logs to see if there is an obvious issue there.

@nilebox
Contributor

nilebox commented Dec 21, 2017

Sorry, actually, there was a deprovisioning request, just 30 seconds later...

❯ kubectl logs ups-broker-ups-broker-7c6794fb9b-lk5v5 -n ups-broker
I1221 06:18:23.588404       1 server.go:94] Starting server on %!d(string=:8080)
I1221 06:19:10.236621       1 server.go:111] Get Service Broker Catalog...
I1221 06:19:10.236635       1 controller.go:57] Catalog()
I1221 06:19:41.251468       1 server.go:137] CreateServiceInstance 4fbd32f8-4ef4-49b1-98cb-9bafe0bac487...
I1221 06:19:41.251690       1 controller.go:87] CreateServiceInstance()
I1221 06:19:41.251734       1 controller.go:118] Created User Provided Service Instance:
&{4fbd32f8-4ef4-49b1-98cb-9bafe0bac487 0xc4200fc048}
I1221 06:20:09.129602       1 server.go:166] RemoveServiceInstance 4fbd32f8-4ef4-49b1-98cb-9bafe0bac487...
I1221 06:20:09.129653       1 controller.go:138] RemoveServiceInstance()
I1221 06:20:09.236762       1 server.go:166] RemoveServiceInstance 4fbd32f8-4ef4-49b1-98cb-9bafe0bac487...
I1221 06:20:09.236779       1 controller.go:138] RemoveServiceInstance()

Actually there were two requests (a race condition?), and the Service Catalog log shows another issue: it keeps sending events after the namespace has been deleted:

I1221 06:20:09.132646       1 request.go:836] Request Body: {"kind":"Event","apiVersion":"v1","metadata":{"name":"ups-instance.15023aebc9f60025","namespace":"test-ns","creationTimestamp":null},"involvedObject":{"kind":"ServiceInstance","namespace":"test-ns","name":"ups-instance","uid":"edbb99a1-e616-11e7-95a8-0242ac110009","apiVersion":"servicecatalog.k8s.io","resourceVersion":"12"},"reason":"DeprovisionedSuccessfully","message":"The instance was deprovisioned successfully","source":{"component":"service-catalog-controller-manager"},"firstTimestamp":"2017-12-21T06:20:09Z","lastTimestamp":"2017-12-21T06:20:09Z","count":1,"type":"Normal"}
I1221 06:20:09.134856       1 event.go:218] Event(v1.ObjectReference{Kind:"ServiceInstance", Namespace:"test-ns", Name:"ups-instance", UID:"edbb99a1-e616-11e7-95a8-0242ac110009", APIVersion:"servicecatalog.k8s.io", ResourceVersion:"12", FieldPath:""}): type: 'Normal' reason: 'DeprovisionedSuccessfully' The instance was deprovisioned successfully
I1221 06:20:09.137919       1 request.go:836] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"events \"ups-instance.15023aebc9f60025\" is forbidden: unable to create new content in namespace test-ns because it is being terminated.","reason":"Forbidden","details":{"name":"ups-instance.15023aebc9f60025","kind":"events"},"code":403}
E1221 06:20:09.138476       1 event.go:200] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"ups-instance.15023aebc9f60025", GenerateName:"", Namespace:"test-ns", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, InvolvedObject:v1.ObjectReference{Kind:"ServiceInstance", Namespace:"test-ns", Name:"ups-instance", UID:"edbb99a1-e616-11e7-95a8-0242ac110009", APIVersion:"servicecatalog.k8s.io", ResourceVersion:"12", FieldPath:""}, Reason:"DeprovisionedSuccessfully", Message:"The instance was deprovisioned successfully", Source:v1.EventSource{Component:"service-catalog-controller-manager", Host:""}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbe86f28647bf4625, ext:390403396045, loc:(*time.Location)(0x25ae700)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbe86f28647bf4625, ext:390403396045, loc:(*time.Location)(0x25ae700)}}, Count:1, Type:"Normal"}': 'events "ups-instance.15023aebc9f60025" is forbidden: unable to create new content in namespace test-ns because it is being terminated.' (will not retry!)
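
Duplicate deprovision calls like these are why the OSB API expects deprovisioning to be idempotent: a repeated DELETE for an instance that is already gone should get 410 Gone rather than an error. A minimal broker-side sketch, loosely in the style of the ups-broker; the in-memory store and route wiring are hypothetical:

// Sketch of an idempotent deprovision endpoint: the first DELETE removes the
// instance, and any duplicate request for the same ID gets 410 Gone, so
// controller retries are harmless.
package main

import (
	"net/http"
	"strings"
	"sync"
)

type broker struct {
	mu        sync.Mutex
	instances map[string]struct{}
}

func (b *broker) removeServiceInstance(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodDelete {
		http.NotFound(w, r)
		return
	}
	id := strings.TrimPrefix(r.URL.Path, "/v2/service_instances/")
	b.mu.Lock()
	defer b.mu.Unlock()
	if _, ok := b.instances[id]; !ok {
		w.WriteHeader(http.StatusGone) // already deprovisioned: 410, not an error
		w.Write([]byte("{}"))
		return
	}
	delete(b.instances, id)
	w.WriteHeader(http.StatusOK)
	w.Write([]byte("{}"))
}

func main() {
	// Seeded with the instance ID from the log above, for illustration only.
	b := &broker{instances: map[string]struct{}{
		"4fbd32f8-4ef4-49b1-98cb-9bafe0bac487": {},
	}}
	http.HandleFunc("/v2/service_instances/", b.removeServiceInstance)
	http.ListenAndServe(":8080", nil)
}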

@nilebox
Contributor

nilebox commented Dec 21, 2017

There is still definitely something wrong, as after another try I see 5 deprovisioning requests:

I1221 06:55:05.647383       1 controller.go:118] Created User Provided Service Instance:
&{41b81f12-1635-4e4b-82ad-fd2ed21366e4 0xc42000e9f0}
I1221 06:55:45.428902       1 server.go:166] RemoveServiceInstance 41b81f12-1635-4e4b-82ad-fd2ed21366e4...
I1221 06:55:45.428918       1 controller.go:138] RemoveServiceInstance()
I1221 06:55:45.438126       1 server.go:166] RemoveServiceInstance 41b81f12-1635-4e4b-82ad-fd2ed21366e4...
I1221 06:55:45.438146       1 controller.go:138] RemoveServiceInstance()
I1221 06:55:45.450907       1 server.go:166] RemoveServiceInstance 41b81f12-1635-4e4b-82ad-fd2ed21366e4...
I1221 06:55:45.450923       1 controller.go:138] RemoveServiceInstance()
I1221 06:55:45.543836       1 server.go:166] RemoveServiceInstance 41b81f12-1635-4e4b-82ad-fd2ed21366e4...
I1221 06:55:45.543888       1 controller.go:138] RemoveServiceInstance()
I1221 06:55:45.573153       1 server.go:166] RemoveServiceInstance 41b81f12-1635-4e4b-82ad-fd2ed21366e4...
I1221 06:55:45.573219       1 controller.go:138] RemoveServiceInstance()

It looks like Service Catalog also fails to update the status:

I1221 06:55:45.541437       1 controller_instance.go:237] ServiceInstance "test-ns/ups-instance": Processing deleting event
I1221 06:55:45.541488       1 controller.go:331] ServiceInstance "test-ns/ups-instance": Creating client for ClusterServiceBroker ups-broker, URL: http://ups-broker-ups-broker.ups-broker.svc.cluster.local
I1221 06:55:45.541518       1 controller_instance.go:423] ServiceInstance "test-ns/ups-instance": Deprovisioning
I1221 06:55:45.544207       1 controller_instance.go:517] ServiceInstance "test-ns/ups-instance": Deprovision call to broker succeeded, finalizing
I1221 06:55:45.544253       1 controller_instance.go:524] ServiceInstance "test-ns/ups-instance": Successfully deprovisioned, 4f6e6cf6-ffdd-425f-a2c7-3c9258ad2468 at ups-broker
I1221 06:55:45.544315       1 controller_instance.go:1778] ServiceInstance "test-ns/ups-instance": Setting condition "Ready" to False
I1221 06:55:45.544346       1 controller_instance.go:1841] ServiceInstance "test-ns/ups-instance": Updating status
...
I1221 06:55:45.548501       1 round_trippers.go:436] PUT https://10.96.0.1:443/apis/servicecatalog.k8s.io/v1beta1/namespaces/test-ns/serviceinstances/ups-instance/status 409 Conflict in 3 milliseconds
I1221 06:55:45.548567       1 request.go:836] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Operation cannot be fulfilled on serviceinstances.servicecatalog.k8s.io \"ups-instance\": StorageError: invalid object, Code: 4, Key: /registry/servicecatalog.k8s.io/serviceinstances/test-ns/ups-instance, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: dfdcbad4-e61b-11e7-95a8-0242ac110009, UID in object meta: ","reason":"Conflict","details":{"name":"ups-instance","group":"servicecatalog.k8s.io","kind":"serviceinstances"},"code":409}

And after several retries, the last message is:

I1221 06:55:45.667836       1 controller_instance.go:204] ServiceInstance "test-ns/ups-instance": Not doing work for test-ns/ups-instance because it has been deleted
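
The 409 above is likely also what multiplies the broker calls: the deprovision succeeds, the status update fails with a conflict, the key is requeued, and the whole deletion path, broker call included, runs again. A minimal sketch of confining the retry to the status update using client-go's retry helper; the get/update callbacks are illustrative stand-ins, not Service Catalog's actual code:

// Sketch: retry only the status/finalizer update on 409 Conflict, re-reading
// the object on each attempt, and treat NotFound as "already deleted" so a
// deprovision that already succeeded at the broker is never re-sent.
package deprovision

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/util/retry"
)

func recordDeprovisionSuccess(get func() (interface{}, error), update func(interface{}) error) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		obj, err := get()
		if apierrors.IsNotFound(err) {
			return nil // instance already gone (e.g. UID mismatch after deletion); nothing to record
		}
		if err != nil {
			return err
		}
		return update(obj) // a 409 here triggers another attempt with a fresh read
	})
}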

@zhongyi-zhang

Would someone follow up on this issue?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label May 29, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jun 28, 2019
@jberkhahn removed the lifecycle/rotten label Jul 1, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Sep 29, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Oct 29, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
