
rbd mirror features prevents pod to mount the image #1912

Closed
reza-rahim opened this issue Mar 10, 2021 · 8 comments
Labels
component/rbd Issues related to RBD

@reza-rahim

Describe the bug

Enabling rbd mirror features prevents the pod from mounting the rbd image. The pod gets stuck in the ContainerCreating state.

Environment details

  • Image/version of Ceph CSI driver : 3.2
  • Helm chart version :
  • Kernel version : 5.11.3-1.el7.elrepo.x86_64
  • Mounter used for mounting PVC (for CephFS it's fuse or kernel; for rbd it's krbd or rbd-nbd) :
  • Kubernetes cluster version : 1.18
  • Ceph cluster version : Rook-ceph 1.5 - Octopus

Steps to reproduce

Steps to reproduce the behavior:

  1. Deployed a mirror-enabled pool.
  2. Created a dynamic PVC.
  3. Ran the following command:
    rbd mirror image enable csi-vol-47ca0472-7ed5-11eb-b969-e28228ec8762 snapshot --pool=replicapool
  4. Deployed a pod that mounts the PVC (see the sketch below).
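
For reference, a rough end-to-end sketch of the reproduction, assuming Rook's example rook-ceph-block StorageClass; the PVC name and StorageClass name below are illustrative, not taken from the report:

# Enable per-image (snapshot-based) mirroring on the pool; with Rook this is
# normally done through the CephBlockPool mirroring spec instead.
rbd mirror pool enable replicapool image

# Create a dynamic PVC (the StorageClass name is an assumption).
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
EOF

# Enable snapshot-based mirroring on the PVC's backing image
# (image name taken from the report).
rbd mirror image enable csi-vol-47ca0472-7ed5-11eb-b969-e28228ec8762 snapshot --pool=replicapool

# Deploy a pod that mounts rbd-pvc; it then hangs in ContainerCreating.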

logs:
image replicapool/csi-vol-47ca0472-7ed5-11eb-b969-e28228ec8762 is still being used

Actual results

The pod got stuck in the ContainerCreating state.

Expected behavior

The pod should be running.

Logs

If the issue is in PVC creation, deletion, cloning please attach complete logs
of below containers.

rbd status replicapool/csi-vol-47ca0472-7ed5-11eb-b969-e28228ec8762
Watchers: none

kubectl logs -n rook-ceph csi-rbdplugin-v777z -c csi-rbdplugin -f
E0306 23:42:55.299501 22053 utils.go:136] ID: 230 Req-ID: 0001-0009-rook-ceph-0000000000000002-47ca0472-7ed5-11eb-b969-e28228ec8762 GRPC error: rpc error: code = Internal desc = rbd image replicapool/csi-vol-47ca0472-7ed5-11eb-b969-e28228ec8762 is still being used
E0306 23:43:46.287070 22053 utils.go:136] ID: 233 Req-ID: 0001-0009-rook-ceph-0000000000000002-47ca0472-7ed5-11eb-b969-e28228ec8762 GRPC error: rpc error: code = Internal desc = rbd image replicapool/csi-vol-47ca0472-7ed5-11eb-b969-e28228ec8762 is still being used
E0306 23:44:38.236244 22053 utils.go:136] ID: 236 Req-ID: 0001-0009-rook-ceph-0000000000000002-47ca0472-7ed5-11eb-b969-e28228ec8762 GRPC error: rpc error: code = Internal desc = rbd image replicapool/csi-vol-47ca0472-7ed5-11eb-b969-e28228ec8762 is still being used

#############--------------##############

tail -f /var/log/messages
Mar 6 23:46:19 skube-us-storage-2 kubelet: E0306 23:46:19.371469 7817 kubelet_volumes.go:154] orphaned pod "7537c8fa-09fb-444a-99d1-ef3356eb9652" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Mar 6 23:46:21 skube-us-storage-2 kubelet: E0306 23:46:21.354999 7817 kubelet_volumes.go:154] orphaned pod "7537c8fa-09fb-444a-99d1-ef3356eb9652" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Mar 6 23:46:23 skube-us-storage-2 kubelet: E0306 23:46:23.370983 7817 kubelet_volumes.go:154] orphaned pod "7537c8fa-09fb-444a-99d1-ef3356eb9652" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Mar 6 23:46:25 skube-us-storage-2 kubelet: E0306 23:46:25.356687 7817 kubelet_volumes.go:154] orphaned pod "7537c8fa-09fb-444a-99d1-ef3356eb9652" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Mar 6 23:46:27 skube-us-storage-2 kubelet: E0306 23:46:27.366152 7817 kubelet_volumes.go:154] orphaned pod "7537c8fa-09fb-444a-99d1-ef3356eb9652" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Mar 6 23:46:28 skube-us-storage-2 kubelet: W0306 23:46:28.622129 7817 cpu_manager.go:367] [cpumanager] reconcileState: skipping container; ID not found in pod status (pod: csirbd-demo-pod, container: web-server, error: unable to find ID for container with name web-server in pod status (it may not be running))
Mar 6 23:46:29 skube-us-storage-2 kubelet: E0306 23:46:29.375706 7817 kubelet_volumes.go:154] orphaned pod "7537c8fa-09fb-444a-99d1-ef3356eb9652" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Mar 6 23:46:30 skube-us-storage-2 kubelet: E0306 23:46:30.051885 7817 csi_attacher.go:317] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = Internal desc = rbd image replicapool/csi-vol-47ca0472-7ed5-11eb-b969-e28228ec8762 is still being used
Mar 6 23:46:30 skube-us-storage-2 kubelet: E0306 23:46:30.052174 7817 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000002-47ca0472-7ed5-11eb-b969-e28228ec8762 podName: nodeName:}" failed. No retries permitted until 2021-03-06 23:46:46.052138979 +0000 UTC m=+10143.830126803 (durationBeforeRetry 16s). Error: "MountVolume.MountDevice failed for volume "pvc-d6cc04f9-d036-4d23-9278-84cd3cafa2ee" (UniqueName: "kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000002-47ca0472-7ed5-11eb-b969-e28228ec8762") pod "csirbd-demo-pod" (UID: "6d0bd612-dee5-40a7-adf7-0535c885fa13") : rpc error: code = Internal desc = rbd image replicapool/csi-vol-47ca0472-7ed5-11eb-b969-e28228ec8762 is still being used"
Mar 6 23:46:31 skube-us-storage-2 kubelet: E0306 23:46:31.358787 7817 kubelet_volumes.go:154] orphaned pod "7537c8fa-09fb-444a-99d1-ef3356eb9652" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Mar 6 23:46:33 skube-us-storage-2 kubelet: E0306 23:46:33.370097 7817 kubelet_volumes.go:154] orphaned pod "7537c8fa-09fb-444a-99d1-ef3356eb9652" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.

If the issue is in PVC resize please attach complete logs of below containers.

  • csi-resizer and csi-rbdplugin/csi-cephfsplugin container logs from the
    provisioner pod.

If the issue is in snapshot creation and deletion please attach complete logs
of below containers.

  • csi-snapshotter and csi-rbdplugin/csi-cephfsplugin container logs from the
    provisioner pod.

If the issue is in PVC mounting please attach complete logs of below containers.

  • csi-rbdplugin/csi-cephfsplugin and driver-registrar container logs from
    plugin pod from the node where the mount is failing.

  • if required attach dmesg logs.

Note: If it's an rbd issue please provide only rbd-related logs; if it's a cephfs issue please provide cephfs logs.

Additional context

Add any other context about the problem here.

For example:

Any existing bug report which describes a similar issue/behavior.

@Madhu-1 (Collaborator) commented Mar 10, 2021

@reza-rahim can you provide the exact CSI driver version? We have multiple bug-fix releases of v3.2; can you please specify vx.y.z?

@reza-rahim (Author)

This is what I see in the pod definition:

image: quay.io/cephcsi/cephcsi:v3.2.0

Would you please let me know how to get more detail?
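
For reference, one way to read the exact cephcsi image tag from the running rbd plugin pods; the rook-ceph namespace and the app=csi-rbdplugin label follow the Rook defaults and are assumptions here:

kubectl -n rook-ceph get pods -l app=csi-rbdplugin \
  -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}' | sort -u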

@Madhu-1 (Collaborator) commented Mar 10, 2021

@reza-rahim I will take a look at it 👍

@reza-rahim (Author)

Awesome. It would also be nice if the CSI driver worked with "--image-feature exclusive-lock,journaling". We are planning a multi-datacenter disaster recovery POC with k8s/Rook/Ceph once this issue is resolved.
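
For context, a rough sketch of where those features would be requested through the ceph-csi StorageClass imageFeatures parameter once the driver accepts them (added via #1325, per the comments below); the StorageClass name, clusterID, pool, and secret parameters follow the Rook examples and are assumptions:

cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block-mirrored
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  # journaling requires exclusive-lock; older drivers (e.g. v3.2) only accepted layering here.
  imageFeatures: layering,exclusive-lock,journaling
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
EOF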

@Madhu-1 (Collaborator) commented Mar 10, 2021

Will be added in #1325. I just need to rebase the PR and resolve the merge conflicts.

@nixpanic added the component/rbd label Mar 12, 2021
@humblec (Collaborator) commented Apr 20, 2021

@reza-rahim considering #1325 is merged and already part of release v3.3, can we close this issue?

@Madhu-1 (Collaborator) commented Apr 20, 2021

Fixed by #1991. Closing

@Madhu-1 closed this as completed Apr 20, 2021
@reza-rahim (Author)

Sure, @humblec.
