Windows CSI Node DaemonSet is not Running/ErrImagePull #1001

Closed
dschunack opened this issue Jul 28, 2021 · 36 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@dschunack

dschunack commented Jul 28, 2021

/kind bug

What happened?
The Windows CSI Node DaemonSet is not running because the referenced image has no Windows variant for windows/amd64.

It looks to me like this was not tested on EKS Windows nodes.

  Warning  Failed     15s                kubelet            Failed to pull image "k8s.gcr.io/provider-aws/aws-ebs-csi-driver:v1.1.3": rpc error: code = Unknown desc = no matching manifest for windows/amd64 10.0.17763 in the manifest list entries

What you expected to happen?
The Windows CSI Node DaemonSet is running.

How to reproduce it (as minimally and precisely as possible)?
Activate Windows support in the Helm chart and spin up a Windows node.

https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/charts/aws-ebs-csi-driver/values.yaml#L133
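
For reference, enabling it looks roughly like this (a sketch, assuming the chart exposes a node.enableWindows value as in the linked values.yaml and that the driver is installed as a release named aws-ebs-csi-driver in kube-system; adjust to your setup):

helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
helm upgrade --install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system \
  --set node.enableWindows=true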

Anything else we need to know?:

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  18s                default-scheduler  Successfully assigned kube-system/ebs-csi-node-windows-pt422 to ip-10-46-67-23.eu-central-1.compute.internal
  Normal   Pulling    16s                kubelet            Pulling image "k8s.gcr.io/provider-aws/aws-ebs-csi-driver:v1.1.3"
  Warning  Failed     15s                kubelet            Failed to pull image "k8s.gcr.io/provider-aws/aws-ebs-csi-driver:v1.1.3": rpc error: code = Unknown desc = no matching manifest for windows/amd64 10.0.17763 in the manifest list entries
  Normal   Pulled     15s                kubelet            Container image "k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.1.0" already present on machine
  Normal   Created    15s                kubelet            Created container node-driver-registrar
  Normal   Started    15s                kubelet            Started container node-driver-registrar
  Normal   Pulled     15s                kubelet            Container image "k8s.gcr.io/sig-storage/livenessprobe:v2.2.0" already present on machine
  Normal   Created    14s                kubelet            Created container liveness-probe
  Normal   Started    14s                kubelet            Started container liveness-probe
  Warning  Failed     11s (x2 over 12s)  kubelet            Error: ImagePullBackOff

Environment

  • Kubernetes version (use kubectl version): 1.20
  • Driver version: v1.1.3
  • Helm Chart v2.0.2
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 28, 2021
@wongma7
Contributor

wongma7 commented Jul 28, 2021

Sorry, we checked in the node DaemonSet without actually releasing the image for it. I added a disclaimer here noting that it's in a pre-release state: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/tree/master/examples/kubernetes/windows#windows. Ideally, though, the chart shouldn't offer the option to install something that doesn't exist yet.

#957 should fix it; after that I will make a 1.2.1 release whose manifest list contains the Windows image.

/assign

@dschunack
Author

Hi,

thanks for the info. We saw the option in the Helm chart but didn't see anything in the README.md saying that it isn't finished yet. Now it's clear, and we will wait for the new release.

@AndyXiangLi AndyXiangLi added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Aug 2, 2021
@dschunack
Author

dschunack commented Aug 9, 2021

@wongma7 Hi, you released a new container image that supports Windows and Linux (#957), but does it really make sense to put Linux and Windows layers into a single multi-OS container image?
We currently run more than 600 Linux EKS nodes (arm64/amd64) and 30 Windows nodes. Supporting multiple architectures in the container image makes complete sense, but I don't think supporting multiple operating systems in one image is a good idea; the overhead from the different container architectures is too large.
All Linux nodes would need to load the whole container image including the Windows layers. That is a huge overhead per node. As a short example: 600 Linux nodes x 2.5 GB for the Windows layers = 1.5 TB of overhead for the Windows container layers on Linux nodes. It also needs more bandwidth, storage, and time to pull and start a new container image during an update.
I think it would be useful to split the images and create separate Linux and Windows container images. That would also make it easier to support and integrate new features without affecting the other OS.
What do you think?

@wongma7
Contributor

wongma7 commented Aug 9, 2021

@dschunack The container runtime (docker) only pulls the image that corresponds to the OS your container is running on. Even though ECR or Docker Hub shows the image as 2+ GB, your Linux nodes won't pull the 2 GB Windows images, so there's no need to worry.

Basically now the tag is referring to a manifest list or "fat manifest", which links to multiple images. Docker knows how to pull the image for your OS and ignore the other ones. https://docs.docker.com/registry/spec/manifest-v2-2/

For reference https://github.com/moby/moby/blob/7b9275c0da707b030e62c96b679a976f31f929d3/distribution/pull_v2_windows.go#L68
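
If you want to verify which per-OS images a tag points to, the docker CLI can print the manifest list directly (a sketch; <tag> is a placeholder, and the manifest command may need to be enabled depending on your docker CLI version):

docker manifest inspect k8s.gcr.io/provider-aws/aws-ebs-csi-driver:<tag>
# the "manifests" array lists one entry per platform; the runtime pulls only
# the entry whose os/architecture matches the node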

@iusergii

iusergii commented Sep 9, 2021

@wongma7 It looks like 1.2.1 was released, but Windows nodes still can't pull the image k8s.gcr.io/provider-aws/aws-ebs-csi-driver:v1.2.1.

@wongma7
Contributor

wongma7 commented Sep 22, 2021

v1.3.0 in GCR (but not ECR) should have a windows image now.

@iusergii

iusergii commented Sep 24, 2021

@wongma7 I updated my release to Helm chart 2.3.0 and can see that the Windows image was successfully pulled on the node:

REPOSITORY                                  TAG     IMAGE ID      CREATED     SIZE
k8s.gcr.io/provider-aws/aws-ebs-csi-driver  v1.3.0  93f2632e63a1  6 days ago  5.75GB

But the Pod is in an Error/CrashLoopBackOff state; the ebs-plugin container fails:

I0923 17:21:23.071346   15268 metadata.go:101] retrieving instance data from ec2 metadata
I0923 17:21:23.317911   15268 metadata.go:108] ec2 metadata is available
panic: open \\.\\pipe\\csi-proxy-filesystem-v1: The system cannot find the file specified.

goroutine 1 [running]:
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.newNodeService(0xc000076a00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/node.go:93 +0x2b9
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.NewDriver(0xc00059ff40, 0x7, 0x7, 0x188603b0108, 0x100000000000011, 0xc000078000)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/driver.go:94 +0x445
main.main()
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/cmd/main.go:46 +0x257

@wongma7
Contributor

wongma7 commented Sep 24, 2021

@iusergii

@wongma7 I haven't. Sorry, my bad.

@vmasule

vmasule commented Oct 6, 2021

@wongma7 I was facing the same issue. I updated the driver to v1.3.0 from the public Docker repo and also installed csi-proxy following the instructions in the link, and now I am facing the following issue, which is a bit difficult to understand. Am I missing something here? This happens when I try to install the Windows example from the link above.

Events:
  Type     Reason                  Age                  From                     Message
  ----     ------                  ----                 ----                     -------
  Warning  FailedScheduling        12m (x2 over 12m)    default-scheduler        0/6 nodes are available: 6 persistentvolumeclaim "ebs-claim" not found.
  Normal   Scheduled               12m                  default-scheduler        Successfully assigned default/windows-server-iis-7c5fc8f6c5-h8k7p to ip-172-12-37-45.us-west-2.compute.internal
  Normal   SuccessfulAttachVolume  12m                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-462e8ac4-5ce3-4fde-9398-d0b799d1b128"
  Warning  FailedMount             6m34s (x2 over 10m)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[persistent-storage], unattached volumes=[kube-api-access-8z7hm persistent-storage]: timed out waiting for the condition
  Warning  FailedMount             112s (x13 over 12m)  kubelet                  MountVolume.SetUp failed for volume "pvc-462e8ac4-5ce3-4fde-9398-d0b799d1b128" : rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-462e8ac4-5ce3-4fde-9398-d0b799d1b128\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\04a8aea3-fa7a-4165-8260-374a0269e2d6\\volumes\\kubernetes.io~csi\\pvc-462e8ac4-5ce3-4fde-9398-d0b799d1b128\\mount": file does not exist
  Warning  FailedMount             24s (x4 over 8m37s)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[persistent-storage], unattached volumes=[persistent-storage kube-api-access-8z7hm]: timed out waiting for the condition

@wongma7
Contributor

wongma7 commented Oct 6, 2021

@vmasule can you share with me logs from the CSI driver on the same node as the windows-server-iis pod?

example:
kubectl get po windows-server-iis-asdf -o jsonpath='{.spec.nodeName}'
ip-192-168-47-235.us-west-2.compute.internal

kubectl get po -n kube-system --field-selector spec.nodeName=ip-192-168-47-235.us-west-2.compute.internal --selector app=ebs-csi-node
ebs-csi-node-windows-vdvrl 3/3 Running 0 79s

kubectl logs ebs-csi-node-windows-vdvrl -n kube-system ebs-plugin

@vmasule

vmasule commented Oct 6, 2021

@wongma7 Thanks for the speedy response. The following is what I see in the logs; I hope this helps.

E1006 15:34:34.419805    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 15:36:36.546716    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 15:38:38.626608    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 15:40:40.663125    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 15:42:42.768905    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 15:44:44.857508    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 15:46:46.939181    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 15:48:48.984433    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 15:50:51.071841    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 15:52:53.203763    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 15:54:55.283390    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 15:56:57.338877    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 15:58:59.459646    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 16:01:01.574850    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 16:03:03.663959    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 16:05:05.792897    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 16:07:07.822685    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 16:09:09.868176    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 16:11:11.914355    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 16:13:13.983101    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 16:15:16.002326    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist
E1006 16:17:18.062900    1360 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\d38a9777-330f-49ea-9316-a27cc295140a\\volumes\\kubernetes.io~csi\\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\\mount": file does not exist

@vmasule

vmasule commented Oct 6, 2021

@wongma7 One thing I forgot to mention: I followed all the steps except building csi-proxy.exe from source, since that was failing on my Apple M1, so I found the exe here https://k8scsi.blob.core.windows.net/csi-proxy/master/binaries/csi-proxy.tar.gz and installed it on the Windows nodes.

Could that version of csi-proxy be the cause of the error above?

@wongma7
Contributor

wongma7 commented Oct 6, 2021

Yes, that could be an issue; versions older than v1.0.0 had some bugs when discovering devices/volumes on EC2 instances. What version is it? If possible, check the logs from csi-proxy itself; for example, the installation instructions at https://github.com/kubernetes-csi/csi-proxy#installation emit the logs to "\etc\kubernetes\logs\csi-proxy.log".
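
A quick way to confirm the version csi-proxy logged at startup (a sketch, assuming the log path from the linked README; later in this thread the logs live under C:\csi-logs, so adjust the path to your install):

Get-Content -Path "C:\etc\kubernetes\logs\csi-proxy.log" | Select-String "Version:"
# prints the "Version: vX.Y.Z..." line that csi-proxy writes when the service starts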

@wongma7
Contributor

wongma7 commented Oct 6, 2021

BTW, let's try to figure out why the build isn't working over in https://github.com/kubernetes-csi/csi-proxy/issues. Since the binary isn't being distributed yet (kubernetes-csi/csi-proxy#83), the build needs to work for everyone (go build with GOARCH and GOOS set should work even on an ARM Mac).

@vmasule

vmasule commented Oct 6, 2021

@wongma7 Sorry for the confusion; the csi-proxy version is actually not old, my colleague already compiled and deployed the latest version. Below are the logs from the EC2 instance where the Pod was scheduled. They also show one error that relates to the earlier error from the EBS driver.

PS C:\Windows\system32> Get-Content -Path "C:\csi-logs\csi-proxy.log" -Wait
Log file created at: 2021/10/06 13:25:13
Running on machine: EC2AMAZ-V7FUFK9
Binary: Built with gc go1.17 for windows/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1006 13:25:13.384606    5016 main.go:124] Running as a Windows service.
I1006 13:25:13.412543    5016 main.go:54] Starting CSI-Proxy Server ...
I1006 13:25:13.412543    5016 main.go:55] Version: v1.0.1-2-g23e7f38
I1006 13:25:13.384606    5016 main.go:134] Windows Service initialized through SCM
E1006 14:22:54.692005    5016 server.go:283] failed GetVolumeIDFromTargetPath: error getting the volume for the mount c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\globalmount, internal error error getting volume from mount. cmd: (Get-Item -Path c:\var\lib\kubelet\plugins\ku
bernetes.io\csi\pv\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\globalmount).Target, output: , error: <nil>

Any idea what may be wrong here? Please provide guidance.

@wongma7
Contributor

wongma7 commented Oct 6, 2021

Does the driver have any earlier logs? If possible, could you bump its verbosity to at least --v=5 and restart it? https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/charts/aws-ebs-csi-driver/templates/node-windows.yaml#L68

This error from csi-proxy, "failed GetVolumeIDFromTargetPath: error getting the volume for the mount", is expected under certain conditions. It happens when the driver checks whether something has already been mounted at c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\globalmount, for idempotency. If it gets empty output but no error (like "output: , error: <nil>"), it should proceed to mount something at that path. However, the error from the driver that you posted above indicates that there is nothing at c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\globalmount, which is why I am wondering if there are more logs that would help explain how that happened.
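
To see the same thing csi-proxy is checking, you can run the command it logs directly on the node (a sketch, reusing the PVC path from the logs above; substitute your own volume's path):

(Get-Item -Path "c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\globalmount").Target
# prints the symlink target if the global mount already exists; empty output with
# no error means nothing has been mounted there yet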

@wongma7
Contributor

wongma7 commented Oct 6, 2021

OK, I have a suspicion this is a bug introduced recently: we now check for the existence of a mount at the mount point ("c:\var\lib\kubelet\pods\d38a9777-330f-49ea-9316-a27cc295140a\volumes\kubernetes.io~csi\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\mount"), but if that path does not exist (and on Windows it is basically guaranteed not to exist, since "mounting" means creating a symlink), we treat that as an error instead of ignoring it. I am working to reproduce this now and will get a bugfix out if this is the reason.
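
If you want to confirm that picture on the node, checking the pod-side mount path before a retry should show it absent (a sketch; the path is the one from the error message above):

Test-Path "c:\var\lib\kubelet\pods\d38a9777-330f-49ea-9316-a27cc295140a\volumes\kubernetes.io~csi\pvc-671cb275-d9b8-4682-84dd-34212d7a6997\mount"
# expected to return False before the driver has created the symlink; the suspected
# bug treats this normal "not found" state as a fatal error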

@vmasule

vmasule commented Oct 6, 2021

Thanks @wongma7. For your information, I am using the Helm chart below:

Repo: https://kubernetes-sigs.github.io/aws-ebs-csi-driver
Version: 2.3.0
image.repository: amazon/aws-ebs-csi-driver
image.tag: v1.3.0

I hope this helps. Apart from this, I have also seen (and just now reproduced) the intermittent error below:

Events:
  Type     Reason                Age                   From                                                                                      Message
  ----     ------                ----                  ----                                                                                      -------
  Warning  ProvisioningFailed    2m50s                 persistentvolume-controller                                                               storageclass.storage.k8s.io "windows" not found
  Normal   Provisioning          41s (x8 over 2m48s)   ebs.csi.aws.com_ebs-csi-controller-6cc677f5fb-km8xv_6c0725a7-b4eb-4ed9-95b3-fda4bf8b47a4  External provisioner is provisioning volume for claim "default/ebs-claim"
  Warning  ProvisioningFailed    41s (x8 over 2m48s)   ebs.csi.aws.com_ebs-csi-controller-6cc677f5fb-km8xv_6c0725a7-b4eb-4ed9-95b3-fda4bf8b47a4  failed to provision volume with StorageClass "windows": error generating accessibility requirements: no topology key found on CSINode ip-178-12-31-50.us-west-2.compute.internal

When I describe the node, I can see that the topology labels are already present on it.

@wongma7
Contributor

wongma7 commented Oct 6, 2021

The node topology label issue sounds similar to #1030; we have not solved it yet, but we can continue the discussion there.

Thanks for the information. I will comment again when I have the results of my repro attempt.

@wongma7
Contributor

wongma7 commented Oct 6, 2021

OK, I reproduced the issue and submitted a fix here: #1081. I intend to release v1.3.2 after it merges. Apologies for the trouble; I will comment again when the release is done.

@vmasule

vmasule commented Oct 7, 2021

Thank you @wongma7. I must say, the speed with which a fix was provided in an open-source community really surprised me. I will be watching for the release notification.

@vmasule

vmasule commented Oct 7, 2021

Hi @wongma7, I used the two-month-old Docker image tag (8c6c7e0a590da44c635d629a0653dc73aaa5c9e4) and it worked for attaching the volume, but when the node was terminated, it gave me the following error once the pod was scheduled on another instance in the same AZ. Is this issue related to the earlier one, and will it be fixed by the fix you provided yesterday? Here I am using the dynamic volume provisioning example.

Events:
  Type     Reason       Age                      From     Message
  ----     ------       ----                     ----     -------
  Warning  FailedMount  14m (x96 over 3h30m)     kubelet  Unable to attach or mount volumes: unmounted volumes=[xpandion-vol-mount], unattached volumes=[xpandion-vol-mount]: timed out waiting for the condition
  Warning  FailedMount  4m20s (x100 over 3h25m)  kubelet  MountVolume.MountDevice failed for volume "pvc-8b2a752d-4938-411f-9586-d629de7642ac" : rpc error: code = Internal desc = could not format "7" and mount it at "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-8b2a752d-4938-411f-9586-d629de7642ac\\globalmount": rpc error: code = Unknown desc = volume id empty

And below is the csi-proxy log on that Windows node.

PS C:\Windows\system32> Get-Content -Path "C:\csi-logs\csi-proxy.log" -Wait
Log file created at: 2021/10/07 15:28:06
Running on machine: EC2AMAZ-ROUQ6QB
Binary: Built with gc go1.17 for windows/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1007 15:28:05.986309    7072 main.go:124] Running as a Windows service.
I1007 15:28:05.987606    7072 main.go:134] Windows Service initialized through SCM
I1007 15:28:06.027997    7072 main.go:54] Starting CSI-Proxy Server ...
I1007 15:28:06.027997    7072 main.go:55] Version: v1.0.1-12-gebdbe95
E1007 15:34:56.893120    7072 server.go:283] failed GetVolumeIDFromTargetPath: error getting the volume for the mount c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-8b2a752d-4938-411f-9586-d629de7642ac\globalmount, internal error error getting volume from mount. cmd: (Get-Item -Path c:\var\lib\kubelet\plugins\ku
bernetes.io\csi\pv\pvc-8b2a752d-4938-411f-9586-d629de7642ac\globalmount).Target, output: , error: <nil>
E1007 15:35:03.923490    7072 server.go:102] volume id empty
E1007 15:35:06.840241    7072 server.go:283] failed GetVolumeIDFromTargetPath: error getting the volume for the mount c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-8b2a752d-4938-411f-9586-d629de7642ac\globalmount, internal error error getting volume from mount. cmd: (Get-Item -Path c:\var\lib\kubelet\plugins\ku
bernetes.io\csi\pv\pvc-8b2a752d-4938-411f-9586-d629de7642ac\globalmount).Target, output: , error: <nil>
E1007 15:35:13.075409    7072 server.go:102] volume id empty
E1007 15:35:16.617989    7072 server.go:283] failed GetVolumeIDFromTargetPath: error getting the volume for the mount c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-8b2a752d-4938-411f-9586-d629de7642ac\globalmount, internal error error getting volume from mount. cmd: (Get-Item -Path c:\var\lib\kubelet\plugins\ku
bernetes.io\csi\pv\pvc-8b2a752d-4938-411f-9586-d629de7642ac\globalmount).Target, output: , error: <nil>
E1007 15:35:23.192473    7072 server.go:102] volume id empty
E1007 15:35:27.714278    7072 server.go:283] failed GetVolumeIDFromTargetPath: error getting the volume for the mount c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-8b2a752d-4938-411f-9586-d629de7642ac\globalmount, internal error error getting volume from mount. cmd: (Get-Item -Path c:\var\lib\kubelet\plugins\ku
bernetes.io\csi\pv\pvc-8b2a752d-4938-411f-9586-d629de7642ac\globalmount).Target, output: , error: <nil>
E1007 15:35:33.806118    7072 server.go:102] volume id empty

@wongma7
Contributor

wongma7 commented Oct 7, 2021

This looks like a different issue. Would it be possible to retrieve what the partitions on that volume look like? For example, if you can access PowerShell on the node the csi-proxy logs are from:

Get-Disk -Number 7 | Get-Partition

PartitionNumber  DriveLetter  Offset    Size      Type
---------------  -----------  ------    ----      ----
1                             17408     15.98 MB  Reserved
2                             16777216  3.98 GB   Basic

The workflow is slightly different between the first time the volume is used and the second, because the first time the driver must format the volume, whereas the second time it must discover the existing partitions on the volume. So it is possible there is another bug in the discovery path, depending on how your disk looks.

@vmasule

vmasule commented Oct 7, 2021

Actually, I terminated the node a while ago, but I will reproduce it tomorrow and update here.

@vmasule

vmasule commented Oct 8, 2021

Hi @wongma7, I reproduced it as promised yesterday; this is what it looks like.

E1008 19:17:46.438765    3832 server.go:283] failed GetVolumeIDFromTargetPath: error getting the volume for the mount c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-4e8fe8a4-7e1a-453d-9eb6-90607d490f23\globalmount, internal error error getting volume from mount. cmd: (Get-Item -Path c:\var\lib\kubelet\plugins\ku
bernetes.io\csi\pv\pvc-4e8fe8a4-7e1a-453d-9eb6-90607d490f23\globalmount).Target, output: , error: <nil>
E1008 19:19:02.429445    3832 server.go:102] volume id empty
E1008 19:21:28.457506    3832 server.go:283] failed GetVolumeIDFromTargetPath: error getting the volume for the mount c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-4e8fe8a4-7e1a-453d-9eb6-90607d490f23\globalmount, internal error error getting volume from mount. cmd: (Get-Item -Path c:\var\lib\kubelet\plugins\ku
bernetes.io\csi\pv\pvc-4e8fe8a4-7e1a-453d-9eb6-90607d490f23\globalmount).Target, output: , error: <nil>
E1008 19:22:51.321570    3832 server.go:102] volume id empty
E1008 19:25:08.527540    3832 server.go:121] ListDiskIDs failed: The system cannot find the file specified.
E1008 19:27:36.302289    3832 server.go:283] failed GetVolumeIDFromTargetPath: error getting the volume for the mount c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-4e8fe8a4-7e1a-453d-9eb6-90607d490f23\globalmount, internal error error getting volume from mount. cmd: (Get-Item -Path c:\var\lib\kubelet\plugins\ku
bernetes.io\csi\pv\pvc-4e8fe8a4-7e1a-453d-9eb6-90607d490f23\globalmount).Target, output: , error: <nil>
E1008 19:29:03.477955    3832 server.go:102] volume id empty
E1008 19:31:13.038994    3832 server.go:283] failed GetVolumeIDFromTargetPath: error getting the volume for the mount c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-4e8fe8a4-7e1a-453d-9eb6-90607d490f23\globalmount, internal error error getting volume from mount. cmd: (Get-Item -Path c:\var\lib\kubelet\plugins\ku
bernetes.io\csi\pv\pvc-4e8fe8a4-7e1a-453d-9eb6-90607d490f23\globalmount).Target, output: , error: <nil>
PS C:\Windows\system32> ^C
PS C:\Windows\system32> Get-Disk -Number 7 | Get-Partition
PS C:\Windows\system32>
PS C:\Windows\system32> Get-Disk -Number 8 | Get-Partition


   DiskPath: \\?\scsi#disk&ven_msft&prod_virtual_disk#2&1f4adffe&0&000008#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}

PartitionNumber  DriveLetter Offset                                                                                                                                           Size Type
---------------  ----------- ------                                                                                                                                           ---- ----
1                            1048576                                                                                                                                        128 MB Reserved
2                            135266304                                                                                                                                    19.87 GB Basic


PS C:\Windows\system32>

@wongma7
Contributor

wongma7 commented Oct 8, 2021

@vmasule Do you have access to the corresponding logs from the CSI driver on that node?

@vmasule

vmasule commented Oct 8, 2021

I think it was the same as what I shared yesterday, except for the disk number, which is now 8 instead of 7.

@wongma7
Contributor

wongma7 commented Oct 8, 2021

OK, I thought there were two different errors, so I would expect the logs to be different, but we can focus on the original one because that is from the latest version of the driver.

error 1 from driver v1.3.0 "file does not exist"

  Warning  FailedMount             112s (x13 over 12m)  kubelet                  MountVolume.SetUp failed for volume "pvc-462e8ac4-5ce3-4fde-9398-d0b799d1b128" : rpc error: code = Internal desc = Could not mount "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-462e8ac4-5ce3-4fde-9398-d0b799d1b128\\globalmount" at "c:\\var\\lib\\kubelet\\pods\\04a8aea3-fa7a-4165-8260-374a0269e2d6\\volumes\\kubernetes.io~csi\\pvc-462e8ac4-5ce3-4fde-9398-d0b799d1b128\\mount": file does not exist

For this error, I think it is fixed by #1081. If you are brave, you may try the unstable/development version of the driver containing this fix at gcr.io/k8s-staging-provider-aws/aws-ebs-csi-driver:v20211007-helm-chart-aws-ebs-csi-driver-2.3.0-12-g4d5d7e7f; otherwise, I plan to release the fix in v1.3.2 and will update this issue when it is released.
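
Switching the chart to that staging image might look like this (a sketch, assuming the image.repository and image.tag chart values mentioned above and a release named aws-ebs-csi-driver in kube-system; adjust to your setup):

helm upgrade aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system \
  --reuse-values \
  --set image.repository=gcr.io/k8s-staging-provider-aws/aws-ebs-csi-driver \
  --set image.tag=v20211007-helm-chart-aws-ebs-csi-driver-2.3.0-12-g4d5d7e7f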

error 2 from driver 8c6c7e0 "volume id empty"

 Warning  FailedMount  4m20s (x100 over 3h25m)  kubelet  MountVolume.MountDevice failed for volume "pvc-8b2a752d-4938-411f-9586-d629de7642ac" : rpc error: code = Internal desc = could not format "7" and mount it at "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-8b2a752d-4938-411f-9586-d629de7642ac\\globalmount": rpc error: code = Unknown desc = volume id empty

For this error, I may need CSI node logs to debug further; volume 8 looks fine to me from the above output, so I am not sure why the driver was not able to find it.

@vmasule

vmasule commented Oct 11, 2021

@wongma7, I can confirm that error 1 above is now fixed. I used the public Docker repo instead of gcr.io; this is the tag I used, which was built the day you pushed the changes to master: https://hub.docker.com/r/amazon/aws-ebs-csi-driver/tags?page=1&name=4d5d7e7fea2b4b1d178f43c79957c2bebc2b0721

But...

I have reproduced error 2 again, and below are the logs.

Events from Application Pod

Events:
  Type     Reason                  Age                    From                     Message
  ----     ------                  ----                   ----                     -------
  Normal   Scheduled               9m23s                  default-scheduler        Successfully assigned default/windows-server-iis-7c5fc8f6c5-h6k7t to ip-172-12-27-150.us-west-2.compute.internal
  Warning  FailedAttachVolume      9m23s                  attachdetach-controller  Multi-Attach error for volume "pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac" Volume is already exclusively attached to one node and can't be attached to another
  Normal   SuccessfulAttachVolume  3m15s                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac"
  Warning  FailedMount             2m18s (x3 over 7m19s)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[xpandion-vol-mount], unattached volumes=[xpandion-vol-mount]: timed out waiting for the condition
  Warning  FailedMount             53s (x3 over 2m16s)    kubelet                  MountVolume.MountDevice failed for volume "pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac" : rpc error: code = Internal desc = could not format "1" and mount it at "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\\globalmount": rpc error: code = Unknown desc = volume id empty

Logs and partition details from CSI Node

PS C:\Windows\system32> Get-Content -Path "C:\csi-logs\csi-proxy.log" -Wait
Log file created at: 2021/10/08 12:29:57
Running on machine: EC2AMAZ-430EL3Q
Binary: Built with gc go1.17 for windows/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1008 12:29:57.456986    5964 main.go:124] Running as a Windows service.
I1008 12:29:57.489680    5964 main.go:54] Starting CSI-Proxy Server ...
I1008 12:29:57.489680    5964 main.go:55] Version: v1.0.1-12-gebdbe95
I1008 12:29:57.457986    5964 main.go:134] Windows Service initialized through SCM
E1011 16:55:15.312005    5964 server.go:283] failed GetVolumeIDFromTargetPath: error getting the volume for the mount c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount, internal error error getting volume from mount. cmd: (Get-Item -Path c:\var\lib\kubelet\plugins\ku
bernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount).Target, output: , error: <nil>
E1011 16:55:23.257404    5964 server.go:102] volume id empty
E1011 16:55:27.533375    5964 server.go:283] failed GetVolumeIDFromTargetPath: error getting the volume for the mount c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount, internal error error getting volume from mount. cmd: (Get-Item -Path c:\var\lib\kubelet\plugins\ku
bernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount).Target, output: , error: <nil>
E1011 16:55:36.835518    5964 server.go:102] volume id empty
E1011 16:55:41.464118    5964 server.go:283] failed GetVolumeIDFromTargetPath: error getting the volume for the mount c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount, internal error error getting volume from mount. cmd: (Get-Item -Path c:\var\lib\kubelet\plugins\ku
bernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount).Target, output: , error: <nil>
E1011 16:56:46.493897    5964 server.go:102] volume id empty
E1011 16:57:25.287261    5964 server.go:283] failed GetVolumeIDFromTargetPath: error getting the volume for the mount c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount, internal error error getting volume from mount. cmd: (Get-Item -Path c:\var\lib\kubelet\plugins\ku
bernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount).Target, output: , error: <nil>
E1011 16:58:56.548325    5964 server.go:102] volume id empty
E1011 16:59:48.920536    5964 server.go:283] failed GetVolumeIDFromTargetPath: error getting the volume for the mount c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount, internal error error getting volume from mount. cmd: (Get-Item -Path c:\var\lib\kubelet\plugins\ku
bernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount).Target, output: , error: <nil>
E1011 17:00:15.145688    5964 server.go:102] volume id empty
E1011 17:00:35.049874    5964 server.go:283] failed GetVolumeIDFromTargetPath: error getting the volume for the mount c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount, internal error error getting volume from mount. cmd: (Get-Item -Path c:\var\lib\kubelet\plugins\ku
bernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount).Target, output: , error: <nil>
E1011 17:00:42.966245    5964 server.go:102] volume id empty
E1011 17:01:51.511164    5964 server.go:283] failed GetVolumeIDFromTargetPath: error getting the volume for the mount c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount, internal error error getting volume from mount. cmd: (Get-Item -Path c:\var\lib\kubelet\plugins\ku
bernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount).Target, output: , error: <nil>
E1011 17:03:34.006764    5964 server.go:102] volume id empty
E1011 17:04:53.555044    5964 server.go:283] failed GetVolumeIDFromTargetPath: error getting the volume for the mount c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount, internal error error getting volume from mount. cmd: (Get-Item -Path c:\var\lib\kubelet\plugins\ku
bernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount).Target, output: , error: <nil>
E1011 17:05:12.960786    5964 server.go:102] volume id empty
E1011 17:07:52.958427    5964 server.go:283] failed GetVolumeIDFromTargetPath: error getting the volume for the mount c:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount, internal error error getting volume from mount. cmd: (Get-Item -Path c:\var\lib\kubelet\plugins\ku
bernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount).Target, output: , error: <nil>
E1011 17:09:19.112653    5964 server.go:102] volume id empty

PS C:\Windows\system32> Get-Disk -Number 1 | Get-Partition


   DiskPath: \\?\scsi#disk&ven_aws&prod_pvdisk#003400#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}

PartitionNumber  DriveLetter Offset                                                                                                                                           Size Type
---------------  ----------- ------                                                                                                                                           ---- ----
1                            17408                                                                                                                                        15.98 MB Reserved
2                            16777216                                                                                                                                     49.98 GB Basic

[Edit] Logs from the CSI driver on the Windows node (after changing the log level to 5 I reproduced the issue; below are the logs for that run. Ignore that the disk number is 5 here, as the pod was launched on another node, but the issue is the same.)

sh-4.2$ k logs -n kube-system ebs-csi-node-windows-q2f5s
error: a container name must be specified for pod ebs-csi-node-windows-q2f5s, choose one of: [ebs-plugin node-driver-registrar liveness-probe]
sh-4.2$ k logs -n kube-system ebs-csi-node-windows-q2f5s ebs-plugin
I1011 17:43:08.792232    6584 driver.go:72] Driver: ebs.csi.aws.com Version: v1.3.0
I1011 17:43:09.028038    6584 node.go:85] [Debug] Retrieving node info from metadata service
I1011 17:43:09.028038    6584 metadata.go:67] retrieving instance data from ec2 metadata
I1011 17:43:12.253152    6584 metadata.go:74] ec2 metadata is available
I1011 17:43:12.277647    6584 driver.go:142] Listening for connections on address: &net.UnixAddr{Name:"\\csi\\csi.sock", Net:"unix"}
I1011 17:43:13.661474    6584 node.go:509] NodeGetInfo: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1011 17:51:27.812549    6584 node.go:493] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1011 17:51:28.597336    6584 node.go:105] NodeStageVolume: called with args {VolumeId:vol-017dd582b51e036b0 PublishContext:map[devicePath:/dev/xvdba] StagingTargetPath:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount VolumeCapability:mount:<fs_type:"ntfs" > access_mode:<mode:SINGLE_NODE_WRITER >  Secrets:map[] VolumeContext:map[storage.kubernetes.io/csiProvisionerIdentity:1633695836145-8081-ebs.csi.aws.com] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1011 17:52:26.865821    6584 node.go:174] NodeStageVolume: find device path /dev/xvdba -> 5
I1011 17:52:27.734373    6584 node.go:208] NodeStageVolume: formatting 5 and mounting at \var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount with fstype ntfs
I1011 17:52:35.002127    6584 node.go:152] NodeStageVolume: volume="vol-017dd582b51e036b0" operation finished
I1011 17:52:35.002127    6584 inflight.go:73] Node Service: volume="vol-017dd582b51e036b0" operation finished
E1011 17:52:35.002127    6584 driver.go:119] GRPC error: rpc error: code = Internal desc = could not format "5" and mount it at "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\\globalmount": rpc error: code = Unknown desc = volume id empty
I1011 17:52:35.602748    6584 node.go:493] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1011 17:52:35.610449    6584 node.go:105] NodeStageVolume: called with args {VolumeId:vol-017dd582b51e036b0 PublishContext:map[devicePath:/dev/xvdba] StagingTargetPath:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount VolumeCapability:mount:<fs_type:"ntfs" > access_mode:<mode:SINGLE_NODE_WRITER >  Secrets:map[] VolumeContext:map[storage.kubernetes.io/csiProvisionerIdentity:1633695836145-8081-ebs.csi.aws.com] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1011 17:52:37.450352    6584 node.go:174] NodeStageVolume: find device path /dev/xvdba -> 5
I1011 17:52:37.746182    6584 node.go:208] NodeStageVolume: formatting 5 and mounting at \var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount with fstype ntfs
I1011 17:52:43.468643    6584 node.go:152] NodeStageVolume: volume="vol-017dd582b51e036b0" operation finished
I1011 17:52:43.468643    6584 inflight.go:73] Node Service: volume="vol-017dd582b51e036b0" operation finished
E1011 17:52:43.468643    6584 driver.go:119] GRPC error: rpc error: code = Internal desc = could not format "5" and mount it at "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\\globalmount": rpc error: code = Unknown desc = volume id empty
I1011 17:52:44.492837    6584 node.go:493] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1011 17:52:44.501555    6584 node.go:105] NodeStageVolume: called with args {VolumeId:vol-017dd582b51e036b0 PublishContext:map[devicePath:/dev/xvdba] StagingTargetPath:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount VolumeCapability:mount:<fs_type:"ntfs" > access_mode:<mode:SINGLE_NODE_WRITER >  Secrets:map[] VolumeContext:map[storage.kubernetes.io/csiProvisionerIdentity:1633695836145-8081-ebs.csi.aws.com] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1011 17:52:46.338205    6584 node.go:174] NodeStageVolume: find device path /dev/xvdba -> 5
I1011 17:52:46.615694    6584 node.go:208] NodeStageVolume: formatting 5 and mounting at \var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount with fstype ntfs
I1011 17:52:52.223938    6584 node.go:152] NodeStageVolume: volume="vol-017dd582b51e036b0" operation finished
I1011 17:52:52.223938    6584 inflight.go:73] Node Service: volume="vol-017dd582b51e036b0" operation finished
E1011 17:52:52.223938    6584 driver.go:119] GRPC error: rpc error: code = Internal desc = could not format "5" and mount it at "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\\globalmount": rpc error: code = Unknown desc = volume id empty
I1011 17:52:54.340713    6584 node.go:493] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1011 17:52:54.350696    6584 node.go:105] NodeStageVolume: called with args {VolumeId:vol-017dd582b51e036b0 PublishContext:map[devicePath:/dev/xvdba] StagingTargetPath:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-1c0ecde3-cb0d-49a6-8e68-156f1738ceac\globalmount VolumeCapability:mount:<fs_type:"ntfs" > access_mode:<mode:SINGLE_NODE_WRITER >  Secrets:map[] VolumeContext:map[storage.kubernetes.io/csiProvisionerIdentity:1633695836145-8081-ebs.csi.aws.com] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}

How to reproduce?
After launching a Windows application pod with dynamic volume provisioning, terminate the EC2 instance where the pod is scheduled. After some time the pod is rescheduled onto another node in the same AZ because of the volume's AZ affinity, but it gets stuck in the ContainerCreating stage with the error above.
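
A rough sketch of those steps from a workstation (assuming the AWS CLI and kubectl are configured for the cluster; pod and instance names are placeholders):

# find the node the pod landed on, then terminate the backing instance
kubectl get pod windows-server-iis-<hash> -o jsonpath='{.spec.nodeName}'
aws ec2 terminate-instances --instance-ids <instance-id-of-that-node>
# watch the replacement pod; it stays in ContainerCreating with the mount error
kubectl get pods -w
kubectl describe pod windows-server-iis-<new-hash>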

Let me know if you need anything else. This is really hurting our single-AZ SLA for the application, so your quick opinion would be highly appreciated. Thanks.

@wongma7
Contributor

wongma7 commented Oct 12, 2021

@vmasule Thank you for the logs. Unfortunately, I couldn't reproduce the issue; for reference, I am using this AMI: amazon/Windows_Server-2019-English-Full-EKS_Optimized-1.20-2021.09.16. Are you draining the node before you terminate the EC2 instance? Also, if possible, could you get the output of a command like this:

PS C:\Users\Administrator> Get-Disk -Number 2 | Get-Partition | Get-Volume

DriveLetter FriendlyName FileSystemType DriveType HealthStatus OperationalStatus SizeRemaining     Size
----------- ------------ -------------- --------- ------------ ----------------- -------------     ----
                         NTFS           Fixed     Healthy      OK                     19.75 GB 19.87 GB

It is erroring while trying to get the volumes here: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/mounter/safe_mounter_windows.go#L263

BTW, I will close this issue and open another one: we'll release v1.4.0 to address error 1, and we can discuss error 2 in its own issue, #1090.

@vmasule

vmasule commented Oct 12, 2021

@wongma7, I am definitely not draining the node before termination; I terminated the node manually from the AWS Console. It's also quite possible for a node to be terminated at any time without much time left to drain it in certain scenarios.

Sure, you can close this ticket. When should we expect the release for error 1?

I will provide the logs for error 2 tomorrow and post them in the new ticket.

Thanks for your help.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 10, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 9, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
