
After installing CSI driver using helm with --set windows.enabled=true, the connection to the ingress controller becomes sporadic #267

Closed
lesio999 opened this issue May 18, 2021 · 11 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@lesio999

What happened:
Main issue: after installing the CSI driver using helm with --set windows.enabled=true, the connection to the ingress controller becomes sporadic (one request succeeds, the next fails, the third succeeds, the fourth fails; we used four different methods to test the connection).

What you expected to happen:
I expected access to my services via ingress to remain unaffected.

How to reproduce it:

  1. create a new AKS cluster (1.20.5) with both node types (Linux and Windows) and a running ingress controller (https://docs.microsoft.com/en-us/azure/aks/ingress-basic)
  2. run Test-NetConnection and the ingress demo app a few times; there should be no issues
  3. install csi driver:
    helm repo add csi-driver-smb https://raw.githubusercontent.com/kubernetes-csi/csi-driver-smb/master/charts
    helm install csi-driver-smb csi-driver-smb/csi-driver-smb --namespace kube-system --set windows.enabled=true
  4. repeat the tests from step 2 (a test loop is sketched below); some will succeed and some will fail with a connection refused error
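A quick way to see the alternating pattern is to hit the ingress endpoint in a loop. This is a minimal sketch, not the exact test used here: replace <INGRESS_IP> with the EXTERNAL-IP of your ingress controller service; the same idea applies when using Test-NetConnection from a Windows client.

# send 10 requests to the ingress and report each result individually
for i in $(seq 1 10); do
  if curl -s -o /dev/null --connect-timeout 5 http://<INGRESS_IP>/; then
    echo "request $i: ok"
  else
    echo "request $i: failed (connection refused or timed out)"
  fi
done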

Anything else we need to know?:
Installing the CSI driver on Linux nodes only works just fine (see the sketch below).
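For reference, a Linux-only install amounts to the same helm install with the Windows daemonset explicitly disabled. This is a sketch, assuming windows.enabled only gates the Windows node daemonset in the chart:

helm install csi-driver-smb csi-driver-smb/csi-driver-smb \
  --namespace kube-system --set windows.enabled=false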

Environment:

  • CSI Driver version: latest
  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"windows/amd64"}
    Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"54684493f8139456e5d2f963b23cb5003c4d8055", GitTreeState:"clean", BuildDate:"2021-03-22T23:02:59Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@andyzhangx
Member

When installing the Windows driver, it creates a daemonset that exposes the following two ports on the Windows agent node. I think that's the only difference when you set --set windows.enabled=true:

ports:
  - containerPort: 29643
    name: healthz
    protocol: TCP
  - containerPort: 29645
    name: metrics
    protocol: TCP
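One way to double-check those ports on a running cluster is sketched below. It assumes the Windows daemonset is named csi-smb-node-win, as in the manifest posted later in this thread; the netstat check would be run on the Windows agent node itself.

# list the containerPort entries declared by the Windows daemonset
kubectl -n kube-system get ds csi-smb-node-win -o yaml | grep -B1 -A2 containerPort

# on the Windows agent node, check whether anything is already listening
# on (or conflicting with) those ports
netstat -ano | findstr "29643 29645"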

@lesio999
Author

lesio999 commented May 18, 2021 via email

@andyzhangx
Member

Since it's only related to the Windows daemonset, would you try replacing these ports with other ports (or removing them entirely) in csi-smb-node-windows.yaml, and then running kubectl apply -f csi-smb-node-windows.yaml?

ports:
  - containerPort: 29643
    name: healthz
    protocol: TCP
  - containerPort: 29645
    name: metrics
    protocol: TCP
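If you don't have the deployed manifest handy, one hedged way to do this is to dump the live object, edit it, and re-apply. Again this assumes the daemonset is named csi-smb-node-win:

# dump the live Windows daemonset to a local file
kubectl -n kube-system get ds csi-smb-node-win -o yaml > csi-smb-node-windows.yaml

# edit the ports: section (change the containerPort numbers or delete the
# entries entirely), then re-apply the modified manifest
kubectl apply -f csi-smb-node-windows.yaml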

@lesio999
Author

lesio999 commented Jun 8, 2021

I would like to add: this issue looks like it is also exposed when the driver is applied to Linux nodes only, so I know the title at this point is misleading. Is there anything we can do to troubleshoot this? BTW: I installed v1.0.0 and there is no difference; it is still "locking" our services via ingress.

@andyzhangx
Member

Would you try this config? I removed one containerPort and changed the other containerPort number:

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: csi-smb-node-win
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: csi-smb-node-win
  template:
    metadata:
      labels:
        app: csi-smb-node-win
    spec:
      tolerations:
        - key: "node.kubernetes.io/os"
          operator: "Exists"
          effect: "NoSchedule"
      nodeSelector:
        kubernetes.io/os: windows
      priorityClassName: system-node-critical
      containers:
        - name: liveness-probe
          volumeMounts:
            - mountPath: C:\csi
              name: plugin-dir
          image: mcr.microsoft.com/oss/kubernetes-csi/livenessprobe:v2.3.0
          args:
            - --csi-address=$(CSI_ENDPOINT)
            - --probe-timeout=3s
            - --health-port=39643
            - --v=2
          env:
            - name: CSI_ENDPOINT
              value: unix://C:\\csi\\csi.sock
          resources:
            limits:
              cpu: 200m
              memory: 200Mi
            requests:
              cpu: 10m
              memory: 20Mi
        - name: node-driver-registrar
          image: mcr.microsoft.com/oss/kubernetes-csi/csi-node-driver-registrar:v2.2.0
          args:
            - --v=2
            - --csi-address=$(CSI_ENDPOINT)
            - --kubelet-registration-path=C:\\var\\lib\\kubelet\\plugins\\smb.csi.k8s.io\\csi.sock
          env:
            - name: CSI_ENDPOINT
              value: unix://C:\\csi\\csi.sock
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: kubelet-dir
              mountPath: "C:\\var\\lib\\kubelet"
            - name: plugin-dir
              mountPath: C:\csi
            - name: registration-dir
              mountPath: C:\registration
          resources:
            limits:
              cpu: 200m
              memory: 200Mi
            requests:
              cpu: 10m
              memory: 20Mi
        - name: smb
          image: mcr.microsoft.com/k8s/csi/smb-csi:latest
          imagePullPolicy: IfNotPresent
          args:
            - --v=5
            - --endpoint=$(CSI_ENDPOINT)
            - --nodeid=$(KUBE_NODE_NAME)
          ports:
            - containerPort: 39643
              name: healthz
              protocol: TCP
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthz
              port: healthz
            initialDelaySeconds: 30
            timeoutSeconds: 10
            periodSeconds: 30
          env:
            - name: CSI_ENDPOINT
              value: unix://C:\\csi\\csi.sock
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: kubelet-dir
              mountPath: "C:\\var\\lib\\kubelet"
            - name: plugin-dir
              mountPath: C:\csi
            - name: csi-proxy-fs-pipe
              mountPath: \\.\pipe\csi-proxy-filesystem-v1beta1
            - name: csi-proxy-smb-pipe
              mountPath: \\.\pipe\csi-proxy-smb-v1beta1
          resources:
            limits:
              cpu: 400m
              memory: 400Mi
            requests:
              cpu: 10m
              memory: 20Mi
      volumes:
        - name: csi-proxy-fs-pipe
          hostPath:
            path: \\.\pipe\csi-proxy-filesystem-v1beta1
            type: ""
        - name: csi-proxy-smb-pipe
          hostPath:
            path: \\.\pipe\csi-proxy-smb-v1beta1
            type: ""
        - name: registration-dir
          hostPath:
            path: C:\var\lib\kubelet\plugins_registry\
            type: Directory
        - name: kubelet-dir
          hostPath:
            path: C:\var\lib\kubelet\
            type: Directory
        - name: plugin-dir
          hostPath:
            path: C:\var\lib\kubelet\plugins\smb.csi.k8s.io\
            type: DirectoryOrCreate
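Roughly, applying and verifying this manifest would look like the following sketch; the app=csi-smb-node-win label comes from the manifest above, and the file name is just an example.

# apply the modified Windows daemonset
kubectl apply -f csi-smb-node-windows.yaml

# wait for the daemonset pods to come up on the Windows nodes
kubectl -n kube-system get pods -l app=csi-smb-node-win -o wide

# then repeat the ingress test loop from the reproduction steps to see
# whether the sporadic connection failures are gone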

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 7, 2021
@andyzhangx
Member

@lesio999 is the problem solved?

@lesio999
Author

lesio999 commented Sep 8, 2021 via email

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 8, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

andyzhangx added a commit to andyzhangx/csi-driver-smb that referenced this issue Dec 23, 2024
049659326 Merge pull request kubernetes-csi#268 from huww98/cloudbuild
119aee1ff Merge pull request kubernetes-csi#266 from jsafrane/bump-sanity-5.3.1
0ae5e52d9 Update cloudbuild image with go 1.21+
406a79acf Merge pull request kubernetes-csi#267 from huww98/gomodcache
9cec273d8 Set GOMODCACHE to avoid re-download toolchain
43bde065f Bump csi-sanity to 5.3.1

git-subtree-dir: release-tools
git-subtree-split: 04965932661b6e62709dcdbb9c25da528bac2605