
After installing CSI driver using helm with --set windows.enabled=true, the connection to the ingress controller becomes sporadic #267

Closed
lesio999 opened this issue May 18, 2021 · 11 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@lesio999

What happened:
Main issue: after installing the CSI driver using helm with --set windows.enabled=true, the connection to the ingress controller becomes sporadic (one request succeeds, the next fails, the third succeeds, the fourth fails; we used four different methods to test the connection).

What you expected to happen:
I expected access to my services via ingress to remain unaffected.

How to reproduce it:

  1. create a new AKS cluster (1.20.5) with both node types (Linux and Windows) and a running ingress controller (https://docs.microsoft.com/en-us/azure/aks/ingress-basic)
  2. run Test-NetConnection and the ingress demo app a few times; there should be no issues
  3. install csi driver:
    helm repo add csi-driver-smb https://raw.githubusercontent.com/kubernetes-csi/csi-driver-smb/master/charts
    helm install csi-driver-smb csi-driver-smb/csi-driver-smb --namespace kube-system --set windows.enabled=true
  4. repeat the tests from step 2 (a test loop is sketched below); some will succeed and some will fail with a connection refused error
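A quick way to see the alternating pattern is to hit the ingress endpoint in a loop. This is a minimal sketch, not the exact test used here: replace <INGRESS_IP> with the EXTERNAL-IP of your ingress controller service; the same idea applies when using Test-NetConnection from a Windows client.

# send 10 requests to the ingress and report each result individually
for i in $(seq 1 10); do
  if curl -s -o /dev/null --connect-timeout 5 http://<INGRESS_IP>/; then
    echo "request $i: ok"
  else
    echo "request $i: failed (connection refused or timed out)"
  fi
done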

Anything else we need to know?:
Installing the CSI driver on Linux nodes only works just fine (see the sketch below).
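For reference, a Linux-only install amounts to the same helm install with the Windows daemonset explicitly disabled. This is a sketch, assuming windows.enabled only gates the Windows node daemonset in the chart:

helm install csi-driver-smb csi-driver-smb/csi-driver-smb \
  --namespace kube-system --set windows.enabled=false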

Environment:

  • CSI Driver version: latest
  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"windows/amd64"}
    Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"54684493f8139456e5d2f963b23cb5003c4d8055", GitTreeState:"clean", BuildDate:"2021-03-22T23:02:59Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@andyzhangx
Member

When installing the Windows driver, it creates a daemonset that exposes the following two ports on the Windows agent node. I think that's the only difference when you set --set windows.enabled=true:

ports:
  - containerPort: 29643
    name: healthz
    protocol: TCP
  - containerPort: 29645
    name: metrics
    protocol: TCP
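One way to double-check those ports on a running cluster is sketched below. It assumes the Windows daemonset is named csi-smb-node-win, as in the manifest posted later in this thread; the netstat check would be run on the Windows agent node itself.

# list the containerPort entries declared by the Windows daemonset
kubectl -n kube-system get ds csi-smb-node-win -o yaml | grep -B1 -A2 containerPort

# on the Windows agent node, check whether anything is already listening
# on (or conflicting with) those ports
netstat -ano | findstr "29643 29645"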

@lesio999
Author

lesio999 commented May 18, 2021 via email

@andyzhangx
Member

Since it's only related to the Windows daemonset, would you try replacing these ports with other ports (or removing them entirely) in csi-smb-node-windows.yaml, and then running kubectl apply -f csi-smb-node-windows.yaml?

ports:
  - containerPort: 29643
    name: healthz
    protocol: TCP
  - containerPort: 29645
    name: metrics
    protocol: TCP
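If you don't have the deployed manifest handy, one hedged way to do this is to dump the live object, edit it, and re-apply. Again this assumes the daemonset is named csi-smb-node-win:

# dump the live Windows daemonset to a local file
kubectl -n kube-system get ds csi-smb-node-win -o yaml > csi-smb-node-windows.yaml

# edit the ports: section (change the containerPort numbers or delete the
# entries entirely), then re-apply the modified manifest
kubectl apply -f csi-smb-node-windows.yaml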

@lesio999
Author

lesio999 commented Jun 8, 2021

I would like to add: this issue looks like it is also exposed when the driver is applied to Linux nodes only, so I know the title at this point is misleading. Is there anything we can do to troubleshoot this? BTW: I installed v1.0.0 and there is no difference; it is still "locking" our services via ingress.

@andyzhangx
Member

Would you try this config? I removed one containerPort and changed the other containerPort number:

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: csi-smb-node-win
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: csi-smb-node-win
  template:
    metadata:
      labels:
        app: csi-smb-node-win
    spec:
      tolerations:
        - key: "node.kubernetes.io/os"
          operator: "Exists"
          effect: "NoSchedule"
      nodeSelector:
        kubernetes.io/os: windows
      priorityClassName: system-node-critical
      containers:
        - name: liveness-probe
          volumeMounts:
            - mountPath: C:\csi
              name: plugin-dir
          image: mcr.microsoft.com/oss/kubernetes-csi/livenessprobe:v2.3.0
          args:
            - --csi-address=$(CSI_ENDPOINT)
            - --probe-timeout=3s
            - --health-port=39643
            - --v=2
          env:
            - name: CSI_ENDPOINT
              value: unix://C:\\csi\\csi.sock
          resources:
            limits:
              cpu: 200m
              memory: 200Mi
            requests:
              cpu: 10m
              memory: 20Mi
        - name: node-driver-registrar
          image: mcr.microsoft.com/oss/kubernetes-csi/csi-node-driver-registrar:v2.2.0
          args:
            - --v=2
            - --csi-address=$(CSI_ENDPOINT)
            - --kubelet-registration-path=C:\\var\\lib\\kubelet\\plugins\\smb.csi.k8s.io\\csi.sock
          env:
            - name: CSI_ENDPOINT
              value: unix://C:\\csi\\csi.sock
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: kubelet-dir
              mountPath: "C:\\var\\lib\\kubelet"
            - name: plugin-dir
              mountPath: C:\csi
            - name: registration-dir
              mountPath: C:\registration
          resources:
            limits:
              cpu: 200m
              memory: 200Mi
            requests:
              cpu: 10m
              memory: 20Mi
        - name: smb
          image: mcr.microsoft.com/k8s/csi/smb-csi:latest
          imagePullPolicy: IfNotPresent
          args:
            - --v=5
            - --endpoint=$(CSI_ENDPOINT)
            - --nodeid=$(KUBE_NODE_NAME)
          ports:
            - containerPort: 39643
              name: healthz
              protocol: TCP
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthz
              port: healthz
            initialDelaySeconds: 30
            timeoutSeconds: 10
            periodSeconds: 30
          env:
            - name: CSI_ENDPOINT
              value: unix://C:\\csi\\csi.sock
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: kubelet-dir
              mountPath: "C:\\var\\lib\\kubelet"
            - name: plugin-dir
              mountPath: C:\csi
            - name: csi-proxy-fs-pipe
              mountPath: \\.\pipe\csi-proxy-filesystem-v1beta1
            - name: csi-proxy-smb-pipe
              mountPath: \\.\pipe\csi-proxy-smb-v1beta1
          resources:
            limits:
              cpu: 400m
              memory: 400Mi
            requests:
              cpu: 10m
              memory: 20Mi
      volumes:
        - name: csi-proxy-fs-pipe
          hostPath:
            path: \\.\pipe\csi-proxy-filesystem-v1beta1
            type: ""
        - name: csi-proxy-smb-pipe
          hostPath:
            path: \\.\pipe\csi-proxy-smb-v1beta1
            type: ""
        - name: registration-dir
          hostPath:
            path: C:\var\lib\kubelet\plugins_registry\
            type: Directory
        - name: kubelet-dir
          hostPath:
            path: C:\var\lib\kubelet\
            type: Directory
        - name: plugin-dir
          hostPath:
            path: C:\var\lib\kubelet\plugins\smb.csi.k8s.io\
            type: DirectoryOrCreate
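Roughly, applying and verifying this manifest would look like the following sketch; the app=csi-smb-node-win label comes from the manifest above, and the file name is just an example.

# apply the modified Windows daemonset
kubectl apply -f csi-smb-node-windows.yaml

# wait for the daemonset pods to come up on the Windows nodes
kubectl -n kube-system get pods -l app=csi-smb-node-win -o wide

# then repeat the ingress test loop from the reproduction steps to see
# whether the sporadic connection failures are gone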

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 7, 2021
@andyzhangx
Member

@lesio999 is the problem solved?

@lesio999
Author

lesio999 commented Sep 8, 2021 via email

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 8, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

andyzhangx added a commit to andyzhangx/csi-driver-smb that referenced this issue Dec 23, 2024
049659326 Merge pull request kubernetes-csi#268 from huww98/cloudbuild
119aee1ff Merge pull request kubernetes-csi#266 from jsafrane/bump-sanity-5.3.1
0ae5e52d9 Update cloudbuild image with go 1.21+
406a79acf Merge pull request kubernetes-csi#267 from huww98/gomodcache
9cec273d8 Set GOMODCACHE to avoid re-download toolchain
43bde065f Bump csi-sanity to 5.3.1

git-subtree-dir: release-tools
git-subtree-split: 04965932661b6e62709dcdbb9c25da528bac2605