Problem with local_volume_provisioner DaemonSet #5389

Closed
irizzant opened this issue Nov 27, 2019 · 7 comments · Fixed by #6319
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@irizzant
Contributor

irizzant commented Nov 27, 2019

Hello,

I was trying to enable local volume provisioning for my LVM-provided block devices on the worker nodes.

After I configured addons.yaml like this:

# Local volume provisioner deployment
local_volume_provisioner_enabled: true
local_volume_provisioner_namespace: kube-system
local_volume_provisioner_storage_classes:
  local-storage:
    host_dir: /mnt/disks
    mount_dir: /mnt/disks
    volume_mode: Block
    blockCleanerCommand:
         - "/scripts/shred.sh"
         - "2"
#     fs_type: ext4
#   fast-disks:
#     host_dir: /mnt/fast-disks
#     mount_dir: /mnt/fast-disks
#     block_cleaner_command:
#       - "/scripts/shred.sh"
#       - "2"
#     volume_mode: Filesystem
#     fs_type: ext4

I ran the following:

ansible-playbook upgrade-cluster.yml -v -b -i inventory/prod/hosts.yaml --become --become-user=root

but I didn't get any provisioner pods running.

I then tried this:

ansible-playbook upgrade-cluster.yml -v -b -i inventory/prod/hosts.yaml --become --become-user=root --tags=master

and this time Kubespray created the provisioner pods correctly, but they didn't provision any PVs.
After inspecting the logs I found:

E1127 13:29:36.847027       1 discovery.go:218] Directory check for "/mnt/disks/lvm-pv-uuid-jwtdOD-4QzH-yv10-455a-qyDp-30k5-SMcmbh" failed: open /mnt/disks/lvm-pv-uuid-jwtdOD-4QzH-yv10-455a-qyDp-30k5-SMcmbh: no such file or directory

After further investigation I found that the problem is in the DaemonSet.

Following the official docs I created symbolic links for my devices in the /mnt/disks folder, but the aforementioned DaemonSet only mounts the /mnt/disks folder into the container; the links point to devices under /dev/, which is NOT mounted in the container.

I found the solution in the local volume provisioner helm chart.

As you can see, it defines an additional hostPath volume for /dev:

- name: provisioner-dev
  hostPath:
    path: /dev

If I change the DaemonSet to add the /dev mount, the local volume provisioner bootstraps without any errors.
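
For reference, here is a minimal sketch of what the two additions to the DaemonSet pod spec look like. The container and volume names are assumptions taken from the helm chart and may not match the Kubespray template exactly:

spec:
  template:
    spec:
      containers:
        - name: provisioner            # container name assumed from the helm chart
          volumeMounts:
            - name: provisioner-dev
              mountPath: /dev          # makes the /dev targets of the /mnt/disks symlinks visible
      volumes:
        - name: provisioner-dev
          hostPath:
            path: /dev

With both the volume and the volumeMount in place, the symlinks discovered under /mnt/disks resolve inside the container and the Block PVs are created.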

Environment:

  • Cloud provider or hardware configuration: Baremetal

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"): Ubuntu 18.04 LTS

  • Version of Ansible (ansible --version):
    ansible 2.7.12
    config file = /home/kubespray/kubespray/ansible.cfg
    configured module search path = ['/home/kubespray/kubespray/library']
    ansible python module location = /usr/local/lib/python3.6/dist-packages/ansible
    executable location = /usr/local/bin/ansible
    python version = 3.6.8 (default, Oct 7 2019, 12:59:55) [GCC 8.3.0]

Kubespray version (commit) (git rev-parse --short HEAD): f3c072f

Network plugin used: Calico

Copy of your inventory file:

all:
  hosts:
    sdbfi-k8s-master1:
      ansible_host: 10.101.1.172
      ip: 10.101.1.172
      access_ip: 10.101.1.172
    sdbfi-k8s-master2:
      ansible_host: 10.101.1.173
      ip: 10.101.1.173
      access_ip: 10.101.1.173
    sdbfi-k8s-master3:
      ansible_host: 10.101.1.174
      ip: 10.101.1.174
      access_ip: 10.101.1.174
    sdbfi-k8s-worker1:
      ansible_host: 10.101.1.175
      ip: 10.101.1.175
      access_ip: 10.101.1.175
    sdbfi-k8s-worker2:
      ansible_host: 10.101.1.176
      ip: 10.101.1.176
      access_ip: 10.101.1.176
    sdbfi-k8s-worker3:
      ansible_host: 10.101.1.177
      ip: 10.101.1.177
      access_ip: 10.101.1.177
  children:
    kube-master:
      hosts:
        sdbfi-k8s-master1:
        sdbfi-k8s-master2:
        sdbfi-k8s-master3:
    kube-node:
      hosts:
        sdbfi-k8s-master1:
        sdbfi-k8s-master2:
        sdbfi-k8s-master3:
        sdbfi-k8s-worker1:
        sdbfi-k8s-worker2:
        sdbfi-k8s-worker3:
    etcd:
      hosts:
        sdbfi-k8s-master1:
        sdbfi-k8s-master2:
        sdbfi-k8s-master3:
    k8s-cluster:
      children:
        kube-master:
        kube-node:
    calico-rr:
      hosts: {}

Command used to invoke ansible:

ansible-playbook cluster.yml -v -b -i inventory/prod/hosts.yaml --become --become-user=root --tags=apps

Output of ansible run:
https://gist.github.com/irizzant/bd9b6d896052a99e1b9bdb530bbc8416
Anything else we need to know:

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 25, 2020
@irizzant
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 25, 2020
@irizzant
Contributor Author

Any news on this?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 25, 2020
@irizzant
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 26, 2020
@floryut
Member

floryut commented Jun 15, 2020

@irizzant sorry about the delay. If I understand the doc and your issue correctly, the only missing thing is:

        - hostPath:
            path: /dev
            type: Directory

Would you be able to submit a PR? If you have a local env with the test case, it's easier for you to validate; thank you.
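
For completeness, the hostPath volume above also needs a matching volumeMounts entry on the provisioner container, otherwise /dev is still not visible inside the pod. A sketch, assuming the volume is named provisioner-dev as in the helm chart:

        volumeMounts:
          - name: provisioner-dev    # assumed volume name; must match the hostPath volume above
            mountPath: /dev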

@irizzant
Contributor Author

See above for the required PR.
