pfnet-research/meta-fuse-csi-plugin

meta-fuse-csi-plugin: A CSI Driver for All FUSE Implementations

A CSI plugin to run and mount any FUSE implementation (e.g. mountpoint-s3) in Kubernetes pods without privileged: true.

Mounting a FUSE implementation requires CAP_SYS_ADMIN. However, granting CAP_SYS_ADMIN to ordinary pods is not recommended for security reasons. meta-fuse-csi-plugin lets pods run and mount FUSE implementations without CAP_SYS_ADMIN, which improves security and makes object storage easier to use.

For more details, please refer to our blog (English, Japanese)

Current support status

Currently, meta-fuse-csi-plugin supports the FUSE implementations below. The examples directory contains examples for mountpoint-s3, goofys, s3fs, ros3fs, gcsfuse and sshfs. All of them except gcsfuse can be run in a local kind cluster.

Running an example in local kind cluster

You can try this plugin with a local kind cluster.

Dependencies

Create cluster and build images

build-for-kind.sh builds the plugin and example images and loads them into the kind cluster.

$ kind create cluster
$ ./build-for-kind.sh

Deploy plugin

deploy/csi-driver.yaml and deploy/csi-driver-daemonset.yaml are the manifests for the plugin.

$ kubectl apply -f ./deploy/csi-driver.yaml
namespace/mfcp-system created
csidriver.storage.k8s.io/meta-fuse-csi-plugin.csi.storage.pfn.io created
$ kubectl apply -f ./deploy/csi-driver-daemonset.yaml
daemonset.apps/meta-fuse-csi-plugin created

Please confirm the plugin is successfully deployed.

$ kubectl get ds -n mfcp-system
NAME                   DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
meta-fuse-csi-plugin   1         1         1       1            1           kubernetes.io/os=linux   28m

Deploy mountpoint-s3 example

examples/proxy/mountpoint-s3/deploy.yaml provides a pod with mountpoint-s3 and MinIO. The bucket test-bucket is mounted at /data in the busybox container.

Other examples are also available: examples/proxy/goofys/deploy.yaml (for goofys), examples/proxy/s3fs/deploy.yaml (for s3fs) and examples/proxy/sshfs/deploy.yaml (for sshfs).

$ kubectl apply -f ./examples/proxy/mountpoint-s3/deploy.yaml
$ kubectl get pods mfcp-example-proxy-mountpoint-s3
NAME                               READY   STATUS    RESTARTS   AGE
mfcp-example-proxy-mountpoint-s3   3/3     Running   0          14s
$ kubectl exec -it mfcp-example-proxy-mountpoint-s3 -c busybox -- /bin/ash
/ # 
/ # cd /data
/data # ls -l
total 1
-rw-r--r--    1 root     root            30 Oct 27 02:45 test.txt
/data # cat test.txt
This is a test file for minio

The starter container includes /usr/bin/mc for operating on the MinIO bucket. You can upload a file to the bucket and read it through mountpoint-s3.

$ kubectl exec -it mfcp-example-proxy-mountpoint-s3 -c starter -- /bin/bash
root@mfcp-example-proxy-mountpoint-s3:/# echo "Hello, World!" > hello.txt
root@mfcp-example-proxy-mountpoint-s3:/# mc cp hello.txt k8s-minio-dev/test-bucket
/hello.txt:                         14 B / 14 B ━━━━━━━━━━━━━ 1.15 KiB/s 0s
root@mfcp-example-proxy-mountpoint-s3:/# exit
$ kubectl exec -it mfcp-example-proxy-mountpoint-s3 -c busybox -- cat /data/hello.txt
Hello, World!

After the trial, delete the pod.

$ kubectl delete -f ./examples/proxy/mountpoint-s3/deploy.yaml
pod "mfcp-example-proxy-mountpoint-s3" deleted

NOTICE: FUSE container should be a sidecar (a.k.a. restartable init container)

meta-fuse-csi-plugin mounts the FUSE implementation only after its container has started. The application container should therefore start only once the FUSE filesystem is actually mounted.

To achieve this, the FUSE container should be a sidecar container, a.k.a. restartable init container (enabled by default since Kubernetes v1.30). Don't forget to define a startup probe so that the FUSE volume is actually mounted before the application containers start. See examples/proxy/mountpoint-s3/deploy.yaml for an example.

If you can't use the sidecar feature in your cluster, there is a workaround: wait in the application container until the FUSE implementation is mounted, as shown below:

  - image: busybox
    name: busybox
    command: ["/bin/ash"]
    args: ["-c", "while [[ ! \"$(/bin/mount | grep fuse)\" ]]; do echo \"waiting for mount\" && sleep 1; done; sleep infinity"]

or

# Wait until a FUSE mount appears in container $2 of pod $1.
function wait_for_fuse_mounted() {
    while ! kubectl exec "$1" -c "$2" -- /bin/mount | grep -q fuse; do
        echo "waiting for mount" && sleep 1
    done
}

Note that subPath does not work with this workaround. When the FUSE container is a normal container (i.e. not a sidecar), kubelet's creation of the subPath volume mount can race with the startup of the FUSE implementation process. As a result, the mounted subPath volume may be empty.
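The grep-based checks above match any line of the mount listing that contains the string "fuse". A somewhat stricter check can look at the filesystem-type column instead. The following is a minimal Python sketch, not part of the plugin: the helper names has_fuse_mount, wait_for_fuse_mounted and read_proc_mounts are hypothetical, and it assumes that FUSE mounts appear in /proc/mounts with the type fuse or fuse.<name>.

```python
import time


def has_fuse_mount(mounts_text, mount_point=None):
    """Return True if mounts_text (in /proc/mounts format) lists a FUSE mount.

    Assumption: FUSE filesystems report the type "fuse" or "fuse.<name>".
    If mount_point is given, only a mount at exactly that path counts.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        _device, target, fstype = fields[:3]
        is_fuse = fstype == "fuse" or fstype.startswith("fuse.")
        if is_fuse and (mount_point is None or target == mount_point):
            return True
    return False


def wait_for_fuse_mounted(read_mounts, mount_point=None, timeout=60.0, interval=1.0):
    """Poll read_mounts() until a FUSE mount shows up or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if has_fuse_mount(read_mounts(), mount_point):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def read_proc_mounts():
    """Read the mount table of the current mount namespace."""
    with open("/proc/mounts") as f:
        return f.read()
```

Inside the application container this could be called as wait_for_fuse_mounted(read_proc_mounts, "/data"). Unlike the shell loops, it gives up after the timeout instead of waiting forever. The subPath caveat applies to this variant as well.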

Running E2E tests

Tested Environment

  • Ubuntu 23.04 (Kernel 6.2.0-35-generic)
  • Docker (version 24.0.7)
  • kubectl (v1.28.2)
  • kind (v0.20.0)
  • Kubernetes (v1.27.3, running with kind)

You can run E2E tests with kind.

$ make test-e2e

How it works

meta-fuse-csi-plugin involves two kinds of pods: the CSI driver Pod, which has CAP_SYS_ADMIN, and the User Pod. CSI driver Pods are deployed on each node by cluster operators as a DaemonSet. They perform the privileged operations (open("/dev/fuse") and mount("fuse", ...)) on behalf of FUSE implementations. User Pods are deployed by users, who can choose any FUSE implementation and deploy it without CAP_SYS_ADMIN.

meta-fuse-csi-plugin provides two approaches, fuse-starter and fusermount3-proxy, to run and mount FUSE implementations.

fuse-starter: Direct fd passing approach

This approach derives from gcs-fuse-csi-driver. Some FUSE implementations can receive a file descriptor (fd) for "/dev/fuse" as an argument. They use the received fd to exchange FUSE operations with the Linux kernel. With libfuse3, a userspace FUSE library, when "/dev/fd/X" is specified as the mount point, libfuse3 interprets X as the file descriptor for "/dev/fuse" and performs FUSE operations on it. Similarly, jacobsa/fuse, a FUSE library used by gcsfuse, provides equivalent functionality.
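To make the "/dev/fd/X" convention concrete, here is a minimal Python sketch of how a mount point argument of that form can be mapped to an fd number. It only mirrors the behaviour described above. libfuse3 itself is written in C and its actual parsing may differ in detail, and the function name fuse_fd_from_mountpoint is hypothetical.

```python
import re


def fuse_fd_from_mountpoint(mountpoint):
    """Return the fd number X encoded in a "/dev/fd/X" mount point.

    Returns None for an ordinary mount point such as "/data", which
    would have to be mounted in the usual (privileged) way.
    """
    match = re.fullmatch(r"/dev/fd/([0-9]+)", mountpoint)
    if match is None:
        return None
    return int(match.group(1))
```

For example, a FUSE implementation started with the mount point "/dev/fd/7" would treat fd 7 as an already opened and mounted "/dev/fuse" and skip the open and mount steps.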

fuse-starter communicates with the CSI driver Pod via a Unix Domain Socket (UDS), and the CSI driver Pod performs open("/dev/fuse", ...) and mount("fuse") with the acquired fd. fuse-starter then receives the fd from the CSI driver Pod and passes it to the FUSE implementation when launching it.
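Passing an open fd between processes over a Unix Domain Socket is done with SCM_RIGHTS ancillary data. The sketch below illustrates the mechanism in Python (3.9 or later, Unix only). It is not the plugin's code, which is written in Go. A pipe stands in for "/dev/fuse" so that the example runs without privileges, and the names send_fd, recv_fd and demo_fd_passing are hypothetical.

```python
import os
import socket


def send_fd(sock, fd):
    """Send one open file descriptor over a Unix domain socket (SCM_RIGHTS)."""
    # Some ordinary payload has to accompany the ancillary data.
    socket.send_fds(sock, [b"F"], [fd])


def recv_fd(sock):
    """Receive one file descriptor sent with send_fd()."""
    _data, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    if len(fds) != 1:
        raise RuntimeError("expected exactly one file descriptor")
    return fds[0]


def demo_fd_passing():
    """Pass the write end of a pipe across a socket pair and write through it."""
    # driver_side plays the CSI driver Pod, starter_side plays fuse-starter.
    driver_side, starter_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()  # stand-in for an opened /dev/fuse
    try:
        send_fd(driver_side, write_end)
        os.close(write_end)  # the sender no longer needs its own copy
        received = recv_fd(starter_side)
        os.write(received, b"hello")  # the received fd refers to the same pipe
        os.close(received)
        return os.read(read_end, 5)
    finally:
        os.close(read_end)
        driver_side.close()
        starter_side.close()
```

In a real launcher the received fd would then be inherited by the FUSE implementation, for instance with subprocess.Popen(..., pass_fds=[fd]) and a "/dev/fd/N" mount point argument.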

fusermount3-proxy: Modified fusermount3 approach

fusermount3-proxy takes advantage of libfuse3's fusermount3 mount mechanism.

fusermount3 is a setuid executable. It performs privileged operations (open and mount) on behalf of libfuse3. When libfuse3 tries to mount a FUSE implementation but mount(2) fails due to lack of permissions, libfuse3 executes fusermount3, which performs open("/dev/fuse", ...) and mount("fuse"). fusermount3 then passes the fd for "/dev/fuse" to libfuse3, and libfuse3 continues processing FUSE operations.

fusermount3-proxy impersonates fusermount3 and forwards mount operations to the CSI driver Pod.
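The hand-off between a fusermount3-like helper and its caller can be sketched as follows. This is an illustration in Python (3.9 or later, Unix only), not the plugin's implementation. It assumes libfuse3's convention of passing the number of an already connected Unix socket to the helper in the _FUSE_COMMFD environment variable. The names proxy_mount, request_fuse_fd and demo_proxy are hypothetical, and the request to the CSI driver Pod is replaced by a callback.

```python
import os
import socket

# Assumption: libfuse3 tells the helper which socket to answer on via this variable.
COMMFD_ENV = "_FUSE_COMMFD"


def proxy_mount(environ, request_fuse_fd):
    """Act like fusermount3, but obtain the /dev/fuse fd from somewhere else.

    request_fuse_fd is a hypothetical callback standing in for the request
    to the CSI driver Pod. It must return an open file descriptor.
    """
    commfd = int(environ[COMMFD_ENV])
    fuse_fd = request_fuse_fd()
    # Wrap a duplicate so that closing the socket object leaves commfd open.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM, fileno=os.dup(commfd)) as comm:
        socket.send_fds(comm, [b"\0"], [fuse_fd])
    os.close(fuse_fd)  # the caller now holds its own copy


def demo_proxy():
    """Play the caller's role with a pipe instead of /dev/fuse."""
    lib_side, helper_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        env = {COMMFD_ENV: str(helper_side.fileno())}
        proxy_mount(env, lambda: write_end)
        _data, fds, _flags, _addr = socket.recv_fds(lib_side, 1, 1)
        os.write(fds[0], b"ok")
        os.close(fds[0])
        return os.read(read_end, 2)
    finally:
        os.close(read_end)
        lib_side.close()
        helper_side.close()
```

Because the helper answers on the same socket that the real fusermount3 would use, the FUSE implementation itself needs no changes. Only the binary found under the name fusermount3 is replaced.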

For more details, please refer to our blog (English, Japanese)

Acknowledgement

The driver implementation is forked from gcs-fuse-csi-driver. gcs-fuse-csi-driver is licensed under Apache 2.0 as described below.

LICENSE

# Copyright 2018 The Kubernetes Authors.
# Copyright 2022 Google LLC
# Copyright 2023 Preferred Networks, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.