
🌱 Run enclave-cc on Kairos #1114

Closed
3 tasks
Tracked by #2131 ...
mudler opened this issue Mar 13, 2023 · 13 comments · Fixed by #1243
Assignees
Labels
enhancement New feature or request lane/coco

Comments

@mudler
Member

mudler commented Mar 13, 2023

https://github.com/confidential-containers/enclave-cc

Acceptance criteria

  • We have a documentation page showing how to run confidential workloads on Kairos
  • We have a bundle that sets up enclave-cc and the operator accordingly
  • Have an e2e example to show confidential workloads running on Kairos

Useful docs links:

@mudler mudler mentioned this issue Mar 13, 2023
24 tasks
@mudler mudler changed the title Run enclave-cc on Kairos 🌱 Run enclave-cc on Kairos Mar 13, 2023
@mudler mudler added enhancement New feature or request lane/coco labels Mar 13, 2023
@mudler mudler moved this to Todo 🖊 in 🧙Issue tracking board Mar 13, 2023
@jimmykarily jimmykarily moved this from Todo 🖊 to In Progress 🏃 in 🧙Issue tracking board Mar 14, 2023
@jimmykarily jimmykarily self-assigned this Mar 14, 2023
@jimmykarily
Contributor

The installation instructions fail on this step: https://github.com/confidential-containers/operator/blob/main/docs/INSTALL.md#create-custom-resource-cr

I'm using a k3d cluster, and when the cc-operator-pre-install-daemon-5b8qf pod starts, it fails with an error:

Copying containerd-for-cc artifacts onto host
Failed to get D-Bus connection: Operation not permitted

I think it has to do with this script trying to run systemctl: https://github.com/confidential-containers/operator/blob/a0fbbf40ad0848aee6c9ed90fbf7d001e50396c4/install/pre-install-payload/scripts/container-engine-for-cc-deploy.sh#L55

the failing container is running with:

securityContext:
  privileged: true
  runAsUser: 0

so I don't see how this could be a permissions issue.

@jimmykarily
Contributor

The Pod assumes dbus is running on the host: https://github.com/confidential-containers/operator/blob/a0fbbf40ad0848aee6c9ed90fbf7d001e50396c4/config/samples/enclave-cc/base/ccruntime-enclave-cc.yaml#L43

which is not the case for the k3d container.
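A quick way to confirm this from inside the failing container is to check for the D-Bus system socket that systemctl talks to. This is only a sketch: the socket path below is the common default and may differ per distro, and the DBUS_SOCKET override is made up for illustration.

```shell
#!/bin/sh
# Check whether the host's D-Bus system socket is visible where systemctl
# expects it. DBUS_SOCKET is overridable so the check can point elsewhere.
DBUS_SOCKET="${DBUS_SOCKET:-/var/run/dbus/system_bus_socket}"

if [ -S "$DBUS_SOCKET" ]; then
  echo "dbus socket present: $DBUS_SOCKET"
else
  echo "dbus socket missing: $DBUS_SOCKET"
fi
```

On a k3d node container this would be expected to print "missing", which is consistent with the "Operation not permitted" systemctl failure above.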

@mudler
Member Author

mudler commented Mar 14, 2023

I'd suggest trying on a real cluster with Kairos and k3s, since the operator assumes certain services are running on the host. With k3d you might hit several limitations along the way that could block you in different ways.

@mudler
Member Author

mudler commented Mar 14, 2023

@jimmykarily
Contributor

I switched to a kairos cluster. This is the next error:

Copying containerd-for-cc artifacts onto host
Restarting containerd
Failed to restart containerd.service: Unit containerd.service not found.

containerd based cluster is in the requirements

@mudler
Member Author

mudler commented Mar 14, 2023

I switched to a kairos cluster. This is the next error:

Copying containerd-for-cc artifacts onto host
Restarting containerd
Failed to restart containerd.service: Unit containerd.service not found.

containerd based cluster is in the requirements

K3s is containerd based: https://docs.k3s.io/advanced#using-docker-as-the-container-runtime

@mudler
Member Author

mudler commented Mar 14, 2023

I think the issue is that it tries to restart a service and takes for granted that one exists. Maybe we can slightly adapt their script to work on k3s?

@jimmykarily
Contributor

jimmykarily commented Mar 15, 2023

After inspecting how the operator installs the needed binaries, we came up with this plan (cc @mudler ):

@mudler
Member Author

mudler commented Mar 15, 2023

Some notes for manual steps so far:

ln -s /etc/systemd/system/k3s.service /etc/systemd/system/containerd.service

# Note: use k8s 1.26.x, as the containerd version has to be compatible with
# the one provided by Intel
# Install the operator as in docs
systemctl stop k3s
cp /opt/confidential-containers/bin/containerd /var/lib/rancher/k3s/data/current/bin/containerd
### After the operator runs, it generates a config.toml for kata
# The original template is here: https://github.com/k3s-io/k3s/blob/master/pkg/agent/templates/templates_linux.go#L10
cat /etc/containerd/config.toml >> /var/lib/rancher/k3s/agent/etc/containerd/config.toml
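Since the note above warns that the replacement containerd has to be version-compatible, the check can be sketched as comparing the major.minor of two `containerd --version` strings. The `minor_of` helper and the sample version strings below are made up for illustration; on a real node you would feed it `$(containerd --version)` and `$(/opt/confidential-containers/bin/containerd --version)`.

```shell
#!/bin/sh
# containerd --version prints e.g.:
#   containerd github.com/containerd/containerd v1.6.19 <commit>
# minor_of extracts the "v1.6" part so two builds can be compared.
minor_of() { echo "$1" | awk '{print $3}' | cut -d. -f1,2; }

k3s_ver=$(minor_of "containerd github.com/containerd/containerd v1.6.19 abcdef")
coco_ver=$(minor_of "containerd github.com/containerd/containerd v1.6.8 123456")

if [ "$k3s_ver" = "$coco_ver" ]; then
  echo "compatible major.minor: $k3s_ver"
else
  echo "version mismatch: $k3s_ver vs $coco_ver"
fi
```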

@jimmykarily
Contributor

jimmykarily commented Mar 17, 2023

There is one part of the k3s containerd config that we copy which makes the custom containerd break:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true

(if I remove this from /etc/containerd/config.toml Pods come up just fine).
When that's in place, Pods refuse to come up with errors like the one described here: containerd/containerd#4857

Some useful links:

What I don't understand is how the k3s kubelet flags can fail to match the config k3s generated (which is the one we copied). Maybe setting K3S_CONFIG_FILE overrides all other config? Maybe k3s was setting the kubelet cgroup driver to systemd (which is why it's in the generated containerd config), and by setting a config file we removed that option? If that's true, I wonder what other options we may have removed from k3s.
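A sketch for spotting the problematic setting before restarting anything. It parses an inline sample here for illustration; on a node you would grep the real /etc/containerd/config.toml and compare against the kubelet's `--cgroup-driver` flag.

```shell
#!/bin/sh
# Detect whether a containerd config enables the systemd cgroup driver for
# the runc runtime -- the line that broke Pods in this setup.
cfg='[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true'

if echo "$cfg" | grep -q 'SystemdCgroup = true'; then
  echo "runc runtime configured for the systemd cgroup driver"
else
  echo "cgroupfs driver in use"
fi
```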

@jimmykarily
Contributor

@mauromorales let's remember that forcing the systemd cgroup driver might not work on alpine: https://kubernetes.io/docs/setup/production-environment/container-runtimes/#systemd-cgroup-driver

When [systemd](https://www.freedesktop.org/wiki/Software/systemd/) is chosen as the init system for a Linux distribution, the init process generates and consumes a root control group (cgroup) and acts as a cgroup manager.

@mauromorales mauromorales removed their assignment Mar 22, 2023
@jimmykarily
Contributor

jimmykarily commented Mar 22, 2023

I summarized all the steps to reproduce what we achieved so far:

Steps to deploy coco on kairos

  • Deploy a kairos cluster from latest master (because we need the immucore fix to mount /etc)
    A config like this should be used (see the bundles section):
#cloud-config
bundles:
    - targets:
        - run://quay.io/kairos/community-bundles:system-upgrade-controller_latest
        - run://quay.io/kairos/community-bundles:cert-manager_latest
        - run://quay.io/kairos/community-bundles:kairos_latest
        - run://ttl.sh/kairos-testing/enclave-cc:8h

install:
    auto: true
    device: auto
    reboot: true

k3s:
    enabled: true

users:
    - name: kairos
      passwd: kairos

The enclave-cc bundle is built from this directory: https://github.com/kairos-io/community-bundles/tree/1114-enclave-cc/coco
by running this command:

  docker build -t ttl.sh/kairos-testing/enclave-cc:8h . && docker push ttl.sh/kairos-testing/enclave-cc:8h

(We still need to package and ship it.)

  • A reboot is needed for k3s and containerd to be restarted with the new settings (alternatively, killing the containers with k3s-killall.sh and running systemctl restart containerd k3s might do)

  • Label our node:

  kubectl label --overwrite node $(kubectl get nodes -o jsonpath='{.items[].metadata.name}') node-role.kubernetes.io/worker=""

  • Deploy the operator:

  kubectl apply -k github.com/confidential-containers/operator/config/release?ref=v0.4.0

  • Deploy the ccruntime resource:

  kubectl apply -k github.com/confidential-containers/operator/config/samples/ccruntime/ssh-demo?ref=v0.4.0

(wait until they are all running: kubectl get pods -n confidential-containers-system --watch)
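The "wait until all running" check can also be scripted. The sketch below parses a made-up sample of the kubectl output; on a live cluster, `kubectl wait --for=condition=Ready pod --all -n confidential-containers-system` achieves the same without parsing.

```shell
#!/bin/sh
# Count pods whose STATUS column is not "Running". The sample below stands in
# for `kubectl get pods -n confidential-containers-system` output.
sample='NAME                       READY   STATUS    RESTARTS
cc-operator-controller-0   1/1     Running   0
cc-operator-daemon-abcde   1/1     Running   0'

not_ready=$(echo "$sample" | awk 'NR > 1 && $3 != "Running"' | wc -l)

if [ "$not_ready" -eq 0 ]; then
  echo "all pods Running"
else
  echo "$not_ready pod(s) not ready"
fi
```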

  • [Temporarily] From the kairos node, add the kata sections at the bottom of
    /etc/containerd/config.toml to /opt/containerd/config.toml

    When we bump immucore and move back to /etc from /opt (see the relevant issue) we may not need this step at all. The coco operator will replace our config.toml, which may or may not work, because the one we used was generated by k3s and the values in there may be needed.
    UPDATE: After going back to using /etc, the operator simply appends the kata plugin settings to the existing config.toml. All good.
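The "copy the kata sections" step can be sketched as a filter that keeps only the kata runtime sections of a TOML file. The input below is a toy sample, not the real operator-generated config; on a node you would run the filter over /etc/containerd/config.toml and append its output to the k3s config.

```shell
#!/bin/sh
# Keep only the sections whose header mentions the kata runtime; everything
# under a non-kata header is dropped. Appending the result to the k3s
# config.toml mirrors the manual step described above.
sample='[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
  runtime_type = "io.containerd.kata.v2"'

echo "$sample" | awk '/^\[/ { keep = ($0 ~ /kata/) } keep'
```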

  • Deploy a workload

    The last part with the verification will only work from within a Pod because the IP address is internal:

    ssh -i ccv0-ssh root@$(kubectl get service ccv0-ssh -o jsonpath="{.spec.clusterIP}")

    You can create a Pod like this:

    apiVersion: v1
    kind: Pod
    metadata:
      name: kubectl
    spec:
      containers:
      - name: kubectl
        image: opensuse/leap
        command: ["/bin/sh", "-ec", "trap : TERM INT; sleep infinity & wait"]
    

    get a shell to it and run the verification commands

    (you will need to install ssh and find out the IP address of the service outside the Pod)

@jimmykarily
Contributor

Will wait for feedback from the enclave-cc team and if all is ok, will document this and close the issue.

@mudler mudler mentioned this issue Mar 30, 2023
29 tasks
jimmykarily added a commit that referenced this issue Apr 3, 2023
Fixes #1114

Signed-off-by: Dimitris Karakasilis <dimitris@karakasilis.me>
@jimmykarily jimmykarily moved this from In Progress 🏃 to Under review 🔍 in 🧙Issue tracking board Apr 3, 2023
mudler pushed a commit that referenced this issue Apr 5, 2023
* Add instructions on how to use the `coco` bundle

Fixes #1114

Signed-off-by: Dimitris Karakasilis <dimitris@karakasilis.me>

* Address PR comments

Signed-off-by: Dimitris Karakasilis <dimitris@karakasilis.me>

---------

Signed-off-by: Dimitris Karakasilis <dimitris@karakasilis.me>
@github-project-automation github-project-automation bot moved this from Under review 🔍 to Done ✅ in 🧙Issue tracking board Apr 5, 2023
@mudler mudler mentioned this issue May 26, 2023
29 tasks