This repository has been archived by the owner on Oct 22, 2024. It is now read-only.

support PMEM inside Kata Containers #303

Closed
pohly opened this issue Jun 6, 2019 · 9 comments
Labels
0.7 needs to be fixed in 0.7.x


@pohly
Contributor

pohly commented Jun 6, 2019

Applications running inside Kata Containers cannot use PMEM in App Direct mode because they don't get access to the original filesystem on the host.

One idea for addressing this is to:

  • map the entire partition into memory
  • start QEMU such that it makes that memory range available inside the virtual machine
  • mount that memory range inside the virtual machine

Details are still to be decided; most of this has to be handled in Kata Containers.
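The three steps above can be sketched as a QEMU command line. This is a minimal illustration that assumes QEMU's generic NVDIMM emulation (a `memory-backend-file` object plus an `nvdimm` device); the helper `qemu_pmem_args` and the path `/mnt/pmem/volume.img` are hypothetical, and these are not the arguments Kata Containers actually generates.

```python
from __future__ import annotations

import shlex


def qemu_pmem_args(mem_path: str, pmem_size_mib: int, ram_mib: int,
                   slots: int = 2) -> list[str]:
    """Build QEMU arguments that expose mem_path to the guest as an NVDIMM.

    Hypothetical helper: follows QEMU's generic NVDIMM setup, not the
    exact command line produced by Kata Containers.
    """
    # maxmem must cover the normal RAM plus the NVDIMM memory range.
    maxmem_mib = ram_mib + pmem_size_mib
    return [
        # Enable NVDIMM support in the emulated machine.
        "-machine", "pc,nvdimm=on",
        "-m", f"{ram_mib}M,slots={slots},maxmem={maxmem_mib}M",
        # Steps 1 + 2: map the whole backing file and turn it into guest memory.
        "-object",
        f"memory-backend-file,id=mem1,share=on,mem-path={mem_path},size={pmem_size_mib}M",
        # Step 3 happens in the guest: this range shows up as a PMEM device there.
        "-device", "nvdimm,id=nvdimm1,memdev=mem1",
    ]


if __name__ == "__main__":
    # Hypothetical 4 GiB image on a DAX-capable host filesystem.
    cmd = ["qemu-system-x86_64",
           *qemu_pmem_args("/mnt/pmem/volume.img", pmem_size_mib=4096, ram_mib=2048)]
    print(shlex.join(cmd))
```

Inside a Linux guest, such an NVDIMM typically appears as a `/dev/pmem*` block device that can be formatted and mounted with `-o dax`. Note that `maxmem` has to leave room for the NVDIMM in addition to the normal RAM.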

@pohly
Contributor Author

pohly commented Jun 6, 2019

To reproduce the problem inside our QEMU virtual cluster, nested virtualization is needed. We also need changes to install and use Kata Containers. I have all of that in a branch:
https://github.com/pohly/pmem-CSI/commits/nested-virtualization

@okartau
Contributor

okartau commented Jul 24, 2019

@kiendinh

Clear Linux version 31090 ships kernel 5.3.1, and I've seen that VIRTIO_PMEM is available there. Can we expect to see PMEM-CSI on Kata soon?

@pohly
Contributor Author

pohly commented Oct 11, 2019

My next step will be to try out Kata with virtiofs support, which will be released shortly. This might support volume passthrough with fsdax.

@pohly
Contributor Author

pohly commented Nov 25, 2019

kata-containers >= 1.9.0 has built-in support for virtiofs (https://github.com/kata-containers/documentation/blob/master/how-to/how-to-use-virtio-fs-with-kata.md) when using the kata-qemu-virtiofs runtime class (https://raw.githubusercontent.com/kata-containers/packaging/master/kata-deploy/k8s-1.14/kata-qemu-virtiofs-runtimeClass.yaml):

diff --git a/deploy/common/pmem-app-ephemeral.yaml b/deploy/common/pmem-app-ephemeral.yaml
index aca6bda6..28bf9220 100644
--- a/deploy/common/pmem-app-ephemeral.yaml
+++ b/deploy/common/pmem-app-ephemeral.yaml
@@ -5,6 +5,7 @@ apiVersion: v1
 metadata:
   name: my-csi-app-inline-volume
 spec:
+  runtimeClassName: kata-qemu-virtiofs
   containers:
     - name: my-frontend
       image: busybox

However, virtio-fs turned out not to be suitable for PMEM:

  • It does not map all pages at once. Instead, it maintains a cache of mapped pages which is considerably smaller ("a few GB") than the available PMEM. This is expected to lead to lower performance.
  • Because a page might not be mapped at the time it is written to, it does not meet the MAP_SYNC requirements.
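A toy model can illustrate the first point. It assumes a simple LRU cache of fixed-size mapping windows, which is a simplification and not how virtio-fs actually manages its DAX window; `count_mappings` is a hypothetical helper.

```python
from collections import OrderedDict


def count_mappings(accesses, window_slots):
    """Count how many accesses need a new mapping with an LRU cache of windows.

    Hypothetical model: each access touches one fixed-size window; a window
    that is not cached must be mapped first, evicting the least recently used one.
    """
    cache = OrderedDict()  # window id -> None, least recently used first
    mappings = 0
    for window in accesses:
        if window in cache:
            cache.move_to_end(window)  # hit: mark as most recently used
            continue
        mappings += 1  # miss: the window has to be mapped before it can be accessed
        cache[window] = None
        if len(cache) > window_slots:
            cache.popitem(last=False)  # evict the least recently used window
    return mappings


if __name__ == "__main__":
    # Three sequential passes over a volume that consists of 8 windows.
    passes = list(range(8)) * 3
    print("everything mapped:", count_mappings(passes, window_slots=8))  # 8
    print("cache for half the volume:", count_mappings(passes, window_slots=4))  # 24
```

When the cache covers the whole volume, each window is mapped only once. When it covers half of it, a sequential sweep evicts every window before it is reused, so every single access needs a new mapping.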

I have engaged with the Kata Containers folks here: kata-containers/runtime#2262

pohly self-assigned this Dec 10, 2019
@pohly
Contributor Author

pohly commented Jan 14, 2020

There is a functional PoC in #500; now we need the corresponding changes in Kata Containers.

@pohly
Contributor Author

pohly commented May 8, 2020

Kata Containers will have support for this in 1.11.0 (currently available as -rc0). PR #500 contains an E2E test with Kata Containers, but it is still WIP and does not pass yet.

One problem is that by default, Kata Containers only allows VMs to have as much memory as the host has DRAM. If the host then wants to add a much larger PMEM volume, Kata Containers fails with an error like:

Error: container create failed: QMP command failed: not enough space, currently 0x8000000 in use of total space for memory devices 0x3c100000
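The hex values in that message can be decoded to see how small the limit is. A minimal sketch; the regular expression assumes the exact message wording shown above, and `parse_memory_device_space` is a hypothetical helper.

```python
import re

ERROR = ("Error: container create failed: QMP command failed: not enough space, "
         "currently 0x8000000 in use of total space for memory devices 0x3c100000")

_PATTERN = re.compile(
    r"currently (0x[0-9a-fA-F]+) in use of total space for memory devices (0x[0-9a-fA-F]+)")

MIB = 1 << 20


def parse_memory_device_space(msg):
    """Return (used, total) in bytes from the QMP 'not enough space' error."""
    m = _PATTERN.search(msg)
    if m is None:
        raise ValueError("unrecognized QMP error message")
    used, total = (int(value, 16) for value in m.groups())
    return used, total


if __name__ == "__main__":
    used, total = parse_memory_device_space(ERROR)
    print(f"in use: {used // MIB} MiB, total: {total // MIB} MiB, "
          f"free: {(total - used) // MIB} MiB")
    # in use: 128 MiB, total: 961 MiB, free: 833 MiB
```

So less than 1 GiB of address space is available for memory devices in this case, which is why a much larger PMEM volume cannot be added.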

One solution is to edit /opt/kata/share/defaults/kata-containers/configuration-qemu.toml after kata-deploy has created it and increase memory_offset: https://github.com/kata-containers/runtime/blob/master/cli/config/configuration-qemu.toml.in#L91
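A minimal sketch of that edit, assuming the file contains a single (possibly commented-out) `memory_offset = N` line under `[hypervisor.qemu]` and that the value is in MiB, as the comment next to this setting in the linked template appears to describe (please verify for your Kata version). `set_memory_offset` and `mib_round_up` are hypothetical helpers, and the sample config is made up.

```python
import re

MIB = 1 << 20

# Matches "memory_offset = ...", optionally commented out with "#".
_MEMORY_OFFSET_LINE = re.compile(r"^[ \t]*#?[ \t]*memory_offset[ \t]*=.*$", re.MULTILINE)


def mib_round_up(size_bytes: int) -> int:
    """Convert a byte count to MiB, rounding up."""
    return (size_bytes + MIB - 1) // MIB


def set_memory_offset(config_text: str, offset_mib: int) -> str:
    """Return config_text with the first memory_offset line set to offset_mib."""
    updated, count = _MEMORY_OFFSET_LINE.subn(
        f"memory_offset = {offset_mib}", config_text, count=1)
    if count == 0:
        raise ValueError("no memory_offset line found; add one under [hypervisor.qemu]")
    return updated


if __name__ == "__main__":
    sample = "[hypervisor.qemu]\n#memory_offset = 0\ndefault_memory = 2048\n"
    # Room for a hypothetical 16 GiB PMEM volume (16384 MiB).
    print(set_memory_offset(sample, mib_round_up(16 * 1024**3)))
```

To apply it to the real file, read configuration-qemu.toml, pass its content through `set_memory_offset`, and write the result back with root permissions.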

Alternatively, that limit can be raised individually for each pod:
https://github.com/kata-containers/documentation/blob/master/how-to/how-to-set-sandbox-config-kata.md
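The per-pod variant could look like the following Pod manifest, generated here as JSON from Python. The annotation key `io.katacontainers.config.hypervisor.memory_offset` is assumed to be the one listed in the linked how-to (verify it for your Kata version). The runtime class name `kata-qemu`, the pod name, the `sleep` command, and the offset value (including its unit) are placeholders, and the PMEM volume itself is left out.

```python
import json

# Assumed annotation key from the Kata Containers sandbox configuration how-to.
MEMORY_OFFSET_ANNOTATION = "io.katacontainers.config.hypervisor.memory_offset"


def kata_pod(name: str, runtime_class: str, memory_offset: str) -> dict:
    """Build a minimal Pod manifest that raises the memory offset for one pod."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            # Kubernetes annotation values must be strings.
            "annotations": {MEMORY_OFFSET_ANNOTATION: memory_offset},
        },
        "spec": {
            # Selects the Kata Containers runtime (name is an assumption).
            "runtimeClassName": runtime_class,
            "containers": [{
                "name": "my-frontend",
                "image": "busybox",
                "command": ["sleep", "3600"],  # placeholder: keep the container running
            }],
            # The PMEM volume and its volumeMounts are omitted in this sketch.
        },
    }


if __name__ == "__main__":
    pod = kata_pod("my-kata-pmem-app", "kata-qemu", "2147483648")
    print(json.dumps(pod, indent=2))  # kubectl apply -f also accepts JSON manifests
```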

@pohly
Contributor Author

pohly commented May 18, 2020

This should work now in "devel" (PR #500), but it is not tested in CI. It needs to be tested manually once; then this issue can be closed.

pohly added the 0.7 needs to be fixed in 0.7.x label May 18, 2020
@pohly
Contributor Author

pohly commented Jun 10, 2020

Manual testing found a regression, which has since been fixed. It works now.

pohly closed this as completed Jun 10, 2020