Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial integration tests for cgroupv2 #2295

Merged
merged 8 commits into from
Apr 14, 2020

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Apr 4, 2020

This refactors integration test helpers a bit, and adds some initial tests for cgroupv2.

Split into logical small(er) commits for the sake of easier reviewing, so better to review it commit by commit.

This is far from complete, but is it mergeable and already helped to find a surprising number of bugs:

@AkihiroSuda
Copy link
Member

Thanks a lot for working on this, do you plan to work on runc events as well?

@AkihiroSuda
Copy link
Member

Can we execute this on Travis?

@kolyshkin
Copy link
Contributor Author

Can we execute this on Travis?

Probably not, Travis only provides Ubuntu 18.04 and I think it's kernel is too old.

We need to find some other CI platform, say one that provides Fedora 31. Maybe Cirrus-CI could be used, I have yet to research it.

@AkihiroSuda
Copy link
Member

We have been already running Fedora 31 KVM on Travis for unittest

- sudo ssh default sudo podman run --privileged --cgroupns=private -v /lib/modules:/lib/modules:ro test make localunittest

@kolyshkin
Copy link
Contributor Author

We have been already running Fedora 31 KVM on Travis for unittest

Right! I forgot about vargant/kvm. Will try to enable localintegration then.

@kolyshkin kolyshkin force-pushed the integration-cgroups branch from 0cc5be6 to 71cfe8d Compare April 4, 2020 06:13
@kolyshkin
Copy link
Contributor Author

Not sure if it's possible to run a podman container with systemd in it.

@kolyshkin kolyshkin force-pushed the integration-cgroups branch from 71cfe8d to a7cf01f Compare April 4, 2020 06:31
@AkihiroSuda
Copy link
Member

No need to use Podman/Docker for systemd test

- sudo PATH="$PATH" make localintegration RUNC_USE_SYSTEMD=1

@kolyshkin
Copy link
Contributor Author

Description updated to refer to the up-to-date relevant PRs and issues.

do you plan to work on runc events as well?

@AkihiroSuda not at the moment, at least not until we fix cgroup v2.

@kolyshkin kolyshkin force-pushed the integration-cgroups branch from a7cf01f to 76db7b5 Compare April 7, 2020 23:28
@AkihiroSuda
Copy link
Member

I asked it because we have runc events tests 😄, but can be skipped at this moment. (I'm even not sure whether there is real consumer of runc events... None of Moby/Podman/containerd/CRI-O uses runc events.)

https://github.com/opencontainers/runc/blob/master/tests/integration/events.bats

@kolyshkin kolyshkin force-pushed the integration-cgroups branch from 76db7b5 to 2e6e793 Compare April 9, 2020 14:30
@kolyshkin
Copy link
Contributor Author

Removed the patch that caused test breakage

@kolyshkin kolyshkin force-pushed the integration-cgroups branch 4 times, most recently from 118df02 to 825b300 Compare April 9, 2020 21:22
@kolyshkin
Copy link
Contributor Author

OK the tests are running (currently with systemd only) and passing!

Let me see if I can move bats installation into a script.

@kolyshkin kolyshkin force-pushed the integration-cgroups branch 3 times, most recently from 2c2fb3b to 3dac3f6 Compare April 9, 2020 23:04
.travis.yml Outdated
@@ -34,6 +34,7 @@ matrix:
- sudo ssh default sudo podman build -t test /vagrant
# Mounting /lib/modules into the container is necessary as CRIU wants to load (via iptables) additional modules
- sudo ssh default sudo podman run --privileged --cgroupns=private -v /lib/modules:/lib/modules:ro test make localunittest
- sudo ssh default 'cd /vagrant && sudo ./script/install-bats.sh && make RUNC_USE_SYSTEMD=yes integration'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • bats should be installed in Vagrantfile
  • integration -> localintegration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bats should be installed in Vagrantfile

I did that initially, but by copy-pasting the installation steps from the Dockerfile. I'd rather have it in a script executed from both places. Is there a way to source the script to Vagrantfile (like I do for Dockerfile)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess, it should be also possible to invoke scripts under /vagrant from the inline script, but haven't confirmed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like I found a way. Let me test locally and then I'll push it.

@kolyshkin kolyshkin force-pushed the integration-cgroups branch 2 times, most recently from c3f85b7 to 3805bfa Compare April 10, 2020 03:55
@kolyshkin
Copy link
Contributor Author

OK we need runc binary and bats on the vagrant host. We build runc inside a debian container, it might now work on a fedora host.

Alternatively, we build it on host with whatever golang version Fedora 31 provides (currently 1.13.9 which is good).

@kolyshkin
Copy link
Contributor Author

So, version 1 is to copy bats and runc from the container to the host and run the test (we can't use make since it wants to rebuild runc):

diff --git a/.travis.yml b/.travis.yml
index 3e12b820..44773938 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -34,6 +34,9 @@ matrix:
         - sudo ssh default sudo podman build -t test /vagrant
         # Mounting /lib/modules into the container is necessary as CRIU wants to load (via iptables) additional modules
         - sudo ssh default sudo podman run --privileged --cgroupns=private -v /lib/modules:/lib/modules:ro test make localunittest
+        # Copy bats and compiled runc binary from test image to vagrant host
+        - sudo ssh default 'sudo podman run --rm test tar cf - /usr/local/bin/bats /usr/local/libexec/bats* | sudo tar xf - -C / && sudo podman run --rm test tar cf - runc | sudo tar xf - -C /vagrant'
+        - sudo ssh default 'cd /vagrant && sudo RUNC_USE_SYSTEMD=yes bats -t tests/integration'
   allow_failures:
     - go: tip

version 2 (install git, bats, golang, make on the host, compile and run test as usual):

diff --git a/.travis.yml b/.travis.yml
index 44773938..aaca6307 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -34,6 +34,7 @@ matrix:
         - sudo ssh default sudo podman build -t test /vagrant
         # Mounting /lib/modules into the container is necessary as CRIU wants to load (via iptables) additional modules
         - sudo ssh default sudo podman run --privileged --cgroupns=private -v /lib/modules:/lib/modules:ro test make localunittest
+        - sudo ssh default 'cd /vagrant && sudo RUNC_USE_SYSTEMD=yes make localintegration'
   allow_failures:
     - go: tip
 
diff --git a/Dockerfile b/Dockerfile
index 94be151e..e7d203a7 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,5 +1,4 @@
 ARG GO_VERSION=1.13
-ARG BATS_VERSION=03608115df2071fff4eaaff1605768c275e5f81f
 ARG CRIU_VERSION=v3.13
 
 FROM golang:${GO_VERSION}-buster
@@ -49,13 +48,8 @@ RUN dpkg --add-architecture armel \
 RUN useradd -u1000 -m -d/home/rootless -s/bin/bash rootless
 
 # install bats
-ARG BATS_VERSION
-RUN cd /tmp \
-    && git clone https://github.com/sstephenson/bats.git \
-    && cd bats \
-    && git reset --hard "${BATS_VERSION}" \
-    && ./install.sh /usr/local \
-    && rm -rf /tmp/bats
+COPY script/install-bats.sh /tmp
+RUN /tmp/install-bats.sh && rm /tmp/install-bats.sh
 
 # install criu
 ARG CRIU_VERSION
diff --git a/Vagrantfile b/Vagrantfile
index 165b0780..c8ec0ad0 100644
--- a/Vagrantfile
+++ b/Vagrantfile
@@ -13,6 +13,7 @@ Vagrant.configure("2") do |config|
     v.cpus = 2
   end
   config.vm.provision "shell", inline: <<-SHELL
-    dnf install -y podman
+    dnf install -y podman git make golang-go
   SHELL
+  config.vm.provision "shell", path: "script/install-bats.sh"
diff --git a/script/install-bats.sh b/script/install-bats.sh
new file mode 100644
index 00000000..d63f4f0a
--- /dev/null
+++ b/script/install-bats.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+
+set -u -e -o pipefail
+: "${BATS_VERSION:="03608115df2071fff4eaaff1605768c275e5f81f"}"
+
+cd /tmp \
+    && git clone https://github.com/sstephenson/bats.git \
+    && cd bats \
+    && git reset --hard "${BATS_VERSION}" \
+    && ./install.sh /usr/local \
+    && rm -rf /tmp/bats

@AkihiroSuda which one do you like better? (I kinda don't like either)

@kolyshkin kolyshkin force-pushed the integration-cgroups branch 2 times, most recently from 0b6ce29 to efc480e Compare April 12, 2020 09:06
@kolyshkin
Copy link
Contributor Author

OK, due to some options set by vagrant ssh-config, command ssh default do not create a pty on the remote side, so stdin is a pipe and runc exec with "terminal": true set in config.json tries to treat stdin (fd 0) as a terminal, and it fails.

Fix: add -t to ssh options to enforce a pty.

@kolyshkin
Copy link
Contributor Author

My tests are now passing. Got unrelated failures in podman:

STEP 12: COPY tests/integration/multi-arch.bash tests/integration/
--> b6acd7d5fd6
STEP 13: ENV ROOTFS /busybox
--> c11d4c496d4
STEP 14: RUN mkdir -p "${ROOTFS}"
error opening file `/sys/fs/cgroup//system.slice/buildah-buildah794169410-22047.scope/cgroup.freeze`: No such file or directory
kill container: No such process
error running container: error reading container state: exit status 1
Error: error building at STEP "RUN mkdir -p "${ROOTFS}"": error while running runtime: exit status 1
travis_time:end:0874bc6c:start=1586682807783553791,finish=1586683034446188144,duration=226662634353,event=script
�[0K�[31;1mThe command "sudo ssh default sudo podman build -t test /vagrant" exited with 125.�[0m
travis_time:start:0a243466
�[0K$ sudo ssh default sudo podman run --privileged --cgroupns=private -v /lib/modules:/lib/modules:ro test make localunittest
Trying to pull docker.io/library/test...
  denied: requested access to the resource is denied
Trying to pull registry.fedoraproject.org/test...
  manifest unknown: manifest unknown
Trying to pull registry.access.redhat.com/test...
  name unknown: Repo not found
Trying to pull registry.centos.org/test...
  manifest unknown: manifest unknown
Trying to pull quay.io/test...
  error parsing HTTP 404 response body: invalid character '<' looking for beginning of value: "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\">\n<title>404 Not Found</title>\n<h1>Not Found</h1>\n<p>The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.</p>\n"
Error: unable to pull test: 5 errors occurred:
	* Error initializing source docker://test:latest: Error reading manifest latest in docker.io/library/test: errors:
denied: requested access to the resource is denied
unauthorized: authentication required

	* Error initializing source docker://registry.fedoraproject.org/test:latest: Error reading manifest latest in registry.fedoraproject.org/test: manifest unknown: manifest unknown
	* Error initializing source docker://registry.access.redhat.com/test:latest: Error reading manifest latest in registry.access.redhat.com/test: name unknown: Repo not found
	* Error initializing source docker://registry.centos.org/test:latest: Error reading manifest latest in registry.centos.org/test: manifest unknown: manifest unknown
	* Error initializing source docker://quay.io/test:latest: Error reading manifest latest in quay.io/test: error parsing HTTP 404 response body: invalid character '<' looking for beginning of value: "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\">\n<title>404 Not Found</title>\n<h1>Not Found</h1>\n<p>The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.</p>\n"


travis_time:end:0a243466:start=1586683034452475855,finish=1586683036696948228,duration=2244472373,event=script
�[0K�[31;1mThe command "sudo ssh default sudo podman run --privileged --cgroupns=private -v /lib/modules:/lib/modules:ro test make localunittest" exited with 125.�[0m

@kolyshkin kolyshkin force-pushed the integration-cgroups branch from efc480e to 1b11fce Compare April 12, 2020 19:00
@kolyshkin
Copy link
Contributor Author

OK, looks like everything works except runc events (which is not modified for cgroupv2 memory.events) and, for some strange reason, runc ps (which works unless --systemd-cgroup is set).

@AkihiroSuda
Copy link
Member

AkihiroSuda commented Apr 12, 2020

Travis didn't notify the result, but it is passing 👍

LGTM, thanks!

Approved with PullApprove

@kolyshkin
Copy link
Contributor Author

kolyshkin commented Apr 13, 2020

for some strange reason, runc ps (which works unless --systemd-cgroup is set)

This is pretty shocking: cgroup v2 driver do not place pids in the cgroup (#2310), and the only test to show it is runc ps.

This seems to be easy to fix but I want to do it on top of #2299 fix is trivial, see #2311

@kolyshkin
Copy link
Contributor Author

@mrunalp @crosbymichael PTAL

Consolidate two implementations of check_cgroup_value()
into one, putting it into helpers.

Remove the first parameter, deducing the variable to get
the path from by the parameter name.

This should help in future cgroupv2 support.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Consolidate all the cgroup-related initialization code to
   a single place, init_cgroup_paths(), so we can see which
   variables are set.

2. Lazily call init_cgroup_paths() from all places that require it.

3. Don't set globals KMEM and RT_PERIOD.

4. Slightly clarlify variables naming:
    - use OCI_CGROUPS_PATH for cgroupsPath in config.json
    - use REL_CGROUPS_PATH for relative cgroups path

5. Do not hardcode the list of cgroup subsystems -- get it from
   /proc/cgroup.

6. Preliminary support for cgroupv2 unified hierarchy (not yet working).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This comment was added by commit 6cd425b (Allow update rt_period_us
and rt_runtime_us, Nov 4 2016), and the test case was added by commit
51baedf (Add integration for update rt period and runtime,
Nov 28 2016), making the comment obsolete.

Remove it.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
... and add kmem-tcp to cgroups kmem test.

First, we already have a separate kmem test in cgroups.bats.

Second, making kmem a requirement leads to skipping all the other
test cases in the update.bats test.

Third, kmem limit is being removed from the kernel, so it makes sense
to handle it separately.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
There's no need for an intermediate variable

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Add `cgroups_v1` and `cgroups_v2` options to `requires`.

2. Modify `check_cgroup_value` to be able to work with v2.

3. Split `test "update"` into two:

   - (1) testing cgroupv1-only cpu shares and cfs
   - (2) testing limits that are more or less common
         between v1 and v2: memory/swap, pids, cpusets.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Swap setting for cgroupv2 is currently broken, so let's temporarily
disable this part of test.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin kolyshkin force-pushed the integration-cgroups branch from 1b11fce to bcbdccc Compare April 13, 2020 23:34
@kolyshkin
Copy link
Contributor Author

  • fixed ps failures by adding dnf update container-selinux to Vagrantfile :)
  • rebased

@kolyshkin
Copy link
Contributor Author

OK, vagrant on Travis is installing container-selinux-2.117.0-1.gitbfde70a.fc31 while on my vagrant box I have container-selinux-2.124.0-3.fc31.noarch. Alas, we need to run dnf update.

@kolyshkin kolyshkin force-pushed the integration-cgroups branch 2 times, most recently from 50d4157 to 571a36c Compare April 14, 2020 01:40
Those needs to be run on the (Vagrant Fedora 31) host
(since we need real systemd running), and so we have
to have all the tools needed to compile runc and run
the tests.

The good news is Fedora packages a decent and recent release
of bats-core (1.1.0), which we can use (Debian does not),
and we can also use golang (currently 1.13.9) from Fedora.

The bad news are

 1. Currently cgroups tests are only working with
    RUNC_USE_SYSTEMD=yes (addressed by opencontainers#2299, opencontainers#2305)

 2. Tests in events.bats do not work (need cgroupv2
    memory.events support)

 3. Fedora 31 image is 6 months old (and has broken
    container-selinux policy) so we need `dnf update`,
    which adds ~5 min to test time.

[v2: add -t to ssh to enforce pty]
[v3: disable events tests for cgroupv2]
[v4: update fedora packages, use a single dnf transation]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin kolyshkin force-pushed the integration-cgroups branch from 571a36c to 84583eb Compare April 14, 2020 01:40
@AkihiroSuda
Copy link
Member

AkihiroSuda commented Apr 14, 2020

LGTM

Approved with PullApprove

1 similar comment
@mrunalp
Copy link
Contributor

mrunalp commented Apr 14, 2020

LGTM

Approved with PullApprove

@mrunalp mrunalp merged commit 56aca5a into opencontainers:master Apr 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants