Add e2e tests for docker #1845

Merged: 8 commits, Jul 4, 2024
1 change: 1 addition & 0 deletions src/cloud-api-adaptor/docker/.gitignore
@@ -0,0 +1 @@
image/resources/*
112 changes: 102 additions & 10 deletions src/cloud-api-adaptor/docker/README.md
@@ -13,8 +13,8 @@ The `docker` provider simulates a pod VM inside a docker container.
Docker engine version 26+ supports API 1.44.

Ensure you complete the [post install steps](https://docs.docker.com/engine/install/linux-postinstall/) if using a non-root user

- Kubernetes cluster

## Build CAA pod-VM image
@@ -26,13 +26,13 @@ export CLOUD_PROVIDER=docker
```

- Build the required pod VM binaries

```bash
cd src/cloud-api-adaptor/docker/image
make
```

This will build the required binaries inside a container and place
them under `resources/binaries-tree`
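
You can inspect the generated tree to confirm the build, for example:

```bash
ls resources/binaries-tree/usr/local/bin
```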

- Build the pod VM image
@@ -44,8 +44,9 @@ cd ../../

This will build the podvm docker image. By default the image is named `quay.io/confidential-containers/podvm-docker-image`.

For quick changes you can just build the binaries of podvm components and
update `./resources/binaries-tree/usr/local/bin` with the new components and
run `make image` to build a new podvm image.

You can download a ready-to-use image on your worker node.

@@ -56,7 +57,6 @@ docker pull quay.io/confidential-containers/podvm-docker-image
Note that before you can spin up a pod, the podvm image must be available on the K8s worker node
with the docker engine installed.
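
You can verify that the image is present on the worker node, for example:

```bash
# Prints the image ID if the podvm image is available locally
docker image inspect quay.io/confidential-containers/podvm-docker-image --format '{{.Id}}'
```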


## Build CAA container image

> **Note**: If you have made changes to the CAA code and you want to deploy those changes then follow [these instructions](https://github.com/confidential-containers/cloud-api-adaptor/blob/main/src/cloud-api-adaptor/install/README.md#building-custom-cloud-api-adaptor-image) to build the container image from the root of this repository.
@@ -65,7 +65,6 @@ with the docker engine installed.

The following [`kustomization.yaml`](../install/overlays/docker/kustomization.yaml) is used.


### Deploy CAA on the Kubernetes cluster

Run the following command to deploy CAA:
@@ -80,7 +79,82 @@ For changing the CAA image to your custom built image (e.g. `quay.io/myuser/cloud-api-adaptor`)
you can use the following:

```bash
export CAA_IMAGE=quay.io/myuser/cloud-api-adaptor
kubectl set image ds/cloud-api-adaptor-daemonset -n confidential-containers-system cloud-api-adaptor-con="$CAA_IMAGE"
```
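
To confirm the daemonset picked up the new image, a quick check (a sketch; the names are taken from the command above):

```bash
# Wait for the rollout to complete, then print the running image
kubectl rollout status ds/cloud-api-adaptor-daemonset -n confidential-containers-system
kubectl get ds/cloud-api-adaptor-daemonset -n confidential-containers-system \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```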

## Running the CAA e2e tests

### Test Prerequisites

To run the tests, use a test system with at least 8 GB RAM and 4 vCPUs.
Ubuntu 22.04 has been tested. Other Linux distros should work, but they
have not been tested.

The following software prerequisites are needed on the test system:

- make
- go
- yq
- kubectl
- kind
- docker

A `prereqs.sh` helper script is available under `src/cloud-api-adaptor/docker` to install/uninstall the prerequisites.


> **Note:** If using the `prereqs.sh` helper script to install the
> prerequisites, then reload the shell to ensure new permissions
> are in place to run `docker` and other commands.
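
You can quickly confirm that all the prerequisites are available; a minimal sketch (independent of `prereqs.sh`):

```sh
# Report any missing prerequisite; prints nothing if all are on PATH
for cmd in make go yq kubectl kind docker; do
  command -v "$cmd" >/dev/null || echo "missing: $cmd"
done
```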

### Test Execution

To run the tests, edit the file `src/cloud-api-adaptor/test/provisioner/docker/provision_docker.properties`
and update the `CAA_IMAGE` and `CAA_IMAGE_TAG` variables with your custom CAA image and tag.

You can run the CAA e2e tests (see [tests/e2e/README.md](../test/e2e/README.md)) by running the following command:

```sh
make TEST_PODVM_IMAGE=<podvm-image> TEST_PROVISION=yes CLOUD_PROVIDER=docker TEST_PROVISION_FILE=$(pwd)/test/provisioner/docker/provision_docker.properties test-e2e
```

This will create a two-node kind cluster, automatically download the pod VM
image mentioned in the `provision_docker.properties` file, and run the tests. On
completion of the tests, the kind cluster will be automatically deleted.
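
If you want to keep the cluster around after the run for debugging, you can additionally pass `TEST_TEARDOWN=no` (see the note at the end of this section), for example:

```sh
make TEST_PODVM_IMAGE=<podvm-image> TEST_PROVISION=yes TEST_TEARDOWN=no \
  CLOUD_PROVIDER=docker \
  TEST_PROVISION_FILE=$(pwd)/test/provisioner/docker/provision_docker.properties test-e2e
```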

> **Note:** To overcome the docker rate limiting issue or to download images from private registries,
> create a `config.json` file under `/tmp` with your registry secrets.

For example, if your docker registry user is `someuser` and the password is `somepassword`, then create
the auth string as shown below:

```sh
$ echo -n "someuser:somepassword" | base64
c29tZXVzZXI6c29tZXBhc3N3b3Jk
```

This auth string needs to be used in `/tmp/config.json` as shown below:

```json
{
  "auths": {
    "https://index.docker.io/v1/": {
      "auth": "c29tZXVzZXI6c29tZXBhc3N3b3Jk"
    }
  }
}
```

If you want to use a different location for the registry secret, then remember to update it
in the `src/cloud-api-adaptor/docker/kind-config.yaml` file as well.
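
If you prefer, the same file can be generated in one step; a minimal sketch, assuming the example credentials above:

```sh
# Writes /tmp/config.json with the base64-encoded auth string
AUTH=$(echo -n "someuser:somepassword" | base64)
cat > /tmp/config.json <<EOF
{
  "auths": {
    "https://index.docker.io/v1/": {
      "auth": "${AUTH}"
    }
  }
}
EOF
```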

> **Note:** If you have executed the tests with `TEST_TEARDOWN=no`, then you'll
> need to manually delete the `kind` created cluster by running the following
> command:

```sh
kind delete cluster --name peer-pods
```
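
You can confirm whether the cluster is still present by listing the existing kind clusters:

```sh
kind get clusters
```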

---

*Review discussion on the teardown note above:*

**Member:** Is it worth mentioning the option of using `kind delete cluster --name peer-pods` to delete the cluster that was auto-created in the e2e test?

**Member Author:** The cluster will be deleted automatically on test completion unless `TEST_TEARDOWN=no`. Anyway, let me add a note as well.

**Member:** So that might not have worked for me then, as I copied your command from the PR, but my cluster is over two hours old now from all the stopping and re-testing. I thought that I'd let the e2e process finish naturally at least once, but maybe not. It might be better to note that the e2e test uses kind to create the cluster and just link to their getting started docs instead?

**Member:** I think this was user error. I'm just re-trying and found that the "Uninstall CCRuntime CRD" step has taken 5 minutes so far, so I probably killed it previously thinking it was stuck/finished.

**Member:** Hmm, maybe not - it failed badly, so didn't do the kind delete:

```
time="2024-07-01T10:10:08-07:00" level=info msg="Delete the peerpod-ctrl deployment"
FAIL	github.com/confidential-containers/cloud-api-adaptor/src/cloud-api-adaptor/test/e2e	391.579s
FAIL
make: *** [Makefile:96: test-e2e] Error 1
```

I will have some more attempts to see if I can track down the issue.

**Member:** Same here @stevenhorsman, the kind cluster was left behind. Maybe this is a bug in the test framework itself: if something goes wrong on teardown then it won't run to the end (and the last step is to delete the cluster?)

```
time="2024-07-01T19:43:00Z" level=info msg="Deleting namespace 'coco-pp-e2e-test-5a79bbe1'..."
time="2024-07-01T19:43:15Z" level=info msg="Namespace 'coco-pp-e2e-test-5a79bbe1' has been successfully deleted within 60s"
Deleting the kind cluster
Deleting cluster "kind" ...
time="2024-07-01T19:43:15Z" level=info msg="Delete the Cloud API Adaptor installation"
time="2024-07-01T19:43:15Z" level=info msg="Uninstall the cloud-api-adaptor"
time="2024-07-01T19:43:15Z" level=info msg="Uninstall CCRuntime CRD"
time="2024-07-01T19:47:27Z" level=info msg="Uninstall the controller manager"
time="2024-07-01T19:47:36Z" level=info msg="Wait for the cc-operator-controller-manager deployment be deleted\n"
time="2024-07-01T19:47:41Z" level=info msg="Delete the peerpod-ctrl deployment"
FAIL    github.com/confidential-containers/cloud-api-adaptor/src/cloud-api-adaptor/test/e2e     1316.300s
FAIL
make: *** [Makefile:98: test-e2e] Error 1
```

**@stevenhorsman (Jul 2, 2024):** Yeah. FYI, with trace on I got:

```
time="2024-07-02T01:37:00-07:00" level=trace msg="/usr/bin/make -C ../peerpod-ctrl undeploy, output: make[1]: Entering directory '/root/go/src/github.com/confidential-containers/cloud-api-adaptor/src/peerpod-ctrl'\n/root/go/src/github.com/confidential-containers/cloud-api-adaptor/src/peerpod-ctrl/bin/kustomize build config/default | kubectl delete --ignore-not-found=false -f -\n# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.\ncustomresourcedefinition.apiextensions.k8s.io \"peerpods.confidentialcontainers.org\" deleted\nclusterrole.rbac.authorization.k8s.io \"peerpod-ctrl-manager-role\" deleted\nclusterrole.rbac.authorization.k8s.io \"peerpod-ctrl-metrics-reader\" deleted\nclusterrole.rbac.authorization.k8s.io \"peerpod-ctrl-proxy-role\" deleted\nclusterrolebinding.rbac.authorization.k8s.io \"peerpod-ctrl-manager-rolebinding\" deleted\nclusterrolebinding.rbac.authorization.k8s.io \"peerpod-ctrl-proxy-rolebinding\" deleted\nError from server (NotFound): error when deleting \"STDIN\": namespaces \"confidential-containers-system\" not found\nError from server (NotFound): error when deleting \"STDIN\": serviceaccounts \"peerpod-ctrl-controller-manager\" not found\nError from server (NotFound): error when deleting \"STDIN\": roles.rbac.authorization.k8s.io \"peerpod-ctrl-leader-election-role\" not found\nError from server (NotFound): error when deleting \"STDIN\": rolebindings.rbac.authorization.k8s.io \"peerpod-ctrl-leader-election-rolebinding\" not found\nError from server (NotFound): error when deleting \"STDIN\": services \"peerpod-ctrl-controller-manager-metrics-service\" not found\nError from server (NotFound): error when deleting \"STDIN\": deployments.apps \"peerpod-ctrl-controller-manager\" not found\nmake[1]: *** [Makefile:182: undeploy] Error 1\nmake[1]: Leaving directory '/root/go/src/github.com/confidential-containers/cloud-api-adaptor/src/peerpod-ctrl'\n"
```

The key bit being:

```
Error from server (NotFound): error when deleting "STDIN": namespaces "confidential-containers-system" not found
Error from server (NotFound): error when deleting "STDIN": serviceaccounts "peerpod-ctrl-controller-manager" not found
Error from server (NotFound): error when deleting "STDIN": roles.rbac.authorization.k8s.io "peerpod-ctrl-leader-election-role" not found
Error from server (NotFound): error when deleting "STDIN": rolebindings.rbac.authorization.k8s.io "peerpod-ctrl-leader-election-rolebinding" not found
Error from server (NotFound): error when deleting "STDIN": services "peerpod-ctrl-controller-manager-metrics-service" not found
Error from server (NotFound): error when deleting "STDIN": deployments.apps "peerpod-ctrl-controller-manager" not found
make[1]: *** [Makefile:182: undeploy] Error 1
```

So there is something wrong with the e2e tests and `make -C ../peerpod-ctrl/ undeploy` that we should work on, but that can be done separately.

**Member:** Raised #1898 for this.

---

## Run sample application
@@ -151,7 +225,7 @@ nginx-dbc79c87-jt49h 1/1 Running 1 (3m22s ago) 3m29s
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e60b768b847d quay.io/confidential-containers/podvm-docker-image "/usr/local/bin/entr…" 3 minutes ago Up 3 minutes 15150/tcp podvm-nginx-dbc79c87-jt49h-b9361aef
```

For debugging you can use docker commands like `docker ps`, `docker logs`, `docker exec`.
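
For example (the container name is taken from the `docker ps` output above):

```sh
# List only the pod VM containers
docker ps --filter "name=podvm-"

# View the logs of a specific pod VM container
docker logs podvm-nginx-dbc79c87-jt49h-b9361aef

# Get a shell inside the pod VM container
docker exec -it podvm-nginx-dbc79c87-jt49h-b9361aef bash
```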

@@ -161,3 +235,21 @@ For debugging you can use docker commands like `docker ps`, `docker logs`, `docker exec`.
```sh
kubectl delete deployment nginx
```

## Troubleshooting

When using `containerd` and `nydus-snapshotter` you might encounter pod creation failures due to
issues unpacking the image. Check the `nydus-snapshotter` troubleshooting [doc](../docs/troubleshooting/nydus-snapshotter.md).

To log in to the worker node, you can use either of the following approaches:

```sh
kubectl debug node/peer-pods-worker -it --image=busybox

# chroot /host
```

or

```sh
docker exec -it peer-pods-worker bash
```
8 changes: 4 additions & 4 deletions src/cloud-api-adaptor/docker/image/Dockerfile
@@ -4,15 +4,15 @@ ARG MINOR=11

FROM ${IMAGE}:v${VERSION}.${MINOR}

RUN apt-get update && apt-get install -y sudo libpam-systemd && apt-get clean all


RUN systemctl disable kubelet
RUN systemctl disable containerd

COPY ./resources/binaries-tree/etc/ /etc/
COPY ./resources/binaries-tree/usr/ /usr/
COPY ./resources/binaries-tree/pause_bundle/ /pause_bundle/

RUN curl -LO https://raw.githubusercontent.com/confidential-containers/cloud-api-adaptor/main/src/cloud-api-adaptor/podvm/qcow2/misc-settings.sh

2 changes: 2 additions & 0 deletions src/cloud-api-adaptor/docker/image/Makefile
@@ -1,3 +1,5 @@
include ../../Makefile.defaults

ARCH ?= $(subst x86_64,amd64,$(shell uname -m))
BUILDER = ubuntu-binaries-builder-$(ARCH)
PODVM_IMG ?= quay.io/confidential-containers/podvm-docker-image
27 changes: 27 additions & 0 deletions src/cloud-api-adaptor/docker/kind-config.yaml
@@ -0,0 +1,27 @@
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true # disable kindnet
  podSubnet: 192.168.0.0/16 # set to Calico's default subnet
nodes:
  - role: control-plane
    # Same image version as used for pod VM base image
    image: kindest/node:v1.29.4
    extraMounts:
      # The config.json file contains the registry secrets that you might
      # need to pull images from a private registry or docker registry to avoid
      # rate limiting.
      - hostPath: /tmp/config.json
        containerPath: /var/lib/kubelet/config.json
  - role: worker
    image: kindest/node:v1.29.4
    extraMounts:
      - hostPath: /var/run/docker.sock
        containerPath: /var/run/docker.sock
      - hostPath: /var/lib/docker
        containerPath: /var/lib/docker
      # The config.json file contains the registry secrets that you might
      # need to pull images from a private registry or docker registry to avoid
      # rate limiting.
      - hostPath: /tmp/config.json
        containerPath: /var/lib/kubelet/config.json
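
Since the worker node shares the host's docker socket via `extraMounts`, you can sanity-check the mount from the host; a sketch, assuming the default `peer-pods` cluster name:

```sh
# Should show the docker socket mounted inside the kind worker node
docker exec peer-pods-worker ls -l /var/run/docker.sock
```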
51 changes: 51 additions & 0 deletions src/cloud-api-adaptor/docker/kind_cluster.sh
@@ -0,0 +1,51 @@
#!/bin/bash

# Ref: https://stackoverflow.com/questions/299728/how-do-you-use-newgrp-in-a-script-then-stay-in-that-group-when-the-script-exits
newgrp docker <<EOF

# Accept two arguments: create and delete
# create: creates a kind cluster
# delete: deletes a kind cluster

CLUSTER_NAME="${CLUSTER_NAME:-peer-pods}"

if [ "$1" == "create" ]; then
    # Check if kind is installed
    if [ ! -x "$(command -v kind)" ]; then
        echo "kind is not installed"
        exit 0
    fi
    echo "Check if the cluster \$CLUSTER_NAME already exists"
    if kind get clusters | grep -q "\$CLUSTER_NAME"; then
        echo "Cluster \$CLUSTER_NAME already exists"
        exit 0
    fi
    # Set some sysctls
    # Ref: https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files
    sudo sysctl fs.inotify.max_user_watches=524288
    sudo sysctl fs.inotify.max_user_instances=512

    # Create a kind cluster
    echo "Creating a kind cluster"
    kind create cluster --name "\$CLUSTER_NAME" --config kind-config.yaml || exit 1

    # Deploy calico
    kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/calico.yaml || exit 1

    exit 0
fi

if [ "$1" == "delete" ]; then
    # Check if kind is installed
    if [ ! -x "$(command -v kind)" ]; then
        echo "kind is not installed"
        exit 0
    fi

    # Delete the kind cluster
    echo "Deleting the kind cluster"
    kind delete cluster --name "\$CLUSTER_NAME" || exit 1

    exit 0
fi
EOF
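
Usage looks like the following (run from `src/cloud-api-adaptor/docker`, where `kind-config.yaml` lives):

```sh
./kind_cluster.sh create

# Use a non-default cluster name (defaults to peer-pods)
CLUSTER_NAME=my-cluster ./kind_cluster.sh delete
```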