
Add e2e tests for docker #1845

Merged: merged 8 commits into confidential-containers:main from docker-e2e on Jul 4, 2024

Conversation

@bpradipt (Member) commented May 26, 2024

Also makes the docker network and podvm image configurable to help with e2e testing, along with some minor fixes.

@bpradipt bpradipt force-pushed the docker-e2e branch 5 times, most recently from 9bb16da to 16633e7 Compare May 27, 2024 09:00
@bpradipt bpradipt marked this pull request as ready for review June 20, 2024 11:23
@bpradipt (Member, Author) commented Jun 20, 2024

Properties file used

# Docker configs
CLUSTER_NAME="peer-pods"
DOCKER_HOST="unix:///var/run/docker.sock"
DOCKER_PODVM_IMAGE="quay.io/bpradipt/podvm-docker-image"
DOCKER_NETWORK_NAME="kind"
CAA_IMAGE="quay.io/bpradipt/cloud-api-adaptor"
CAA_IMAGE_TAG="latest"

# KBS configs
KBS_IMAGE=""
KBS_IMAGE_TAG=""

Test results

ubuntu@test-pp:~/cloud-api-adaptor/src/cloud-api-adaptor$ make TEST_PODVM_IMAGE=quay.io/bpradipt/podvm-docker-image TEST_PROVISION=yes CLOUD_PROVIDER=docker TEST_PROVISION_FILE=$(pwd)/docker/provision_docker.properties test-e2e
go test -v -tags=docker -timeout 60m -count=1 ./test/e2e
time="2024-06-20T10:53:18Z" level=info msg="Do setup"
time="2024-06-20T10:53:18Z" level=info msg="Cluster provisioning"
Docker is already installed
Check if kind is already installed
kind is already installed
Check if the cluster peer-pods already exists
Cluster peer-pods already exists
Adding worker label to nodes belonging to: peer-pods
time="2024-06-20T10:53:18Z" level=info msg="Install Cloud API Adaptor"
time="2024-06-20T10:53:18Z" level=info msg="Deploy the Cloud API Adaptor"
time="2024-06-20T10:53:18Z" level=info msg="Install the controller manager"
Wait for the cc-operator-controller-manager deployment be available
time="2024-06-20T10:53:27Z" level=info msg="Customize the overlay yaml file"
time="2024-06-20T10:53:27Z" level=info msg="Updating CAA image with \"quay.io/bpradipt/cloud-api-adaptor\""
time="2024-06-20T10:53:27Z" level=info msg="Updating CAA image tag with \"latest\""
time="2024-06-20T10:53:29Z" level=info msg="Install the cloud-api-adaptor"
Wait for the cc-operator-daemon-install DaemonSet be available
Wait for the pod cc-operator-daemon-install-pczvk be ready
Wait for the cloud-api-adaptor-daemonset DaemonSet be available
Wait for the pod cloud-api-adaptor-daemonset-f8t2j be ready
Wait for the kata-remote runtimeclass be created
time="2024-06-20T10:53:54Z" level=info msg="Installing peerpod-ctrl"
time="2024-06-20T10:53:57Z" level=info msg="Wait for the peerpod-ctrl deployment to be available"
time="2024-06-20T10:54:02Z" level=info msg="Creating namespace 'coco-pp-e2e-test-55e360fb'..."
time="2024-06-20T10:54:02Z" level=info msg="Wait for namespace 'coco-pp-e2e-test-55e360fb' be ready..."
time="2024-06-20T10:54:07Z" level=info msg="Wait for default serviceaccount in namespace 'coco-pp-e2e-test-55e360fb'..."
time="2024-06-20T10:54:07Z" level=info msg="default serviceAccount exists, namespace 'coco-pp-e2e-test-55e360fb' is ready for use"
=== RUN   TestDockerCreateSimplePod
=== RUN   TestDockerCreateSimplePod/SimplePeerPod_test
    assessment_runner.go:265: Waiting for containers in pod: simple-test are ready
=== RUN   TestDockerCreateSimplePod/SimplePeerPod_test/PodVM_is_created
    assessment_helpers.go:175: Pulled with nydus-snapshotter driver:2024/06/20 10:54:10 [adaptor/proxy]         mount_point:/run/kata-containers/d7d8472979abe9faa2a9c844ab9131af3377491083a81cc17a03050fbafbde7c/rootfs source:quay.io/prometheus/busybox:latest fstype:overlay driver:image_guest_pull
time="2024-06-20T10:54:17Z" level=info msg="Deleting pod simple-test..."
time="2024-06-20T10:54:22Z" level=info msg="Pod simple-test has been successfully deleted within 60s"
--- PASS: TestDockerCreateSimplePod (15.12s)
    --- PASS: TestDockerCreateSimplePod/SimplePeerPod_test (15.12s)
        --- PASS: TestDockerCreateSimplePod/SimplePeerPod_test/PodVM_is_created (0.06s)
=== RUN   TestDockerCreatePodWithConfigMap
=== RUN   TestDockerCreatePodWithConfigMap/ConfigMapPeerPod_test
    assessment_runner.go:265: Waiting for containers in pod: busybox-configmap-pod are ready
=== RUN   TestDockerCreatePodWithConfigMap/ConfigMapPeerPod_test/Configmap_is_created_and_contains_data
    assessment_runner.go:415: Output when execute test commands:
time="2024-06-20T10:54:37Z" level=info msg="Deleting Configmap... busybox-configmap"
time="2024-06-20T10:54:37Z" level=info msg="Deleting pod busybox-configmap-pod..."
time="2024-06-20T10:54:42Z" level=info msg="Pod busybox-configmap-pod has been successfully deleted within 60s"
--- PASS: TestDockerCreatePodWithConfigMap (20.17s)
    --- PASS: TestDockerCreatePodWithConfigMap/ConfigMapPeerPod_test (20.17s)
        --- PASS: TestDockerCreatePodWithConfigMap/ConfigMapPeerPod_test/Configmap_is_created_and_contains_data (5.11s)
=== RUN   TestDockerCreatePodWithSecret
=== RUN   TestDockerCreatePodWithSecret/SecretPeerPod_test
    assessment_runner.go:265: Waiting for containers in pod: busybox-secret-pod are ready
=== RUN   TestDockerCreatePodWithSecret/SecretPeerPod_test/Secret_has_been_created_and_contains_data
    assessment_runner.go:415: Output when execute test commands:
time="2024-06-20T10:55:02Z" level=info msg="Deleting Secret... busybox-secret"
time="2024-06-20T10:55:02Z" level=info msg="Deleting pod busybox-secret-pod..."
time="2024-06-20T10:55:07Z" level=info msg="Pod busybox-secret-pod has been successfully deleted within 60s"
--- PASS: TestDockerCreatePodWithSecret (25.66s)
    --- PASS: TestDockerCreatePodWithSecret/SecretPeerPod_test (25.66s)
        --- PASS: TestDockerCreatePodWithSecret/SecretPeerPod_test/Secret_has_been_created_and_contains_data (5.13s)
=== RUN   TestDockerCreatePeerPodContainerWithExternalIPAccess
=== RUN   TestDockerCreatePeerPodContainerWithExternalIPAccess/IPAccessPeerPod_test
    assessment_runner.go:265: Waiting for containers in pod: busybox are ready
=== RUN   TestDockerCreatePeerPodContainerWithExternalIPAccess/IPAccessPeerPod_test/Peer_Pod_Container_Connected_to_External_IP
    assessment_runner.go:415: Output when execute test commands:
time="2024-06-20T10:55:28Z" level=info msg="Deleting pod busybox..."
time="2024-06-20T10:55:33Z" level=info msg="Pod busybox has been successfully deleted within 60s"
--- PASS: TestDockerCreatePeerPodContainerWithExternalIPAccess (25.22s)
    --- PASS: TestDockerCreatePeerPodContainerWithExternalIPAccess/IPAccessPeerPod_test (25.22s)
        --- PASS: TestDockerCreatePeerPodContainerWithExternalIPAccess/IPAccessPeerPod_test/Peer_Pod_Container_Connected_to_External_IP (5.17s)
=== RUN   TestDockerCreatePeerPodWithJob
=== RUN   TestDockerCreatePeerPodWithJob/JobPeerPod_test
=== RUN   TestDockerCreatePeerPodWithJob/JobPeerPod_test/Job_has_been_created
    assessment_helpers.go:291: SUCCESS: job-pi-c4kqx - Completed - LOG: 3.14156
time="2024-06-20T10:55:43Z" level=info msg="Output Log from Pod: 3.14156"
time="2024-06-20T10:55:43Z" level=info msg="Deleting Job... job-pi"
time="2024-06-20T10:55:43Z" level=info msg="Deleting pods created by job... job-pi-c4kqx"
--- PASS: TestDockerCreatePeerPodWithJob (10.07s)
    --- PASS: TestDockerCreatePeerPodWithJob/JobPeerPod_test (10.07s)
        --- PASS: TestDockerCreatePeerPodWithJob/JobPeerPod_test/Job_has_been_created (0.02s)
=== RUN   TestDockerCreatePeerPodAndCheckUserLogs
    common_suite.go:161: Skipping Test until issue kata-containers/kata-containers#5732 is Fixed
--- SKIP: TestDockerCreatePeerPodAndCheckUserLogs (0.00s)
=== RUN   TestDockerCreatePeerPodAndCheckWorkDirLogs
=== RUN   TestDockerCreatePeerPodAndCheckWorkDirLogs/WorkDirPeerPod_test
=== RUN   TestDockerCreatePeerPodAndCheckWorkDirLogs/WorkDirPeerPod_test/Peer_pod_with_work_directory_has_been_created
    assessment_runner.go:362: Log output of peer pod:/other
time="2024-06-20T10:58:18Z" level=info msg="Deleting pod workdirpod..."
time="2024-06-20T10:58:23Z" level=info msg="Pod workdirpod has been successfully deleted within 60s"
--- PASS: TestDockerCreatePeerPodAndCheckWorkDirLogs (160.06s)
    --- PASS: TestDockerCreatePeerPodAndCheckWorkDirLogs/WorkDirPeerPod_test (160.06s)
        --- PASS: TestDockerCreatePeerPodAndCheckWorkDirLogs/WorkDirPeerPod_test/Peer_pod_with_work_directory_has_been_created (5.03s)
=== RUN   TestDockerCreatePeerPodAndCheckEnvVariableLogsWithImageOnly
=== RUN   TestDockerCreatePeerPodAndCheckEnvVariableLogsWithImageOnly/EnvVariablePeerPodWithImageOnly_test
=== RUN   TestDockerCreatePeerPodAndCheckEnvVariableLogsWithImageOnly/EnvVariablePeerPodWithImageOnly_test/Peer_pod_with_environmental_variables_has_been_created
    assessment_runner.go:362: Log output of peer pod:KUBERNETES_SERVICE_PORT=443
        KUBERNETES_PORT=tcp://10.96.0.1:443
        HOSTNAME=env-variable-in-image
        SHLVL=1
        HOME=/root
        KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
        PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
        KUBERNETES_PORT_443_TCP_PORT=443
        KUBERNETES_PORT_443_TCP_PROTO=tcp
        KUBERNETES_SERVICE_PORT_HTTPS=443
        KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
        ISPRODUCTION=false
        KUBERNETES_SERVICE_HOST=10.96.0.1
        PWD=/
time="2024-06-20T10:58:38Z" level=info msg="Deleting pod env-variable-in-image..."
time="2024-06-20T10:58:43Z" level=info msg="Pod env-variable-in-image has been successfully deleted within 60s"
--- PASS: TestDockerCreatePeerPodAndCheckEnvVariableLogsWithImageOnly (20.06s)
    --- PASS: TestDockerCreatePeerPodAndCheckEnvVariableLogsWithImageOnly/EnvVariablePeerPodWithImageOnly_test (20.06s)
        --- PASS: TestDockerCreatePeerPodAndCheckEnvVariableLogsWithImageOnly/EnvVariablePeerPodWithImageOnly_test/Peer_pod_with_environmental_variables_has_been_created (5.02s)
=== RUN   TestDockerCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly
=== RUN   TestDockerCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly/EnvVariablePeerPodWithDeploymentOnly_test
=== RUN   TestDockerCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly/EnvVariablePeerPodWithDeploymentOnly_test/Peer_pod_with_environmental_variables_has_been_created
    assessment_runner.go:362: Log output of peer pod:KUBERNETES_SERVICE_PORT=443
        KUBERNETES_PORT=tcp://10.96.0.1:443
        HOSTNAME=env-variable-in-config
        SHLVL=1
        HOME=/root
        KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
        PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
        KUBERNETES_PORT_443_TCP_PORT=443
        KUBERNETES_PORT_443_TCP_PROTO=tcp
        KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
        KUBERNETES_SERVICE_PORT_HTTPS=443
        ISPRODUCTION=true
        KUBERNETES_SERVICE_HOST=10.96.0.1
        PWD=/
time="2024-06-20T10:58:58Z" level=info msg="Deleting pod env-variable-in-config..."
time="2024-06-20T10:59:03Z" level=info msg="Pod env-variable-in-config has been successfully deleted within 60s"
--- PASS: TestDockerCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly (20.20s)
    --- PASS: TestDockerCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly/EnvVariablePeerPodWithDeploymentOnly_test (20.20s)
        --- PASS: TestDockerCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly/EnvVariablePeerPodWithDeploymentOnly_test/Peer_pod_with_environmental_variables_has_been_created (5.02s)
=== RUN   TestDockerCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment
=== RUN   TestDockerCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment/EnvVariablePeerPodWithBoth_test
=== RUN   TestDockerCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment/EnvVariablePeerPodWithBoth_test/Peer_pod_with_environmental_variables_has_been_created
    assessment_runner.go:362: Log output of peer pod:KUBERNETES_SERVICE_PORT=443
        KUBERNETES_PORT=tcp://10.96.0.1:443
        HOSTNAME=env-variable-in-both
        SHLVL=1
        HOME=/root
        KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
        PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
        KUBERNETES_PORT_443_TCP_PORT=443
        KUBERNETES_PORT_443_TCP_PROTO=tcp
        KUBERNETES_SERVICE_PORT_HTTPS=443
        KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
        ISPRODUCTION=true
        KUBERNETES_SERVICE_HOST=10.96.0.1
        PWD=/
time="2024-06-20T10:59:18Z" level=info msg="Deleting pod env-variable-in-both..."
time="2024-06-20T10:59:23Z" level=info msg="Pod env-variable-in-both has been successfully deleted within 60s"
--- PASS: TestDockerCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment (20.08s)
    --- PASS: TestDockerCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment/EnvVariablePeerPodWithBoth_test (20.08s)
        --- PASS: TestDockerCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment/EnvVariablePeerPodWithBoth_test/Peer_pod_with_environmental_variables_has_been_created (5.02s)
=== RUN   TestDockerCreateNginxDeployment
=== RUN   TestDockerCreateNginxDeployment/Nginx_image_deployment_test
time="2024-06-20T10:59:23Z" level=info msg="Creating nginx deployment..."
time="2024-06-20T10:59:28Z" level=info msg="Current deployment available replicas: 0"
time="2024-06-20T11:05:18Z" level=info msg="nginx deployment is available now"
=== RUN   TestDockerCreateNginxDeployment/Nginx_image_deployment_test/Access_for_nginx_deployment_test
time="2024-06-20T11:05:18Z" level=info msg="Deleting webserver deployment..."
time="2024-06-20T11:05:18Z" level=info msg="Deleting deployment nginx-deployment..."
time="2024-06-20T11:05:23Z" level=info msg="Deployment nginx-deployment has been successfully deleted within 120s"
--- PASS: TestDockerCreateNginxDeployment (360.05s)
    --- PASS: TestDockerCreateNginxDeployment/Nginx_image_deployment_test (360.05s)
        --- PASS: TestDockerCreateNginxDeployment/Nginx_image_deployment_test/Access_for_nginx_deployment_test (0.01s)
=== RUN   TestDockerDeletePod
=== RUN   TestDockerDeletePod/DeletePod_test
    assessment_runner.go:265: Waiting for containers in pod: deletion-test are ready
=== RUN   TestDockerDeletePod/DeletePod_test/Deletion_complete
time="2024-06-20T11:05:33Z" level=info msg="Deleting pod deletion-test..."
time="2024-06-20T11:05:38Z" level=info msg="Pod deletion-test has been successfully deleted within 60s"
--- PASS: TestDockerDeletePod (15.05s)
    --- PASS: TestDockerDeletePod/DeletePod_test (15.05s)
        --- PASS: TestDockerDeletePod/DeletePod_test/Deletion_complete (0.01s)
=== RUN   TestDockerPodToServiceCommunication
=== RUN   TestDockerPodToServiceCommunication/TestExtraPods_test
    assessment_runner.go:265: Waiting for containers in pod: nginx are ready
time="2024-06-20T11:06:03Z" level=info msg="webserver service is available on cluster IP: 10.96.44.88"
Provision extra pod busybox    assessment_helpers.go:425: Waiting for containers in pod: busybox are ready
=== RUN   TestDockerPodToServiceCommunication/TestExtraPods_test/Failed_to_test_extra_pod.
time="2024-06-20T11:06:19Z" level=info msg="Success to access nginx service. <!DOCTYPE html>\n<html>\n<head>\n<title>Welcome to nginx!</title>\n<style>\nhtml { color-scheme: light dark; }\nbody { width: 35em; margin: 0 auto;\nfont-family: Tahoma, Verdana, Arial, sans-serif; }\n</style>\n</head>\n<body>\n<h1>Welcome to nginx!</h1>\n<p>If you see this page, the nginx web server is successfully installed and\nworking. Further configuration is required.</p>\n\n<p>For online documentation and support please refer to\n<a href=\"http://nginx.org/\">nginx.org</a>.<br/>\nCommercial support is available at\n<a href=\"http://nginx.com/\">nginx.com</a>.</p>\n\n<p><em>Thank you for using nginx.</em></p>\n</body>\n</html>\n"
    assessment_runner.go:516: Output when execute test commands:<!DOCTYPE html>
        <html>
        <head>
        <title>Welcome to nginx!</title>
        <style>
        html { color-scheme: light dark; }
        body { width: 35em; margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif; }
        </style>
        </head>
        <body>
        <h1>Welcome to nginx!</h1>
        <p>If you see this page, the nginx web server is successfully installed and
        working. Further configuration is required.</p>

        <p>For online documentation and support please refer to
        <a href="http://nginx.org/">nginx.org</a>.<br/>
        Commercial support is available at
        <a href="http://nginx.com/">nginx.com</a>.</p>

        <p><em>Thank you for using nginx.</em></p>
        </body>
        </html>
time="2024-06-20T11:06:19Z" level=info msg="Deleting pod nginx..."
time="2024-06-20T11:06:24Z" level=info msg="Pod nginx has been successfully deleted within 60s"
time="2024-06-20T11:06:24Z" level=info msg="Deleting pod busybox..."
time="2024-06-20T11:06:29Z" level=info msg="Pod busybox has been successfully deleted within 60s"
time="2024-06-20T11:06:29Z" level=info msg="Deleting Service... nginx"
--- PASS: TestDockerPodToServiceCommunication (50.32s)
    --- PASS: TestDockerPodToServiceCommunication/TestExtraPods_test (50.32s)
        --- PASS: TestDockerPodToServiceCommunication/TestExtraPods_test/Failed_to_test_extra_pod. (5.13s)
=== RUN   TestDockerPodsMTLSCommunication
=== RUN   TestDockerPodsMTLSCommunication/TestPodsMTLSCommunication_test
    assessment_runner.go:265: Waiting for containers in pod: nginx are ready
time="2024-06-20T11:06:49Z" level=info msg="webserver service is available on cluster IP: 10.96.53.33"
Provision extra pod curl
 assessment_helpers.go:425: Waiting for containers in pod: curl are ready
=== RUN   TestDockerPodsMTLSCommunication/TestPodsMTLSCommunication_test/Pods_communication_with_mTLS
time="2024-06-20T11:08:29Z" level=info msg="Success to access nginx service. <!DOCTYPE html>\n<html>\n<head>\n<title>Welcome to nginx!</title>\n<style>\nhtml { color-scheme: light dark; }\nbody { width: 35em; margin: 0 auto;\nfont-family: Tahoma, Verdana, Arial, sans-serif; }\n</style>\n</head>\n<body>\n<h1>Welcome to nginx!</h1>\n<p>If you see this page, the nginx web server is successfully installed and\nworking. Further configuration is required.</p>\n\n<p>For online documentation and support please refer to\n<a href=\"http://nginx.org/\">nginx.org</a>.<br/>\nCommercial support is available at\n<a href=\"http://nginx.com/\">nginx.com</a>.</p>\n\n<p><em>Thank you for using nginx.</em></p>\n</body>\n</html>\n"
    assessment_runner.go:516: Output when execute test commands:<!DOCTYPE html>
        <html>
        <head>
        <title>Welcome to nginx!</title>
        <style>
        html { color-scheme: light dark; }
        body { width: 35em; margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif; }
        </style>
        </head>
        <body>
        <h1>Welcome to nginx!</h1>
        <p>If you see this page, the nginx web server is successfully installed and
        working. Further configuration is required.</p>

        <p>For online documentation and support please refer to
        <a href="http://nginx.org/">nginx.org</a>.<br/>
        Commercial support is available at
        <a href="http://nginx.com/">nginx.com</a>.</p>

        <p><em>Thank you for using nginx.</em></p>
        </body>
        </html>
time="2024-06-20T11:08:29Z" level=info msg="Deleting Configmap... nginx-conf"
time="2024-06-20T11:08:29Z" level=info msg="Deleting Secret... server-certs"
time="2024-06-20T11:08:29Z" level=info msg="Deleting extra Secret... curl-certs"
time="2024-06-20T11:08:29Z" level=info msg="Deleting pod nginx..."
time="2024-06-20T11:08:34Z" level=info msg="Pod nginx has been successfully deleted within 60s"
time="2024-06-20T11:08:34Z" level=info msg="Deleting pod curl..."
time="2024-06-20T11:08:39Z" level=info msg="Pod curl has been successfully deleted within 60s"
time="2024-06-20T11:08:39Z" level=info msg="Deleting Service... nginx"
--- PASS: TestDockerPodsMTLSCommunication (130.38s)
    --- PASS: TestDockerPodsMTLSCommunication/TestPodsMTLSCommunication_test (130.38s)
        --- PASS: TestDockerPodsMTLSCommunication/TestPodsMTLSCommunication_test/Pods_communication_with_mTLS (5.17s)
=== RUN   TestDockerKbsKeyRelease
    docker_test.go:102: Skipping kbs related test as kbs is not deployed
--- SKIP: TestDockerKbsKeyRelease (0.00s)
PASS
time="2024-06-20T11:08:39Z" level=info msg="Deleting namespace 'coco-pp-e2e-test-55e360fb'..."

time="2024-06-20T11:08:49Z" level=info msg="Namespace 'coco-pp-e2e-test-55e360fb' has been successfully deleted within 60s"
Deleting the kind cluster
Deleting cluster "kind" ...
Uninstalling kind
Uninstalling Docker
Reading package lists...
Building dependency tree...
Reading state information...
The following packages were automatically installed and are no longer required:
  conntrack cri-tools ebtables kubernetes-cni libltdl7 libslirp0 pigz
  slirp4netns socat
Use 'sudo apt autoremove' to remove them.
The following packages will be REMOVED:
  containerd.io* docker-buildx-plugin* docker-ce* docker-ce-cli*
  docker-ce-rootless-extras* docker-compose-plugin*
0 upgraded, 0 newly installed, 6 to remove and 17 not upgraded.
After this operation, 434 MB disk space will be freed.
(Reading database ... 98637 files and directories currently installed.)
Removing docker-ce (5:26.1.3-1~ubuntu.22.04~jammy) ...
Removing containerd.io (1.6.32-1) ...
Removing docker-buildx-plugin (0.14.0-1~ubuntu.22.04~jammy) ...
Removing docker-ce-cli (5:26.1.3-1~ubuntu.22.04~jammy) ...
Removing docker-ce-rootless-extras (5:26.1.3-1~ubuntu.22.04~jammy) ...
Removing docker-compose-plugin (2.27.0-1~ubuntu.22.04~jammy) ...
Processing triggers for man-db (2.10.2-1) ...
(Reading database ... 98402 files and directories currently installed.)
Purging configuration files for docker-ce (5:26.1.3-1~ubuntu.22.04~jammy) ...
Purging configuration files for containerd.io (1.6.32-1) ...
time="2024-06-20T11:09:11Z" level=info msg="Delete the Cloud API Adaptor installation"
time="2024-06-20T11:09:11Z" level=info msg="Uninstall the cloud-api-adaptor"
ok  	github.com/confidential-containers/cloud-api-adaptor/src/cloud-api-adaptor/test/e2e	953.256s

@bpradipt (Member, Author) commented:

I ran the e2e tests on an Ubuntu 22.04 VM with 8GB RAM and 4 vCPUs.

Review comment (Member):

The other provision files live in the test directory, e.g. src/cloud-api-adaptor/test/provisioner/ibmcloud/provision_ibmcloud.properties, so it might make more sense to put this in src/cloud-api-adaptor/test/provisioner/docker?

Reply (Member, Author):

Sure, I'll put it under test/provisioner/docker. Sorry, I missed the convention.

@stevenhorsman (Member) commented:

@bpradipt - I'm trying to run the e2e test and my nodes are not ready, so the install just hangs. Describing it I see:

container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

Should the kind_cluster script have sorted that, or do you know if there is some manual pre-req I've missed?

@bpradipt (Member, Author) commented Jun 20, 2024

@bpradipt - I'm trying to run the e2e test and my nodes are not ready, so the install just hangs. Describing it I see:

container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

Should the kind_cluster script have sorted that, or do you know if there is some manual pre-req I've missed?

The kind installation script should have taken care of it. Are you trying on an existing system or a new system? Any other details on the environment would help understand what's happening.
Also, can you check the output of kubectl get pods -A and verify whether the calico pods are up?
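For reference, a quick way to check this (a sketch; k8s-app=calico-node is the label the upstream calico manifest applies to its daemonset pods):

# Verify the calico daemonset pods and overall node readiness
kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
kubectl get nodes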

@stevenhorsman (Member) commented Jun 20, 2024

The kind installation script should have taken care of it. Are you trying on an existing system or new system? Any other details on the environment to help understand what's happening ?

It is a brand new VM and I picked Ubuntu 22.04 with 4 vCPUs and 8GB RAM to match your tested set-up.

It doesn't look like calico/flannel have been installed:

# kubectl get pods -A
NAMESPACE                        NAME                                              READY   STATUS    RESTARTS   AGE
confidential-containers-system   cc-operator-controller-manager-546574cf87-5m427   0/2     Pending   0          47m
kube-system                      coredns-5d78c9869d-c6n6f                          0/1     Pending   0          49m
kube-system                      coredns-5d78c9869d-pn4ts                          0/1     Pending   0          49m
kube-system                      etcd-peer-pods-control-plane                      1/1     Running   0          49m
kube-system                      kube-apiserver-peer-pods-control-plane            1/1     Running   0          49m
kube-system                      kube-controller-manager-peer-pods-control-plane   1/1     Running   0          49m
kube-system                      kube-proxy-ltvgv                                  1/1     Running   0          49m
kube-system                      kube-proxy-xffwm                                  1/1     Running   0          48m
kube-system                      kube-scheduler-peer-pods-control-plane            1/1     Running   0          49m
local-path-storage               local-path-provisioner-5b77c697fd-rpfr9           0/1     Pending   0          49m

The pending pods are due to the nodes not being ready:

# kubectl get nodes
NAME                      STATUS     ROLES           AGE   VERSION
peer-pods-control-plane   NotReady   control-plane   50m   v1.27.11
peer-pods-worker          NotReady   worker          50m   v1.27.11
# kubectl describe node peer-pods-worker
Name:               peer-pods-worker
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=peer-pods-worker
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=worker
                    node.kubernetes.io/worker=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 20 Jun 2024 05:39:31 -0700
Taints:             node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  peer-pods-worker
  AcquireTime:     <unset>
  RenewTime:       Thu, 20 Jun 2024 06:30:04 -0700
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Thu, 20 Jun 2024 06:25:39 -0700   Thu, 20 Jun 2024 05:39:31 -0700   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Thu, 20 Jun 2024 06:25:39 -0700   Thu, 20 Jun 2024 05:39:31 -0700   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Thu, 20 Jun 2024 06:25:39 -0700   Thu, 20 Jun 2024 05:39:31 -0700   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Thu, 20 Jun 2024 06:25:39 -0700   Thu, 20 Jun 2024 05:39:31 -0700   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Addresses:
  InternalIP:  172.18.0.3
  Hostname:    peer-pods-worker
Capacity:
  cpu:                4
  ephemeral-storage:  259915780Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             8127940Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  259915780Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             8127940Ki
  pods:               110
System Info:
  Machine ID:                 4e3be70594c240ddac4b967a2cc4500f
  System UUID:                8984a844-205f-48e7-926b-48247e972ffc
  Boot ID:                    f08205e5-ca48-4bc2-94da-8b31b73bf4f4
  Kernel Version:             5.15.0-107-generic
  OS Image:                   Debian GNU/Linux 12 (bookworm)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.7.13
  Kubelet Version:            v1.27.11
  Kube-Proxy Version:         v1.27.11
PodCIDR:                      192.168.1.0/24
PodCIDRs:                     192.168.1.0/24
ProviderID:                   kind://docker/peer-pods/peer-pods-worker
Non-terminated Pods:          (1 in total)
  Namespace                   Name                CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                ------------  ----------  ---------------  -------------  ---
  kube-system                 kube-proxy-xffwm    0 (0%)        0 (0%)      0 (0%)           0 (0%)         50m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests  Limits
  --------           --------  ------
  cpu                0 (0%)    0 (0%)
  memory             0 (0%)    0 (0%)
  ephemeral-storage  0 (0%)    0 (0%)
  hugepages-1Gi      0 (0%)    0 (0%)
  hugepages-2Mi      0 (0%)    0 (0%)
Events:
  Type    Reason                   Age                From             Message
  ----    ------                   ----               ----             -------
  Normal  Starting                 50m                kube-proxy
  Normal  Starting                 50m                kubelet          Starting kubelet.
  Normal  NodeHasSufficientMemory  50m (x2 over 50m)  kubelet          Node peer-pods-worker status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    50m (x2 over 50m)  kubelet          Node peer-pods-worker status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     50m (x2 over 50m)  kubelet          Node peer-pods-worker status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  50m                kubelet          Updated Node Allocatable limit across pods
  Normal  RegisteredNode           50m                node-controller  Node peer-pods-worker event: Registered Node peer-pods-worker in Controller

Is there any other info that might be helpful, or things I can try? Sorry, I appreciate this is more of a kind issue than anything else, but I don't have much experience using it and want to try and test the e2e set-up.

@bpradipt (Member, Author) commented:

The kind installation script should have taken care of it. Are you trying on an existing system or new system? Any other details on the environment to help understand what's happening ?

It is a brand new VM and I pick Ubuntu 22.04 with 4 vCPUs and 8GB RAM to match your tested set-up.

It doesn't look like calico/flannel have been installed:

# kubectl get pods -A
NAMESPACE                        NAME                                              READY   STATUS    RESTARTS   AGE
confidential-containers-system   cc-operator-controller-manager-546574cf87-5m427   0/2     Pending   0          47m
kube-system                      coredns-5d78c9869d-c6n6f                          0/1     Pending   0          49m
kube-system                      coredns-5d78c9869d-pn4ts                          0/1     Pending   0          49m
kube-system                      etcd-peer-pods-control-plane                      1/1     Running   0          49m
kube-system                      kube-apiserver-peer-pods-control-plane            1/1     Running   0          49m
kube-system                      kube-controller-manager-peer-pods-control-plane   1/1     Running   0          49m
kube-system                      kube-proxy-ltvgv                                  1/1     Running   0          49m
kube-system                      kube-proxy-xffwm                                  1/1     Running   0          48m
kube-system                      kube-scheduler-peer-pods-control-plane            1/1     Running   0          49m
local-path-storage               local-path-provisioner-5b77c697fd-rpfr9           0/1     Pending   0          49m

For some reason calico is not installed. The following line from the kind_cluster.sh script should have deployed it:

...
 # Deploy calico
    kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/calico.yaml || exit 1
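If that apply step was skipped or failed, re-running it manually along these lines should bring the nodes to Ready (a sketch using the manifest URL from the script above):

# Re-apply calico and wait for the CNI and the nodes to become ready
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/calico.yaml
kubectl -n kube-system rollout status daemonset/calico-node --timeout=300s
kubectl wait --for=condition=Ready node --all --timeout=300s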



Is there any other info that might be helpful, or things I can try? Sorry I appreciate this is more of a kind issue than anything else, but I don't have much experience using it and want to try and test the e2e set-up.

Nothing that I can think of right now. Let me spend some time figuring out what could be causing this issue.

@bpradipt (Member, Author) commented:

@stevenhorsman I added a prereqs.sh helper script in case it's needed to install the required prerequisites for the tests.
Please try following the README https://github.com/confidential-containers/cloud-api-adaptor/blob/d509fccb8fd0ad2549c06b41105ef02776ed3ab6/src/cloud-api-adaptor/docker/README.md#running-the-caa-e2e-tests and see if it helps.

Example properties file to try

# Docker configs
CLUSTER_NAME="peer-pods"
DOCKER_HOST="unix:///var/run/docker.sock"
DOCKER_PODVM_IMAGE="quay.io/bpradipt/podvm-docker-image"
DOCKER_NETWORK_NAME="kind"
CAA_IMAGE="quay.io/bpradipt/cloud-api-adaptor"
CAA_IMAGE_TAG="latest"

# KBS configs
KBS_IMAGE=""
KBS_IMAGE_TAG=""

@stevenhorsman (Member) commented:

@stevenhorsman I added a prereqs.sh helper script in case needed to install the required prerequisites for the tests. Please try following the README https://github.com/confidential-containers/cloud-api-adaptor/blob/d509fccb8fd0ad2549c06b41105ef02776ed3ab6/src/cloud-api-adaptor/docker/README.md#running-the-caa-e2e-tests and see if it helps.

Sure will do. Lots of meetings atm, but will try and get to it by EoD tomorrow

@stevenhorsman (Member) commented:

Sure will do. Lots of meetings atm, but will try and get to it by EoD tomorrow

The pre-reqs script did the trick. I'm not sure why, but the installation worked after using that. I hit the image pull error though :(

@bpradipt bpradipt requested a review from stevenhorsman June 29, 2024 14:31
@bpradipt (Member, Author) commented:

@stevenhorsman can we move ahead with this PR? The test flakiness is not really related to the provider.

@stevenhorsman (Member) commented:

@stevenhorsman can we move ahead with this PR? The test flakiness are not really related to the provider.

Good

@stevenhorsman can we move ahead with this PR? The test flakiness are not really related to the provider.

Yeah, I think that is fair, but as we believe that any users will hit the failure, maybe we need to add some "temporary" 🤞 doc about the problem we see and the ctr fetch required to solve it?

@bpradipt (Member, Author) commented Jul 1, 2024

@stevenhorsman can we move ahead with this PR? The test flakiness are not really related to the provider.

Good

@stevenhorsman can we move ahead with this PR? The test flakiness are not really related to the provider.

Yeah, I think that is fair, but we as we believe that any users will hit the failure maybe we need to add some "temporary" 🤞 doc about the problem we see and the ctr fetch required to solve it?

Yeah. It should be a generic doc IMHO as it can affect any provider, maybe under troubleshooting - https://github.com/confidential-containers/cloud-api-adaptor/tree/main/src/cloud-api-adaptor/docs/troubleshooting?

@stevenhorsman (Member) commented:

Yeah. It should be generic doc imho as it can affect any provider, may be under troubleshooting - https://github.com/confidential-containers/cloud-api-adaptor/tree/main/src/cloud-api-adaptor/docs/troubleshooting ?

That's a good idea; we seem to be pretty much guaranteed to hit that with the docker provider, so maybe linking to that section from the docker provider docs makes sense too?

@bpradipt (Member, Author) commented Jul 1, 2024

Yeah. It should be generic doc imho as it can affect any provider, may be under troubleshooting - https://github.com/confidential-containers/cloud-api-adaptor/tree/main/src/cloud-api-adaptor/docs/troubleshooting ?

That's a good idea, we seem to be pretty guaranteed to hit that with the docker provider, so maybe linking to that section for the docker provider docs makes sense too?

@stevenhorsman done. PTAL

@stevenhorsman (Member) left a review comment:

A few comments so far. Do we plan to follow up with a workflow later to automatically test this, or is it just for manual e2e testing atm?

src/cloud-api-adaptor/docker/README.md (review thread resolved)
src/cloud-api-adaptor/docker/README.md (review thread resolved, outdated)

If you want to use a different location for the registry secret, then remember to update the same
in the `docker/kind-config.yaml` file.
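As an aside, a minimal way to produce that registry secret file is simply a docker login (my assumption; the README's exact steps may differ):

# Writes the auth entry to ~/.docker/config.json by default and prompts for the password
docker login docker.io -u <your-user>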

Review comment (Member):

Is it worth mentioning the option of using kind delete cluster --name peer-pods to delete the cluster that was auto-created in the e2e test?

Reply (Member, Author):

The cluster will be deleted automatically on test completion unless TEST_TEARDOWN=no is set. Anyway, let me add a note as well.
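For illustration, keeping the cluster around and cleaning it up by hand would look roughly like this (a sketch combining the TEST_TEARDOWN knob mentioned above with the kind command from the review comment; the podvm image is the one used elsewhere in this PR):

# Keep the kind cluster after the e2e run for debugging
make TEST_PODVM_IMAGE=quay.io/bpradipt/podvm-docker-image \
     TEST_PROVISION=yes CLOUD_PROVIDER=docker \
     TEST_PROVISION_FILE=$(pwd)/test/provisioner/docker/provision_docker.properties \
     TEST_TEARDOWN=no test-e2e

# ...then delete the auto-created cluster manually when done
kind delete cluster --name peer-pods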

Review comment (Member):

So that might not have worked for me then, as I copied your command from the PR, but my cluster is over two hours old now from all the stopping and re-testing. I thought that I'd let the e2e process finish naturally at least once, but maybe not. It might be better to note that the e2e test uses kind to create the cluster and just link to their getting started docs instead?

Review comment (Member):

I think this was user error. I'm re-trying now and found that the Uninstall CCRuntime CRD step has taken 5 minutes so far, so I probably killed it previously thinking it was stuck/finished.

Review comment (Member):

Hmm, maybe not - it failed badly, so it didn't do the kind delete:

time="2024-07-01T10:10:08-07:00" level=info msg="Delete the peerpod-ctrl deployment"
FAIL	github.com/confidential-containers/cloud-api-adaptor/src/cloud-api-adaptor/test/e2e	391.579s
FAIL
make: *** [Makefile:96: test-e2e] Error 1

I will have some more attempts to see if I can track down the issue

Review comment (Member):

Same here @stevenhorsman, the kind cluster was left behind. Maybe this is a bug in the test framework itself: if something goes wrong during teardown then it won't run to the end (and the last step is to delete the cluster)?

time="2024-07-01T19:43:00Z" level=info msg="Deleting namespace 'coco-pp-e2e-test-5a79bbe1'..."
time="2024-07-01T19:43:15Z" level=info msg="Namespace 'coco-pp-e2e-test-5a79bbe1' has been successfully deleted within 60s"
Deleting the kind cluster
Deleting cluster "kind" ...
time="2024-07-01T19:43:15Z" level=info msg="Delete the Cloud API Adaptor installation"
time="2024-07-01T19:43:15Z" level=info msg="Uninstall the cloud-api-adaptor"
time="2024-07-01T19:43:15Z" level=info msg="Uninstall CCRuntime CRD"
time="2024-07-01T19:47:27Z" level=info msg="Uninstall the controller manager"
time="2024-07-01T19:47:36Z" level=info msg="Wait for the cc-operator-controller-manager deployment be deleted\n"
time="2024-07-01T19:47:41Z" level=info msg="Delete the peerpod-ctrl deployment"
FAIL    github.com/confidential-containers/cloud-api-adaptor/src/cloud-api-adaptor/test/e2e     1316.300s
FAIL
make: *** [Makefile:98: test-e2e] Error 1

@stevenhorsman (Member) commented Jul 2, 2024:

Yeah,

FYI with trace on I got:

time="2024-07-02T01:37:00-07:00" level=trace msg="/usr/bin/make -C ../peerpod-ctrl undeploy, output: make[1]: Entering directory '/root/go/src/github.com/confidential-containers/cloud-api-adaptor/src/peerpod-ctrl'\n/root/go/src/github.com/confidential-containers/cloud-api-adaptor/src/peerpod-ctrl/bin/kustomize build config/default | kubectl delete --ignore-not-found=false -f -\n# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.\ncustomresourcedefinition.apiextensions.k8s.io \"peerpods.confidentialcontainers.org\" deleted\nclusterrole.rbac.authorization.k8s.io \"peerpod-ctrl-manager-role\" deleted\nclusterrole.rbac.authorization.k8s.io \"peerpod-ctrl-metrics-reader\" deleted\nclusterrole.rbac.authorization.k8s.io \"peerpod-ctrl-proxy-role\" deleted\nclusterrolebinding.rbac.authorization.k8s.io \"peerpod-ctrl-manager-rolebinding\" deleted\nclusterrolebinding.rbac.authorization.k8s.io \"peerpod-ctrl-proxy-rolebinding\" deleted\nError from server (NotFound): error when deleting \"STDIN\": namespaces \"confidential-containers-system\" not found\nError from server (NotFound): error when deleting \"STDIN\": serviceaccounts \"peerpod-ctrl-controller-manager\" not found\nError from server (NotFound): error when deleting \"STDIN\": roles.rbac.authorization.k8s.io \"peerpod-ctrl-leader-election-role\" not found\nError from server (NotFound): error when deleting \"STDIN\": rolebindings.rbac.authorization.k8s.io \"peerpod-ctrl-leader-election-rolebinding\" not found\nError from server (NotFound): error when deleting \"STDIN\": services \"peerpod-ctrl-controller-manager-metrics-service\" not found\nError from server (NotFound): error when deleting \"STDIN\": deployments.apps \"peerpod-ctrl-controller-manager\" not found\nmake[1]: *** [Makefile:182: undeploy] Error 1\nmake[1]: Leaving directory '/root/go/src/github.com/confidential-containers/cloud-api-adaptor/src/peerpod-ctrl'\n"

The key bit being:

Error from server (NotFound): error when deleting \"STDIN\": namespaces \"confidential-containers-system\" not found
Error from server (NotFound): error when deleting \"STDIN\": serviceaccounts \"peerpod-ctrl-controller-manager\" not found
Error from server (NotFound): error when deleting \"STDIN\": roles.rbac.authorization.k8s.io \"peerpod-ctrl-leader-election-role\" not found
Error from server (NotFound): error when deleting \"STDIN\": rolebindings.rbac.authorization.k8s.io \"peerpod-ctrl-leader-election-rolebinding\" not found
Error from server (NotFound): error when deleting \"STDIN\": services \"peerpod-ctrl-controller-manager-metrics-service\" not found
Error from server (NotFound): error when deleting \"STDIN\": deployments.apps \"peerpod-ctrl-controller-manager\" not found
make[1]: *** [Makefile:182: undeploy] Error 1
make[1]

So there is something wrong with the e2e tests and make -C ../peerpod-ctrl/ undeploy that we should work on, but that can be done separately.
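For context, the hard failure comes from the strict delete in the trace above; a hypothetical mitigation (not necessarily what the follow-up issue will settle on) is to tolerate resources the CAA uninstall already removed:

# peerpod-ctrl's 'undeploy' target currently pipes kustomize output into a strict delete:
#   bin/kustomize build config/default | kubectl delete --ignore-not-found=false -f -
# Switching the flag avoids failing on already-deleted resources:
bin/kustomize build config/default | kubectl delete --ignore-not-found=true -f -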

Review comment (Member):

Raised #1898 for this

src/cloud-api-adaptor/docker/kind-config.yaml (review thread resolved, outdated)
src/cloud-api-adaptor/docker/.gitignore (review thread resolved, outdated)
src/cloud-api-adaptor/docker/README.md (review thread resolved)
src/cloud-api-adaptor/docker/prereqs.sh (review thread resolved, outdated)
This will create a two node kind cluster, automatically download the pod VM image mentioned in the `provision_docker.properties`
file and run the tests.

Note: To overcome docker rate limiting issue or to download images from private registries,
Review comment (Member):

I hit this issue with the kind pods, so set up the config.json as described, but then hit a similar issue in the guest pull:

Error: failed to create containerd task: failed to create shim task: failed to pull manifest Registry error: url https://index.docker.io/v2/library/nginx/manifests/latest, envelope: OCI API errors: [OCI API error: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit]: unknown

Maybe we should consider trying to switch away from docker.io-based images in our test code if there is a different mirror of nginx available? Kata uses quay.io/sjenning/nginx:1.15-alpine, but that is pretty old.

Reply (Member, Author):

Using docker.io images will most likely always hit the rate limit for e2e due to multiple runs. Using your personal login might help, but again it depends on which plan you have.
Overall, switching to images hosted on either quay.io or GitHub itself is a better alternative for reliable testing.
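For illustration, one way to mirror a docker.io test image to quay.io is with skopeo (skopeo is not part of this PR and the destination repository is hypothetical):

# Requires push access to the destination repository
skopeo copy \
  docker://docker.io/library/nginx:latest \
  docker://quay.io/<your-org>/nginx:latest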

Review comment (Member):

FYI - I've created #1900 to try and help with this

@stevenhorsman (Member) commented:

@wainersm - I think you are hitting the ctr fetch issue? See https://github.com/confidential-containers/cloud-api-adaptor/blob/18d251f6c749f02635522f728465fcd9fd37c577/src/cloud-api-adaptor/docs/troubleshooting/nydus-snapshotter.md for more info.
And as I commented above, it is very annoying, so it has taken me over 2 hours to run through all the e2e tests for this!

@bpradipt (Member, Author) commented Jul 1, 2024

A few comments, so far. Do we plan to follow up with a workflow later to automatically test this, or is it just for manually e2e testing atm?

Manually for now, since unless the image-rs issue is fixed there is no point in running this automatically, IMHO.

@bpradipt (Member, Author) commented Jul 2, 2024

@wainersm - I think you are falling into the ctr fetch issue? See https://github.com/confidential-containers/cloud-api-adaptor/blob/18d251f6c749f02635522f728465fcd9fd37c577/src/cloud-api-adaptor/docs/troubleshooting/nydus-snapshotter.md for more info And as I commented above it is very annoying, so has taken me over 2 hours to run through all the e2e tests for this!

Maybe providing a script to automatically fetch all these images using ctr fetch could be a good addition to the troubleshooting guide.

Also, looking at the e2e code, I think we can parameterise the images used. Then, in the long term, the different pod manifests can be kept as YAMLs in the test folder for easier modification.

@bpradipt bpradipt force-pushed the docker-e2e branch 2 times, most recently from 5c823d4 to 2db1e24 Compare July 2, 2024 05:57
@bpradipt (Member, Author) commented Jul 2, 2024

@wainersm - I think you are falling into the ctr fetch issue? See https://github.com/confidential-containers/cloud-api-adaptor/blob/18d251f6c749f02635522f728465fcd9fd37c577/src/cloud-api-adaptor/docs/troubleshooting/nydus-snapshotter.md for more info And as I commented above it is very annoying, so has taken me over 2 hours to run through all the e2e tests for this!

May be providing a script to automatically fetch all these using ctr fetch could a good addition to troubleshooting guide.

Also looking at the e2e code, I think we can parameterise the images used. Then in the long term the different pod manifests can be kept as yamls in the test folder for easier modification.

I have added a simple inline script to download the images if needed

images=(
  "quay.io/prometheus/busybox:latest"
  "quay.io/confidential-containers/test-images:testworkdir"
  "docker.io/library/nginx:latest"
  "docker.io/curlimages/curl:8.4.0"
)

# Loop through each image in the list
# and download the image
for image in "${images[@]}"; do
    ctr -n k8s.io content fetch "$image"
done
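A note on where this runs (my assumption, not spelled out above): with kind, each node is a docker container with its own containerd, so the fetch loop needs to execute inside the node containers rather than on the host, for example:

# Run the fetch loop inside each kind node so the content lands in the node's containerd store
for node in peer-pods-control-plane peer-pods-worker; do
  for image in "${images[@]}"; do
    docker exec "$node" ctr -n k8s.io content fetch "$image"
  done
done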

bpradipt added 7 commits July 2, 2024 16:11
By default quay.io/confidential-containers/podvm-docker-image is used
as the podvm image and "bridge" as the docker network. The "bridge"
network is created by default during docker daemon initialisation.

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Makefile.defaults was not included

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
pause_bundle was not copied to /pause_bundle.
Also make the destination paths unambiguous to indicate
they are relative to root (/)

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
The files under image/resources are generated as part of the build
and used for podvm image creation. These should be ignored by git

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
libpam-systemd is required to enable the D-Bus connection.
Otherwise the following error will be thrown by kata-agent:
CreateContainer failed with error: rpc error: code = Internal desc = Establishing a D-Bus connection
Caused by:
    0: I/O error: No such file or directory (os error 2)
    1: No such file or directory (os error 2)

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Change it to minimum - 1.44

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Simple fixes for typos and formatting.
These were found when going through the README to understand KBS tests

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Initial framework to run e2e tests for docker provider

The tests require the following prerequisites:
make
go
yq
kubectl
kind
docker

A script prereqs.sh is provided to (un)install the prerequisites

As part of provisioning, it creates a 2-node kind cluster
and then runs the tests.

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
@bpradipt (Member, Author) commented Jul 2, 2024

@stevenhorsman here are the results of the KBS test runs

make TEST_PODVM_IMAGE=quay.io/bpradipt/podvm-docker-image TEST_PROVISION=yes CLOUD_PROVIDER=docker TEST_PROVISION_FILE=$(pwd)/test/provisioner/docker/provision_docker.properties DEPLOY_KBS=yes test-e2e

[snip]
time="2024-07-02T14:48:37Z" level=info msg="Pod curl has been successfully deleted within 60s"
time="2024-07-02T14:48:37Z" level=info msg="Deleting Service... nginx"
--- PASS: TestDockerPodsMTLSCommunication (50.38s)
    --- PASS: TestDockerPodsMTLSCommunication/TestPodsMTLSCommunication_test (50.38s)
        --- PASS: TestDockerPodsMTLSCommunication/TestPodsMTLSCommunication_test/Pods_communication_with_mTLS (5.16s)
=== RUN   TestDockerKbsKeyRelease
time="2024-07-02T14:48:37Z" level=info msg="EnableKbsCustomizedPolicy: ../../kbs/sample_policies/deny_all.rego"
=== PAUSE TestDockerKbsKeyRelease
=== CONT  TestDockerKbsKeyRelease
time="2024-07-02T14:48:37Z" level=info msg="Do test kbs key release failure case"
=== RUN   TestDockerKbsKeyRelease/DoTestKbsKeyReleaseForFailure_test
    assessment_runner.go:265: Waiting for containers in pod: curl-failure are ready
=== RUN   TestDockerKbsKeyRelease/DoTestKbsKeyReleaseForFailure_test/Kbs_key_release_is_failed
time="2024-07-02T14:48:58Z" level=info msg="Pass failure case as: rpc status: Status { code: INTERNAL, message: \"[CDH] [ERROR]: Get Resource failed\", details: [], special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }"
    assessment_runner.go:415: Output when execute test commands:rpc status: Status { code: INTERNAL, message: "[CDH] [ERROR]: Get Resource failed", details: [], special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }
time="2024-07-02T14:48:58Z" level=info msg="Deleting pod curl-failure..."
time="2024-07-02T14:49:03Z" level=info msg="Pod curl-failure has been successfully deleted within 60s"
time="2024-07-02T14:49:03Z" level=info msg="EnableKbsCustomizedPolicy: ../../kbs/sample_policies/allow_all.rego"
time="2024-07-02T14:49:03Z" level=info msg="Do test kbs key release"
=== RUN   TestDockerKbsKeyRelease/KbsKeyReleasePod_test
    assessment_runner.go:265: Waiting for containers in pod: busybox-wget are ready
=== RUN   TestDockerKbsKeyRelease/KbsKeyReleasePod_test/Kbs_key_release_is_successful
time="2024-07-02T14:49:19Z" level=info msg="Success to get key.bin This is my cluster name: peer-pods"
    assessment_runner.go:415: Output when execute test commands:This is my cluster name: peer-pods
time="2024-07-02T14:49:19Z" level=info msg="Deleting pod busybox-wget..."
time="2024-07-02T14:49:24Z" level=info msg="Pod busybox-wget has been successfully deleted within 60s"
--- PASS: TestDockerKbsKeyRelease (47.15s)
    --- PASS: TestDockerKbsKeyRelease/DoTestKbsKeyReleaseForFailure_test (26.48s)
        --- PASS: TestDockerKbsKeyRelease/DoTestKbsKeyReleaseForFailure_test/Kbs_key_release_is_failed (6.43s)
    --- PASS: TestDockerKbsKeyRelease/KbsKeyReleasePod_test (20.65s)
        --- PASS: TestDockerKbsKeyRelease/KbsKeyReleasePod_test/Kbs_key_release_is_successful (5.61s)
PASS
time="2024-07-02T14:49:24Z" level=info msg="Deleting namespace 'coco-pp-e2e-test-2c7b47f1'..."
time="2024-07-02T14:49:34Z" level=info msg="Namespace 'coco-pp-e2e-test-2c7b47f1' has been successfully deleted within 60s"
Deleting the kind cluster
Deleting cluster "peer-pods" ...
Deleted nodes: ["peer-pods-worker" "peer-pods-control-plane"]
time="2024-07-02T14:49:37Z" level=info msg="Delete the Cloud API Adaptor installation"
time="2024-07-02T14:49:37Z" level=info msg="Uninstall the cloud-api-adaptor"
ok      github.com/confidential-containers/cloud-api-adaptor/src/cloud-api-adaptor/test/e2e     434.886s

provision_docker.properties used

# Docker configs
CLUSTER_NAME="peer-pods"
DOCKER_HOST="unix:///var/run/docker.sock"
DOCKER_PODVM_IMAGE="quay.io/bpradipt/podvm-docker-image"
DOCKER_NETWORK_NAME="kind"
CAA_IMAGE="quay.io/bpradipt/cloud-api-adaptor"
CAA_IMAGE_TAG="latest"

# KBS configs
KBS_IMAGE="ghcr.io/confidential-containers/staged-images/kbs"
KBS_IMAGE_TAG="dc01f454264fb4350e5f69eba05683a9a1882c41"

@stevenhorsman (Member) left a review comment:

Thanks for all the updates @bpradipt. I think that, given where we are with the image pull challenges, this is good enough to merge. I'm assuming you are happy to wait until after the alpha1 release?

@bpradipt (Member, Author) commented Jul 2, 2024

Thanks for all the updates @bpradipt. I think that given where we are with the pull image challenges this is good enough to merge. I'm assuming you are happy to wait until post the alpha1 release?

Yes of course :-)

@bpradipt bpradipt requested a review from huoqifeng July 4, 2024 04:08
@snir911 (Contributor) left a review comment:

LGTM overall, thanks!

@bpradipt bpradipt merged commit d84e476 into confidential-containers:main Jul 4, 2024
18 checks passed
@bpradipt bpradipt deleted the docker-e2e branch July 4, 2024 16:13