Add e2e tests for docker #1845

Conversation
Force-pushed from 9bb16da to 16633e7
Properties file used
Test results
I ran the e2e tests on an Ubuntu 22.04 VM with 8GB RAM and 4 vCPUs.
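For anyone reproducing, the rough invocation would be something like the sketch below. `CLOUD_PROVIDER=docker` is my assumption; the `test-e2e` target name comes from the Makefile errors later in this thread.

```bash
# Assumed reproduction sketch, not verbatim from the PR:
# select the docker provider and run the e2e suite from src/cloud-api-adaptor
export CLOUD_PROVIDER=docker
make test-e2e
```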
For the other provision files, they live in the test directory, e.g. src/cloud-api-adaptor/test/provisioner/ibmcloud/provision_ibmcloud.properties, so it might make more sense to put this in src/cloud-api-adaptor/test/provisioner/docker?
Sure, I'll put it under test/provisioner/docker. Sorry, I missed the convention.
@bpradipt - I'm trying to run the e2e test and my nodes are not ready, so the install just hangs. Describing it I see:
Should the kind_cluster script have sorted that, or do you know if there is some manual pre-req I've missed?
The kind installation script should have taken care of it. Are you trying on an existing system or a new system? Any other details on the environment to help understand what's happening?
It is a brand new VM and I picked Ubuntu 22.04 with 4 vCPUs and 8GB RAM to match your tested set-up. It doesn't look like calico/flannel have been installed:
The pending pods are due to the nodes not being ready:
Is there any other info that might be helpful, or things I can try? Sorry, I appreciate this is more of a kind issue than anything else, but I don't have much experience using it and want to try and test the e2e set-up.
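For anyone else debugging this, some generic kubectl checks narrow it down to the CNI (nothing repo-specific):

```bash
# Confirm the nodes are NotReady and see the reason kubelet reports
kubectl get nodes -o wide
kubectl describe nodes | grep -A7 'Conditions:'

# Check whether a CNI (calico/flannel) was ever deployed
kubectl get pods -A | grep -Ei 'calico|flannel'
```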
For some reason calico is not installed. The following line from the kind_cluster.sh script
Nothing that I could think of now. Let me spend some time to figure out what could be causing this issue.
@stevenhorsman I added an example properties file to try.
Sure, will do. Lots of meetings atm, but I will try and get to it by EoD tomorrow.
The pre-reqs script did the trick. I'm not sure why, but the installation worked after using that. I hit the image pull error though :(
@stevenhorsman can we move ahead with this PR? The test flakiness is not really related to the provider.
Good
Yeah, I think that is fair, but as we believe that any users will hit the failure, maybe we need to add some "temporary" 🤞 doc about the problem we see and the
Yeah. It should be a generic doc imho as it can affect any provider, maybe under troubleshooting - https://github.com/confidential-containers/cloud-api-adaptor/tree/main/src/cloud-api-adaptor/docs/troubleshooting ?
That's a good idea. We seem to be pretty much guaranteed to hit that with the docker provider, so maybe linking to that section from the docker provider docs makes sense too?
@stevenhorsman done. PTAL
A few comments so far. Do we plan to follow up with a workflow later to automatically test this, or is it just for manual e2e testing atm?
If you want to use a different location for the registry secret, then remember to update the same in the `docker/kind-config.yaml` file.
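For context, the usual way a kind node consumes a registry secret is by mounting a docker `config.json` into the node so kubelet can use it for pulls. A minimal sketch follows; the hostPath is an assumption, and the repo's `docker/kind-config.yaml` is authoritative.

```bash
# Sketch only: what a registry-secret mount in a kind config typically looks like
cat <<'EOF' > kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /root/.docker/config.json         # registry secret on the host (assumed path)
    containerPath: /var/lib/kubelet/config.json # kubelet reads pull credentials here
EOF
kind create cluster --name peer-pods --config kind-config.yaml
```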
Is it worth mentioning the option of using `kind delete cluster --name peer-pods` to delete the cluster that was auto-created in the e2e test?
The cluster will be deleted automatically on test completion unless `TEST_TEARDOWN=no` is set. Anyway, let me add a note as well.
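For example, combining the two (the cluster name comes from the comment above, and `test-e2e` from the Makefile output later in this thread):

```bash
# Keep the cluster around for post-mortem debugging, then delete it manually
TEST_TEARDOWN=no make test-e2e
kind delete cluster --name peer-pods
```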
So that might not have worked for me then, as I copied your command from the PR, but my cluster is over two hours old now from all the stopping and re-testing. I thought that I'd let the e2e process finish naturally at least once, but maybe not. It might be better to note that the e2e test uses kind to create the cluster and just link to their getting started docs instead?
I think this was user error. I'm just re-trying and found that the `Uninstall CCRuntime CRD` step has taken 5 mins so far, so I probably killed it previously thinking it was stuck/finished.
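A generic way to see whether such an uninstall step is actually progressing rather than hung (assuming the operator lives in the confidential-containers-system namespace, as the later logs suggest):

```bash
# Watch operator resources drain during teardown instead of guessing
kubectl get pods -n confidential-containers-system --watch
```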
Hmm, maybe not - it failed badly, so didn't do the kind delete:
time="2024-07-01T10:10:08-07:00" level=info msg="Delete the peerpod-ctrl deployment"
FAIL github.com/confidential-containers/cloud-api-adaptor/src/cloud-api-adaptor/test/e2e 391.579s
FAIL
make: *** [Makefile:96: test-e2e] Error 1
I will have some more attempts to see if I can track down the issue
Same here @stevenhorsman, the kind cluster was left behind. Maybe this is a bug in the test framework itself: if something goes wrong on teardown then it won't run until the end (and the last step is to delete the cluster?)
time="2024-07-01T19:43:00Z" level=info msg="Deleting namespace 'coco-pp-e2e-test-5a79bbe1'..."
time="2024-07-01T19:43:15Z" level=info msg="Namespace 'coco-pp-e2e-test-5a79bbe1' has been successfully deleted within 60s"
Deleting the kind cluster
Deleting cluster "kind" ...
time="2024-07-01T19:43:15Z" level=info msg="Delete the Cloud API Adaptor installation"
time="2024-07-01T19:43:15Z" level=info msg="Uninstall the cloud-api-adaptor"
time="2024-07-01T19:43:15Z" level=info msg="Uninstall CCRuntime CRD"
time="2024-07-01T19:47:27Z" level=info msg="Uninstall the controller manager"
time="2024-07-01T19:47:36Z" level=info msg="Wait for the cc-operator-controller-manager deployment be deleted\n"
time="2024-07-01T19:47:41Z" level=info msg="Delete the peerpod-ctrl deployment"
FAIL github.com/confidential-containers/cloud-api-adaptor/src/cloud-api-adaptor/test/e2e 1316.300s
FAIL
make: *** [Makefile:98: test-e2e] Error 1
Yeah,
FYI with trace on I got:
time="2024-07-02T01:37:00-07:00" level=trace msg="/usr/bin/make -C ../peerpod-ctrl undeploy, output: make[1]: Entering directory '/root/go/src/github.com/confidential-containers/cloud-api-adaptor/src/peerpod-ctrl'\n/root/go/src/github.com/confidential-containers/cloud-api-adaptor/src/peerpod-ctrl/bin/kustomize build config/default | kubectl delete --ignore-not-found=false -f -\n# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.\ncustomresourcedefinition.apiextensions.k8s.io \"peerpods.confidentialcontainers.org\" deleted\nclusterrole.rbac.authorization.k8s.io \"peerpod-ctrl-manager-role\" deleted\nclusterrole.rbac.authorization.k8s.io \"peerpod-ctrl-metrics-reader\" deleted\nclusterrole.rbac.authorization.k8s.io \"peerpod-ctrl-proxy-role\" deleted\nclusterrolebinding.rbac.authorization.k8s.io \"peerpod-ctrl-manager-rolebinding\" deleted\nclusterrolebinding.rbac.authorization.k8s.io \"peerpod-ctrl-proxy-rolebinding\" deleted\nError from server (NotFound): error when deleting \"STDIN\": namespaces \"confidential-containers-system\" not found\nError from server (NotFound): error when deleting \"STDIN\": serviceaccounts \"peerpod-ctrl-controller-manager\" not found\nError from server (NotFound): error when deleting \"STDIN\": roles.rbac.authorization.k8s.io \"peerpod-ctrl-leader-election-role\" not found\nError from server (NotFound): error when deleting \"STDIN\": rolebindings.rbac.authorization.k8s.io \"peerpod-ctrl-leader-election-rolebinding\" not found\nError from server (NotFound): error when deleting \"STDIN\": services \"peerpod-ctrl-controller-manager-metrics-service\" not found\nError from server (NotFound): error when deleting \"STDIN\": deployments.apps \"peerpod-ctrl-controller-manager\" not found\nmake[1]: *** [Makefile:182: undeploy] Error 1\nmake[1]: Leaving directory '/root/go/src/github.com/confidential-containers/cloud-api-adaptor/src/peerpod-ctrl'\n"
The key bit being:
Error from server (NotFound): error when deleting "STDIN": namespaces "confidential-containers-system" not found
Error from server (NotFound): error when deleting "STDIN": serviceaccounts "peerpod-ctrl-controller-manager" not found
Error from server (NotFound): error when deleting "STDIN": roles.rbac.authorization.k8s.io "peerpod-ctrl-leader-election-role" not found
Error from server (NotFound): error when deleting "STDIN": rolebindings.rbac.authorization.k8s.io "peerpod-ctrl-leader-election-rolebinding" not found
Error from server (NotFound): error when deleting "STDIN": services "peerpod-ctrl-controller-manager-metrics-service" not found
Error from server (NotFound): error when deleting "STDIN": deployments.apps "peerpod-ctrl-controller-manager" not found
make[1]: *** [Makefile:182: undeploy] Error 1
make[1]: Leaving directory '/root/go/src/github.com/confidential-containers/cloud-api-adaptor/src/peerpod-ctrl'
So there is something wrong with the e2e tests and `make -C ../peerpod-ctrl undeploy` that we should work on, but that can be done separately.
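For reference, the failing pipeline in the trace is `bin/kustomize build config/default | kubectl delete --ignore-not-found=false -f -`, so one plausible (untested) fix is simply flipping the flag so resources an earlier teardown step already removed don't fail the target:

```bash
# Sketch of a possible fix in peerpod-ctrl's undeploy target:
# tolerate resources that were already deleted instead of erroring on NotFound
bin/kustomize build config/default | kubectl delete --ignore-not-found=true -f -
```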
Raised #1898 for this
This will create a two-node kind cluster, automatically download the pod VM image mentioned in the `provision_docker.properties` file and run the tests.

Note: To overcome the docker rate limiting issue or to download images from private registries,
I hit this issue with the kind pods, so set up the config.json as described, but then hit a similar issue in the guest pull:
Error: failed to create containerd task: failed to create shim task: failed to pull manifest Registry error: url https://index.docker.io/v2/library/nginx/manifests/latest, envelope: OCI API errors: [OCI API error: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit]: unknown
Maybe we should consider trying to switch away from docker.io-based images in our test code if there is a different mirror of nginx available? Kata uses quay.io/sjenning/nginx:1.15-alpine, but that is pretty old.
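If a mirror is the way to go, one option (a sketch; the quay.io organisation is a placeholder) is to copy the image once with skopeo:

```bash
# Mirror the test image off docker.io to a registry without pull rate limits
skopeo copy docker://docker.io/library/nginx:latest \
            docker://quay.io/YOUR_ORG/nginx:latest
```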
Using docker.io images will most likely always hit the rate limit for e2e due to multiple runs. Using your personal login might help, but again it depends on which plan you have. Overall, switching to images in either quay or github itself is a better alternative for reliable testing.
FYI - I've created #1900 to try and help with this
@wainersm - I think you are falling into the
Manually for now, since unless the image-rs issue is fixed there is no point in running this automatically imho.
Maybe providing a script to automatically fetch all these using

Also, looking at the e2e code, I think we can parameterise the images used (see the sketch below). Then in the long term the different pod manifests can be kept as yamls in the test folder for easier modification.
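A sketch of what such a fetch script could look like; the image list and cluster name are assumptions:

```bash
# Pre-pull the test images once and side-load them into the kind cluster,
# so e2e runs don't hit registry rate limits at all
IMAGES=(
  "docker.io/library/nginx:latest"   # example entries only
  "docker.io/library/busybox:latest"
)
for img in "${IMAGES[@]}"; do
  docker pull "$img"
  kind load docker-image "$img" --name peer-pods
done
```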
Force-pushed from 5c823d4 to 2db1e24
I have added a simple inline script to download the images if needed.
By default quay.io/confidential-containers/podvm-docker-image is used as the podvm image and "bridge" as the docker network. The "bridge" network is created by default during docker daemon initialisation. Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Makefile.defaults was not included Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
pause_bundle was not copied to /pause_bundle. Also make the destination paths unambiguous to indicate it's related to root (/) Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
The files under image/resources are generated as part of the build and used for podvm image creation. These should be ignored by git Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
libpam-systemd is required to enable the d-bus connection. Otherwise the following error will be thrown by kata-agent: CreateContainer failed with error: rpc error: code = Internal desc = Establishing a D-Bus connection Caused by: 0: I/O error: No such file or directory (os error 2) 1: No such file or directory (os error 2) Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Change it to minimum - 1.44 Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Simple fixes for typos and formatting. These were found when going through the README to understand KBS tests Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Initial framework to run e2e tests for the docker provider. The tests require the following prerequisites: make, go, yq, kubectl, kind, docker. A script prereqs.sh is provided to (un)install the prerequisites. As part of provisioning, it creates a 2-node kind cluster and then runs the tests. Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
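For context, a minimal sketch of the kind of check such a prereqs.sh can do (the real script in the PR may differ):

```bash
# Verify the documented prerequisites are on PATH before provisioning
for cmd in make go yq kubectl kind docker; do
  command -v "$cmd" >/dev/null 2>&1 || { echo "missing prerequisite: $cmd" >&2; exit 1; }
done
echo "all prerequisites found"
```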
@stevenhorsman here are the results of the kbs test runs.
Thanks for all the updates @bpradipt. I think that given where we are with the pull image challenges, this is good enough to merge. I'm assuming you are happy to wait until after the alpha1 release?
Yes, of course :-)
LGTM overall, thanks!
Also makes the docker network and podvm image configurable to help with e2e, plus some minor fixes.