Provisioner: Add support to deploy kbs #1518
I'd prefer not having a lot of shell logic in there and more of the logic implemented programmatically, but the existing test provisioner code is written in a similar way, so I guess it's ok.
Please add a PR description w/ instructions on how to test.
I could not get any KBS deployed with the PR. Am I missing something?
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
coco-pp-e2e-test-371f1451 alpine 0/1 ContainerCreating 0 11m
coco-pp-e2e-test-371f1451 busybox-configmap-pod 0/1 ContainerCreating 0 11m
coco-pp-e2e-test-371f1451 busybox-secret-pod 0/1 ContainerCreating 0 11m
coco-pp-e2e-test-371f1451 deletion-test 0/1 ContainerCreating 0 11m
coco-pp-e2e-test-371f1451 nginx-deployment-54949bb489-ht426 0/1 ContainerCreating 0 11m
coco-pp-e2e-test-371f1451 nginx-deployment-54949bb489-p5fr7 0/1 ContainerCreating 0 11m
coco-pp-e2e-test-371f1451 simple-test 0/1 ContainerCreating 0 11m
confidential-containers-system cc-operator-controller-manager-76755f9c96-fsqj9 2/2 Running 0 15m
confidential-containers-system cc-operator-daemon-install-n4qqf 1/1 Running 0 15m
confidential-containers-system cc-operator-pre-install-daemon-h8klw 1/1 Running 0 15m
confidential-containers-system cloud-api-adaptor-daemonset-t7m6n 0/1 Running 1 (5m8s ago) 15m
confidential-containers-system peerpod-ctrl-controller-manager-5d64f7bc59-bljnt 2/2 Running 0 13m
kube-system azure-ip-masq-agent-4vhgr 1/1 Running 0 17m
kube-system azure-wi-webhook-controller-manager-64dcf47dfb-rdd69 1/1 Running 0 17m
kube-system azure-wi-webhook-controller-manager-64dcf47dfb-rs4lj 1/1 Running 0 17m
kube-system cloud-node-manager-52bh6 1/1 Running 0 17m
kube-system coredns-789789675-9f28p 1/1 Running 0 16m
kube-system coredns-789789675-prfvd 1/1 Running 0 17m
kube-system coredns-autoscaler-649b947bbd-chx6d 1/1 Running 0 17m
kube-system csi-azuredisk-node-gddvg 3/3 Running 0 17m
kube-system csi-azurefile-node-4c5fq 3/3 Running 0 17m
kube-system konnectivity-agent-5bff69dbbd-7hgvw 1/1 Running 0 17m
kube-system konnectivity-agent-5bff69dbbd-lmjfb 1/1 Running 0 17m
kube-system kube-proxy-9qgvg 1/1 Running 0 17m
kube-system metrics-server-5bd48455f4-ks69r 1/2 Running 0 16m
kube-system metrics-server-5bd48455f4-ncsgh 1/2 Running 0 16m
BTW, I followed these steps to run the e2e tests for Azure from my local machine:
export AZURE_RESOURCE_GROUP="test-kartik-pr1518"
export AZURE_REGION="eastus"
az group create --name "${AZURE_RESOURCE_GROUP}" \
--location "${AZURE_REGION}"
export AZURE_SUBSCRIPTION_ID=$(az account show --query id --output tsv)
export USER_ASSIGNED_IDENTITY_NAME="caa-${AZURE_RESOURCE_GROUP}"
az identity create \
--name "${USER_ASSIGNED_IDENTITY_NAME}" \
--resource-group "${AZURE_RESOURCE_GROUP}" \
--location "${AZURE_REGION}" \
--subscription "${AZURE_SUBSCRIPTION_ID}"
export PRINCIPAL_ID="$(az identity show \
--name "${USER_ASSIGNED_IDENTITY_NAME}" \
--resource-group "${AZURE_RESOURCE_GROUP}" \
--subscription "${AZURE_SUBSCRIPTION_ID}" --query principalId -otsv)"
sleep 30
az role assignment create \
--role Contributor \
--assignee-object-id "${PRINCIPAL_ID}" \
--scope "/subscriptions/${AZURE_SUBSCRIPTION_ID}"
export AZURE_CLIENT_ID="$(az identity show \
--resource-group "${AZURE_RESOURCE_GROUP}" \
--name "${USER_ASSIGNED_IDENTITY_NAME}" --query 'clientId' -otsv)"
export CLUSTER_NAME="e2e"
export AZURE_IMAGE_ID="/CommunityGalleries/cocopodvm-d0e4f35f-5530-4b9c-8596-112487cdea85/Images/podvm_image0/Versions/$(date -v -1d "+%Y.%m.%d" 2>/dev/null || date -d "yesterday" "+%Y.%m.%d")"
cat <<EOF >/tmp/provision_azure.properties
AZURE_CLIENT_ID="${AZURE_CLIENT_ID}"
AZURE_SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID}"
RESOURCE_GROUP_NAME="${AZURE_RESOURCE_GROUP}"
CLUSTER_NAME="${CLUSTER_NAME}"
LOCATION="${AZURE_REGION}"
SSH_KEY_ID="id_rsa.pub"
AZURE_IMAGE_ID="${AZURE_IMAGE_ID}"
SSH_USERNAME="azureuser"
AZURE_CLI_AUTH="true"
MANAGED_IDENTITY_NAME="${USER_ASSIGNED_IDENTITY_NAME}"
KBS_IMAGE="ghcr.io/confidential-containers/staged-images/kbs"
KBS_TAG="3003ced913bf83fa11d3ef753bb621f9cd030ae8"
EOF
# Docker image for KBS
# https://github.com/confidential-containers/kbs/pkgs/container/staged-images%2Fkbs
# Now open a new terminal
export TEST_PROVISION_FILE=/tmp/provision_azure.properties
export CLOUD_PROVIDER=azure
export BUILTIN_CLOUD_PROVIDERS=azure
cd test/tools
make caa-provisioner-cli
./caa-provisioner-cli -action=provision
cd ../..
make test-e2e
@surajssd you need to clone kbs repo to
Following the test instructions in the PR description, the tests did not execute:
go test -v -tags=azure -timeout 60m -count=1 ./test/e2e
time="2024-01-31T09:43:38+01:00" level=info msg="Do setup"
time="2024-01-31T09:43:38+01:00" level=info msg="Deploying kbs"
time="2024-01-31T09:43:38+01:00" level=info msg="creating key.bin"
/home/magnuskulke/tmp/cloud-api-adaptor/test/e2e
time="2024-01-31T09:43:38+01:00" level=info msg="Creating kbs install overlay"
time="2024-01-31T09:43:38+01:00" level=info msg="Customize the overlay yaml file"
time="2024-01-31T09:43:38+01:00" level=info msg="Updating kbs image with \"ghcr.io/confidential-containers/staged-images/kbs\""
time="2024-01-31T09:43:38+01:00" level=info msg="Updating CAA image tag with \"latest\""
Wait pod 'kbs-b5b86666-hncws' status for Ready
time="2024-01-31T09:43:58+01:00" level=error msg="All pods are not running: timed out waiting for the condition\n"
F0131 09:43:58.170201 87436 env.go:369] Setup failure: All pods are not running: timed out waiting for the condition
FAIL github.com/confidential-containers/cloud-api-adaptor/test/e2e 19.341s
It would be good to describe the exact values that should be in /tmp/provision_azure.properties.
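One way to make the expected contents explicit is a small pre-flight check run before provisioning. The sketch below is only an illustration: `check_props` is a hypothetical helper, and the key list is an assumption taken from the properties files pasted in this thread, not a confirmed list of what the provisioner requires.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report keys missing from a properties file.
# The key list below is an assumption drawn from the snippets in this thread.
check_props() {
  local file="$1" key missing=0
  for key in AZURE_CLIENT_ID AZURE_SUBSCRIPTION_ID RESOURCE_GROUP_NAME \
             CLUSTER_NAME LOCATION SSH_KEY_ID AZURE_IMAGE_ID \
             KBS_IMAGE KBS_IMAGE_TAG; do
    # A key counts as present if some line starts with KEY=
    if ! grep -q "^${key}=" "$file"; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  # Non-zero exit status when at least one key is absent
  return "$missing"
}
```

It could be called as `check_props /tmp/provision_azure.properties || exit 1` before running `./caa-provisioner-cli -action=provision`.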
test/provisioner/provision.go
Outdated
}

// Replace this to use install overlay
cmd := exec.Command("kubectl", "apply", "-k", "overlays")
What kinds of issues? KBS has the same setup as CAA, and we install CAA using kustomize as well, don't we?
The installation failed as follows:
✗ ./caa-provisioner-cli -action=provision
INFO[0000] Creating VPC...
INFO[0000] Creating Resource group test-kartik-pr1518.
INFO[0002] Successfully Created Resource group test-kartik-pr1518.
INFO[0002] Creating Cluster...
INFO[0160] Successfully created federated identity credential "e2eFederatedIdentityCredential" in resource group "test-kartik-pr1518"
INFO[0165] Successfully created federated identity credential "e2eFederatedIdentityCredential" in resource group "test-kartik-pr1518"
INFO[0166] Sync cluster kubeconfig with current config context
INFO[0167] Deploying kbs
INFO[0167] creating key.bin
/Users/suraj/temp/2024-02-Feb-20-19-09-13/cloud-api-adaptor/test/tools
INFO[0167] Creating kbs install overlay
INFO[0167] Customize the overlay yaml file
INFO[0167] Updating kbs image with "ghcr.io/confidential-containers/staged-images/kbs"
INFO[0167] Updating kbs image tag with "94a397886445ec2529513a1fdddd40801c4af143"
INFO[0167] Creating kbs install overlay
INFO[0167] Install Kbs
Wait pod 'kbs-659c77bf67-xfd84' status for Ready
pod 'kbs-659c77bf67-xfd84' is Ready
KBS Service IP: 10.0.241.140
INFO[0173] KBS PARAMS: "cc_kbc::http:10.0.241.140:8080":
INFO[0173] Install the controller manager
Wait for the cc-operator-controller-manager deployment be available
INFO[0182] Customize the overlay yaml file
INFO[0184] Install the cloud-api-adaptor
Wait for the cc-operator-daemon-install DaemonSet be available
Wait for the pod cc-operator-daemon-install-cqjjj be ready
Wait for the cloud-api-adaptor-daemonset DaemonSet be available
Wait for the pod cloud-api-adaptor-daemonset-8c2ht be ready
Wait for the kata-remote runtimeclass be created
INFO[0246] Installing peerpod-ctrl
FATA[0286] exit status 2
The steps I followed are:
# Testing Kartik's PR: https://github.com/confidential-containers/cloud-api-adaptor/pull/1518
# Setting up the clean environment.
TEMP_LOC="$HOME/temp/$(date '+%Y-%m-%b-%d-%H-%M-%S')"
mkdir -p $TEMP_LOC
cd $TEMP_LOC
git clone git@github.com:surajssd/cloud-api-adaptor.git
pushd cloud-api-adaptor
git remote add upstream git@github.com:confidential-containers/cloud-api-adaptor.git
gh repo set-default confidential-containers/cloud-api-adaptor
gh pr checkout 1518
# Start running tests now.
export AZURE_RESOURCE_GROUP="test-kartik-pr1518"
export AZURE_REGION="eastus"
az group create --name "${AZURE_RESOURCE_GROUP}" \
--location "${AZURE_REGION}"
export AZURE_SUBSCRIPTION_ID=$(az account show --query id --output tsv)
export USER_ASSIGNED_IDENTITY_NAME="caa-${AZURE_RESOURCE_GROUP}"
az identity create \
--name "${USER_ASSIGNED_IDENTITY_NAME}" \
--resource-group "${AZURE_RESOURCE_GROUP}" \
--location "${AZURE_REGION}" \
--subscription "${AZURE_SUBSCRIPTION_ID}"
export PRINCIPAL_ID="$(az identity show \
--name "${USER_ASSIGNED_IDENTITY_NAME}" \
--resource-group "${AZURE_RESOURCE_GROUP}" \
--subscription "${AZURE_SUBSCRIPTION_ID}" --query principalId -otsv)"
sleep 30
az role assignment create \
--role Contributor \
--assignee-object-id "${PRINCIPAL_ID}" \
--scope "/subscriptions/${AZURE_SUBSCRIPTION_ID}"
export AZURE_CLIENT_ID="$(az identity show \
--resource-group "${AZURE_RESOURCE_GROUP}" \
--name "${USER_ASSIGNED_IDENTITY_NAME}" --query 'clientId' -otsv)"
export CLUSTER_NAME="e2e"
export AZURE_IMAGE_ID="/CommunityGalleries/cocopodvm-d0e4f35f-5530-4b9c-8596-112487cdea85/Images/podvm_image0/Versions/2024.02.20"
# Docker image for KBS
# https://github.com/confidential-containers/kbs/pkgs/container/staged-images%2Fkbs
cat <<EOF >/tmp/provision_azure.properties
AZURE_CLIENT_ID="${AZURE_CLIENT_ID}"
AZURE_SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID}"
RESOURCE_GROUP_NAME="${AZURE_RESOURCE_GROUP}"
CLUSTER_NAME="${CLUSTER_NAME}"
LOCATION="${AZURE_REGION}"
SSH_KEY_ID="id_rsa.pub"
AZURE_IMAGE_ID="${AZURE_IMAGE_ID}"
SSH_USERNAME="azureuser"
AZURE_CLI_AUTH="true"
MANAGED_IDENTITY_NAME="${USER_ASSIGNED_IDENTITY_NAME}"
KBS_IMAGE="ghcr.io/confidential-containers/staged-images/kbs"
KBS_IMAGE_TAG="94a397886445ec2529513a1fdddd40801c4af143"
EOF
touch install/overlays/azure/service-principal.env
touch install/overlays/azure/id_rsa.pub
pushd test/tools
git clone git@github.com:confidential-containers/kbs.git
# Now open a new terminal
export TEST_PROVISION_FILE=/tmp/provision_azure.properties
export CLOUD_PROVIDER=azure
export BUILTIN_CLOUD_PROVIDERS=azure
make caa-provisioner-cli
./caa-provisioner-cli -action=provision
popd
return ctx, err
}

if err = keyBrokerService.Deploy(ctx, cfg, props); err != nil {
The deployment failed for me as follows:
✗ ./caa-provisioner-cli -action=provision
INFO[0000] Creating VPC...
...
KBS Service IP: 10.0.241.140
INFO[0138] KBS PARAMS: "cc_kbc::http:10.0.241.140:8080":
FATA[0139] loading KV pairs: env source files: [service-principal.env]: evalsymlink failure on '/Users/suraj/temp/2024-02-Feb-20-19-09-13/cloud-api-adaptor/install/overlays/azure/service-principal.env' : lstat /Users/suraj/temp/2024-02-Feb-20-19-09-13/cloud-api-adaptor/install/overlays/azure/service-principal.env: no such file or directory
I had to run
✗ touch /Users/suraj/temp/2024-02-Feb-20-19-09-13/cloud-api-adaptor/install/overlays/azure/service-principal.env
and
✗ touch /Users/suraj/temp/2024-02-Feb-20-19-09-13/cloud-api-adaptor/install/overlays/azure/id_rsa.pub
to move further.
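The workaround can be scripted so it is repeatable. This is a minimal sketch, assuming the overlay only needs the two files to exist; `ensure_overlay_files` is a hypothetical helper name, and an empty `id_rsa.pub` is only a placeholder, not a usable SSH key.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create empty placeholder files that the kustomize
# overlay reads, without overwriting files that already exist.
ensure_overlay_files() {
  local dir="$1" f
  for f in service-principal.env id_rsa.pub; do
    # ": > file" creates an empty file; skipped when the file is present
    [ -e "${dir}/${f}" ] || : > "${dir}/${f}"
  done
}
```

From the repository root this would be invoked as `ensure_overlay_files install/overlays/azure`.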
@kartikjoshi21 it worked for me on Linux; it was failing on OSX.
Fixes: confidential-containers#1471 Signed-off-by: Kartik Joshi <kartikjoshi@microsoft.com>
@kartikjoshi21 I was able to deploy the app; the problem was I wasn't using the
But I could not do a key release after deploying the nginx pod:
# curl http://127.0.0.1:8006/cdh/resource/reponame/workload_key/key.bin
ttrpc err: Receive packet timeout Elapsed(())
There is no movement on the KBS side. My guess is that the peer pod VM couldn't even reach KBS, because I see nothing change there. In my experience, using the KBS service IP does not work, but the pod IP has always worked. The KBS logs show it is up and running, but with no new activity:
$ k -n coco-tenant logs -f kbs-5d6fddb556-4jnnt
[2024-02-27T21:10:50Z INFO kbs] Using config file /etc/kbs/kbs-config.toml
[2024-02-27T21:10:50Z WARN attestation_service::rvps] No RVPS address provided and will launch a built-in rvps
[2024-02-27T21:10:50Z INFO attestation_service::token::simple] No Token Signer key in config file, create an ephemeral key and without CA pubkey cert
[2024-02-27T21:10:50Z INFO api_server] Starting HTTP server at [0.0.0.0:8080]
[2024-02-27T21:10:50Z INFO actix_server::builder] starting 4 workers
[2024-02-27T21:10:50Z INFO actix_server::server] Tokio runtime found; starting in existing Tokio runtime
On the nginx container that I deployed following instructions here I could not retrieve the key:
$ k exec -it nginx-587b496d6c-t5d82 bash
root@nginx-587b496d6c-t5d82:/# curl http://127.0.0.1:8006/cdh/resource/reponame/workload_key/key.bin
ttrpc err: Receive packet timeout Elapsed(())
On the peer pod VM I had the following running, and I see that the timeout is reported there too:
$ sudo journalctl -f | grep -v stats_container
...
Feb 27 22:59:31 podvm-nginx-587b496d6c-t5d82-41d1a45e api-server-rest[963]: root_path /cdh, url_path /resource/reponame/workload_key/key.bin
Feb 27 23:00:21 podvm-nginx-587b496d6c-t5d82-41d1a45e kata-agent[947]: [2024-02-27T23:00:21Z ERROR ttrpc::asynchronous::server] method handle /attestation_agent.AttestationAgentService/GetToken got error timed out
I manually changed the broken URL of the KBS service in the configmap from:
$ k -n confidential-containers-system get cm peer-pods-cm -o yaml
apiVersion: v1
data:
AA_KBC_PARAMS: cc_kbc::http:10.0.173.3:8080
to the following:
$ k -n confidential-containers-system get cm peer-pods-cm -o yaml
apiVersion: v1
data:
AA_KBC_PARAMS: cc_kbc::http://10.0.173.3:8080
And it still failed!
Changed the IP to the pod IP instead of the service IP:
$ k -n confidential-containers-system edit cm peer-pods-cm -o yaml
apiVersion: v1
data:
AA_KBC_PARAMS: cc_kbc::http://10.244.0.13:8080
And now things work:
$ k exec -it nginx-587b496d6c-nf4x2 bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
root@nginx-587b496d6c-nf4x2:/# curl http://127.0.0.1:8006/cdh/resource/reponame/workload_key/key.bin
This is my cluster name: e2e
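The two manual changes above (adding the `//` after `http:` and switching from the service IP to the pod IP) can be captured in a small helper. This is a sketch under stated assumptions: `kbc_params` and `kbs_pod_ip` are hypothetical names, and the `app=kbs` label selector is a guess that should be adjusted to whatever labels the KBS deployment in `coco-tenant` actually carries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: format the AA_KBC_PARAMS value with an explicit
# http:// scheme. The port defaults to 8080, as seen in the KBS logs.
kbc_params() {
  local host="$1" port="${2:-8080}"
  printf 'cc_kbc::http://%s:%s\n' "$host" "$port"
}

# Look up the IP of the first KBS pod.
# Assumption: the pod is labelled app=kbs in the coco-tenant namespace.
kbs_pod_ip() {
  kubectl -n coco-tenant get pod -l app=kbs \
    -o jsonpath='{.items[0].status.podIP}'
}
```

For example, `kbc_params "$(kbs_pod_ip)"` yields a value that could be placed in `peer-pods-cm`. Note that a pod IP changes when the pod is recreated, so this is a debugging aid rather than a durable configuration.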
Here are the updated deployment instructions:
git clone git@github.com:surajssd/cloud-api-adaptor.git
pushd cloud-api-adaptor
git remote add upstream git@github.com:confidential-containers/cloud-api-adaptor.git
gh repo set-default confidential-containers/cloud-api-adaptor
gh pr checkout 1518
# Start running tests now.
export AZURE_RESOURCE_GROUP="suraj-test-kartik-pr1518"
export AZURE_REGION="eastus"
az group create --name "${AZURE_RESOURCE_GROUP}" \
--location "${AZURE_REGION}"
export AZURE_SUBSCRIPTION_ID=$(az account show --query id --output tsv)
export USER_ASSIGNED_IDENTITY_NAME="caa-${AZURE_RESOURCE_GROUP}"
az identity create \
--name "${USER_ASSIGNED_IDENTITY_NAME}" \
--resource-group "${AZURE_RESOURCE_GROUP}" \
--location "${AZURE_REGION}" \
--subscription "${AZURE_SUBSCRIPTION_ID}"
export PRINCIPAL_ID="$(az identity show \
--name "${USER_ASSIGNED_IDENTITY_NAME}" \
--resource-group "${AZURE_RESOURCE_GROUP}" \
--subscription "${AZURE_SUBSCRIPTION_ID}" --query principalId -otsv)"
sleep 30
az role assignment create \
--role Contributor \
--assignee-object-id "${PRINCIPAL_ID}" \
--scope "/subscriptions/${AZURE_SUBSCRIPTION_ID}"
export AZURE_CLIENT_ID="$(az identity show \
--resource-group "${AZURE_RESOURCE_GROUP}" \
--name "${USER_ASSIGNED_IDENTITY_NAME}" --query 'clientId' -otsv)"
export CLUSTER_NAME="e2e"
export AZURE_IMAGE_ID="/CommunityGalleries/cocopodvm-d0e4f35f-5530-4b9c-8596-112487cdea85/Images/podvm_image0/Versions/2024.02.27"
# Docker image for KBS
# https://github.com/confidential-containers/kbs/pkgs/container/staged-images%2Fkbs
cat <<EOF >/tmp/provision_azure.properties
AZURE_CLIENT_ID="${AZURE_CLIENT_ID}"
AZURE_SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID}"
RESOURCE_GROUP_NAME="${AZURE_RESOURCE_GROUP}"
CLUSTER_NAME="${CLUSTER_NAME}"
LOCATION="${AZURE_REGION}"
SSH_KEY_ID="id_rsa.pub"
AZURE_IMAGE_ID="${AZURE_IMAGE_ID}"
AZURE_CLI_AUTH="true"
MANAGED_IDENTITY_NAME="${USER_ASSIGNED_IDENTITY_NAME}"
# Deploy the same one that is merged on the CAA main
KBS_IMAGE="ghcr.io/confidential-containers/key-broker-service"
KBS_IMAGE_TAG="v0.8.2"
# Get the tag from: https://quay.io/repository/confidential-containers/cloud-api-adaptor?tab=tags&tag=latest
CAA_IMAGE="quay.io/confidential-containers/cloud-api-adaptor:dev-2ab1beb12dc49ce3f0c7c4112c7be1d0a67d099f"
EOF
touch install/overlays/azure/service-principal.env
ssh-keygen -t rsa -b 4096 -f install/overlays/azure/id_rsa -N "" -C dev@coco.io
pushd test/tools
git clone git@github.com:confidential-containers/kbs.git
pushd kbs
git checkout v0.8.2
popd
# Now open a new terminal
export TEST_PROVISION_FILE=/tmp/provision_azure.properties
export CLOUD_PROVIDER=azure
export BUILTIN_CLOUD_PROVIDERS=azure
make caa-provisioner-cli
./caa-provisioner-cli -action=provision
popd
LGTM, thanks for bearing with me @kartikjoshi21
This is great work.
Thanks for the review and help to make it work @surajssd
The latest instructions also worked for me; kbs is installed in coco-tenant. I concur, great work! Looking forward to having cc_kbc tests.
k logs deploy/kbs -n coco-tenant
[2024-02-29T07:38:14Z INFO kbs] Using config file /etc/kbs/kbs-config.toml
[2024-02-29T07:38:14Z INFO api_server::attestation::coco::grpc] connect to remote AS [http://127.0.0.1:50004] with pool size 100
[2024-02-29T07:38:14Z INFO api_server] Starting HTTP server at [0.0.0.0:8080]
[2024-02-29T07:38:14Z INFO actix_server::builder] starting 4 workers
[2024-02-29T07:38:14Z INFO actix_server::server] Tokio runtime found; starting in existing Tokio runtime
Thanks Magnus. Yes, I will start working on that next.
Fixes: #1471