Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport of Support running with restricted PSA enforcement enabled (part 1) into release/1.2.x #2638

Conversation

hc-github-team-consul-core

Backport

This PR is auto-generated from #2572 to be assessed for backporting due to the inclusion of the label backport/1.2.x.

🚨

Warning automatic cherry-pick of commits failed. If the first commit failed,
you will see a blank no-op commit below. If at least one commit succeeded, you
will see the cherry-picked commits up to, not including, the commit where
the merge conflict occurred.

The person who merged in the original PR is:
@pglass
This person should manually cherry-pick the original PR into a new backport PR,
and close this one when the manual backport PR is merged in.

merge conflict error: POST https://api.github.com/repos/hashicorp/consul-k8s/merges: 409 Merge conflict []

The below text is copied from the body of the original PR.


Changes proposed in this PR:

Support restricted PSA enforcement in a basic setup. This is enough to get a basic setup working and an acceptance test passing (but does not update every component).

This enables running Consul in a basic configuration with PSA enforcement set to restricted on the namespace where Consul is deployed. (This requires deploying the CNI to a different privileged namespace).

On OpenShift, we have the option to set the security context or not. If the security context is unset, then it is set automatically by OpenShift SCCs. However, we prefer to set the security context to avoid useless warnings on OpenShift and to reduce the config difference between OpenShift and plain Kube. By default, OpenShift namespaces have the audit and warn PSA labels set to restricted, so we receive pod security warnings when deploying Consul to OpenShift even though the pods will be able to run.

Helm chart changes

  • Add a helper to define a "restricted" container security context (when pod security policies are not enabled)

  • Update the following container securityContexts to use the "restricted" settings (not exhaustive)

    • gateway-cleanup-job.yaml
    • gateway-resources-job.yaml
    • gossip-encryption-autogenerate-job.yaml
    • server-acl-init-cleanup-job.yaml - only if .Values.server.containerSecurityContext.server.acl-init is unset
    • server-acl-init-job.yaml - only if .Values.server.containerSecurityContext.server.acl-init is unset
    • server-statefulset.yaml:
      • the locality-init container receives the restricted context
      • the consul container receives the restricted context only if .Values.server.containerSecurityContext.server is unset
    • tls-init-cleanup-job.yaml - only if .Values.server.containerSecurityContext.server.tls-init is unset
    • tls-init-job.yaml - only if .Values.server.containerSecurityContext.server.tls-init is unset
    • webhook-cert-manager-deployment.yaml

Acceptance test changes

  • When -enable-openshift and -enable-cni are set, configure the CNI
    settings correctly for OpenShift, which must look like:

    connectInject:
      cni:
        enabled: true
        multus: true
        cniBinDir: /var/lib/cni/bin
        cniNetDir: /etc/kubernetes/cni/net.d
    
  • Add the -enable-restricted-psa-enforcement test flag. When this is set,
    the tests assume the Consul namespace has restricted PSA enforcement enabled.
    The tests will deploy the CNI (if enabled) into the kube-system namespace.
    Compatible test cases will deploy applications outside of the Consul namespace.

  • Update the ConnectHelper to configure the NetworkAttachmentDefinition
    required to be compatible with the CNI on OpenShift.

  • Add fixtures for static-client and static-server for OpenShift. This
    is necessary because the deployment configs must reference the network
    attachment definition when using the CNI on OpenShift.

  • Update tests in the acceptance/tests/connect directory to either
    run or skip based on -enable-cni and -enable-openshift

How I've tested this PR:

  1. Run OpenShift 4.12 locally with CRC. Follow these steps, but give more memory to CRC (crc start -m 18432): https://developer.hashicorp.com/consul/tutorials/kubernetes/kubernetes-openshift-red-hat#crc-setup. You will need a non-latest version of CRC to get OpenShift 4.12. (otherwise you'll have 4.13+).
$ crc version
WARN A new version (2.23.0) has been published on https://developers.redhat.com/content-gateway/file/pub/openshift-v4/clients/crc/2.23.0/crc-macos-installer.pkg
CRC version: 2.19.0+a71226
OpenShift version: 4.12.13
Podman version: 4.4.4
$ crc setup
$ crc start -m 18432
  1. Login into OpenShift as the kubeadmin:
$ eval $(crc oc-env)
$ crc console --credentials
To login as a regular user, run 'oc login -u developer -p developer https://api.crc.testing:6443'.
To login as an admin, run 'oc login -u kubeadmin -p <password> https://api.crc.testing:6443'
$ oc login -u kubeadmin -p <password> https://api.crc.testing:6443
  1. I used this script to start the tests. Place this in the acceptance directory, and fix the CONSUL_LICENSE environment variable. You may want to increase the test timeout. The script configures three namespaces (cni, consul, app) and sets their PSA enforcement levels appropriately. Then it runs the tests
Script
#!/usr/bin/env bash

set -euo pipefail

SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

export CONSUL_LICENSE=$(cat ~/.consul-ent-license)
export CONSUL_ENT_LICENSE=$CONSUL_LICENSE

# Delete any leftover acceptance test namespaces.
oc get ns | grep ^acceptance | awk '{print $1}' | xargs -n 1 oc delete project || true

function runtest() {
    set -x
    ns_base="acceptance-$1-$RANDOM"

    consul_namespace="${ns_base}-consul"
    oc new-project $consul_namespace

    kubectl label --overwrite ns $consul_namespace \
        pod-security.kubernetes.io/enforce=restricted \
        pod-security.kubernetes.io/enforce-version=v1.24

    cd "${SCRIPT_DIR}/tests/$1"
    rm -rf ./_debug
    mkdir ./_debug
    go test  -v -p 1 -timeout 15m -failfast \
        -consul-k8s-image 'ghcr.io/pglass/consul-k8s-control-plane-dev:ubi' \
        -consul-image 'hashicorp/consul-enterprise:1.16.0-ent-ubi' \
        -debug-directory ./_debug \
        -enable-enterprise \
        -kube-contexts crc-admin \
        -kube-namespaces "$consul_namespace" \
        -enable-openshift \
        -enable-transparent-proxy \
        -enable-cni \
        -enable-restricted-psa-enforcement \
        ./...
}

runtest "connect"

How I expect reviewers to test this PR:

Idk, run OpenShift and try the instructions above if you dare.

Checklist:


Overview of commits

@hc-github-team-consul-core hc-github-team-consul-core force-pushed the backport/pglass/NET-185/basic-connect-test/truly-stable-perch branch 2 times, most recently from ce61785 to 65cab23 Compare July 24, 2023 16:06
@pglass pglass force-pushed the backport/pglass/NET-185/basic-connect-test/truly-stable-perch branch from 65cab23 to 799b116 Compare July 24, 2023 16:43
Support restricted PSA enforcement in a basic setup. This is enough to get a basic setup with ACLs and TLS working and an acceptance test passing (but does not update every component).

On OpenShift, we have the option to set the security context or not. If the security context is unset, then it is set automatically by OpenShift SCCs. However, we prefer to set the security context to avoid useless warnings on OpenShift and to reduce the config difference between OpenShift and plain Kube. By default, OpenShift namespaces have the audit and warn PSA labels set to restricted, so we receive pod security warnings when deploying Consul to OpenShift even though the pods will be able to run.

Helm chart changes:

* Add a helper to the helm chart to define a "restricted" container security context (when pod security policies are not enabled)
* Update the following container securityContexts to use the "restricted" settings (not exhaustive)

  - gateway-cleanup-job.yaml
  - gateway-resources-job.yaml
  - gossip-encryption-autogenerate-job.yaml
  - server-acl-init-cleanup-job.yaml - only if `.Values.server.containerSecurityContext.server.acl-init` is unset
  - server-acl-init-job.yaml - only if `.Values.server.containerSecurityContext.server.acl-init` is unset
  - server-statefulset.yaml:
     - the locality-init container receives the restricted context
     - the consul container receives the restricted context only if `.Values.server.containerSecurityContext.server` is unset
  - tls-init-cleanup-job.yaml - only if `.Values.server.containerSecurityContext.server.tls-init` is unset
  - tls-init-job.yaml - only if `.Values.server.containerSecurityContext.server.tls-init` is unset
  - webhook-cert-manager-deployment.yaml

Acceptance test changes:

* When `-enable-openshift` and `-enable-cni` are set, configure the CNI
  settings correctly for OpenShift.
* Add the `-enable-restricted-psa-enforcement` test flag. When this is set,
  the tests assume the Consul namespace has restricted PSA enforcement enabled.
  The tests will deploy the CNI (if enabled) into the `kube-system` namespace.
  Compatible test cases will deploy applications outside of the Consul namespace.
* Update the ConnectHelper to configure the NetworkAttachmentDefinition
  required to be compatible with the CNI on OpenShift.
* Add fixtures for static-client and static-server for OpenShift. This
  is necessary because the deployment configs must reference the network
  attachment definition when using the CNI on OpenShift.
* Update tests in the `acceptance/tests/connect` directory to either
  run or skip based on -enable-cni and -enable-openshift
@pglass pglass force-pushed the backport/pglass/NET-185/basic-connect-test/truly-stable-perch branch from 799b116 to 143bc59 Compare July 24, 2023 16:46
@pglass
Copy link
Contributor

pglass commented Jul 24, 2023

There are a couple of test failures that look like flakes. I tested those cases locally using the following commands, and they passed, so merging this.

make kind
cd acceptance/tests/partitions
go test ./... \
  -run 'TestPartitions_Gateway' \
  -use-kind \
  -kube-contexts="kind-dc1,kind-dc2" \
  -enable-enterprise \
  -enable-multi-cluster \
  -debug-directory=/tmp/test-results/debug \
  -consul-image=docker.mirror.hashicorp.services/hashicorppreview/consul-enterprise:1.16-dev \
  -consul-k8s-image=docker.io/hashicorppreview/consul-k8s-control-plane:1.2.0-dev-pr-143bc592cda58d218177aae70c0fbe42efbeca0f \
  -consul-dataplane-image=docker.mirror.hashicorp.services/hashicorppreview/consul-dataplane:1.2-dev \
  -enable-enterprise -enable-multi-cluster -enable-transparent-proxy

And:

make kind-cni
cd acceptance/tests/partitions
go test -v -p 1 -timeout 30m ./... \
  -run 'TestPartitions_Connect/mirror_k8s_namespaces;_ACLs_enabled' \
  -use-kind \
  -kube-contexts="kind-dc1,kind-dc2" \
  -enable-enterprise \
  -enable-multi-cluster \
  -debug-directory=/tmp/test-results/debug \
  -consul-image=docker.mirror.hashicorp.services/hashicorppreview/consul-enterprise:1.16-dev \
  -consul-k8s-image=docker.io/hashicorppreview/consul-k8s-control-plane:1.2.0-dev-pr-143bc592cda58d218177aae70c0fbe42efbeca0f \
  -consul-dataplane-image=docker.mirror.hashicorp.services/hashicorppreview/consul-dataplane:1.2-dev \
  -enable-enterprise -enable-multi-cluster -enable-transparent-proxy -enable-cni

@pglass pglass marked this pull request as ready for review July 24, 2023 18:58
@pglass pglass merged commit 0a35a9b into release/1.2.x Jul 24, 2023
3 checks passed
@pglass pglass deleted the backport/pglass/NET-185/basic-connect-test/truly-stable-perch branch July 24, 2023 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants