Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bootstrap gossip encryption with Vault #811

Merged
merged 22 commits into from
Nov 12, 2021
Merged

Conversation

kschoche
Copy link
Contributor

@kschoche kschoche commented Oct 28, 2021

Changes proposed in this PR:

  • remove the extra logic in vault_cluster for waiting for pods as this is no longer needed now that we're using vault dev-mode (thx @ndhanushkodi)
  • Add a new Stanza in values.yaml : global.secretsBackend which holds the Roles and has some light documentation on its usage (to be improved yet)
  • Add logic to bootstrap gossip encryption key to the server-statefulset+client-daemonset with Vault annotations.
  • Add unit and acceptance tests to validate above logic.

How I've tested this PR:

  1. Acceptance tests are added which bootstrap consul using a gossip encryption key
  2. bats tests

How I expect reviewers to test this PR:
Code Review + run the acceptance tests/bats tests.

Checklist:

  • Tests added
  • CHANGELOG entry added

    HashiCorp engineers only, community PRs should not add a changelog entry.
    Entries should use present tense (e.g. Add support for...)

@kschoche kschoche added area/chart-only Related to changes that simply require yaml Helm chart changes, e.g. exposing a new field vault labels Oct 28, 2021
@kschoche kschoche self-assigned this Oct 28, 2021
@kschoche kschoche requested review from a team, ishustava, sadjamz and t-eckert and removed request for a team November 1, 2021 14:32
@kschoche kschoche marked this pull request as ready for review November 1, 2021 14:32
Copy link
Contributor

@t-eckert t-eckert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Kyle, very cool.

acceptance/framework/vault/vault_cluster.go Outdated Show resolved Hide resolved
acceptance/tests/vault/vault_test.go Outdated Show resolved Hide resolved
charts/consul/templates/client-daemonset.yaml Outdated Show resolved Hide resolved
charts/consul/values.yaml Outdated Show resolved Hide resolved
Copy link
Contributor

@ishustava ishustava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on this, Kyle! I'm still reviewing and finishing up some testing but wanted to leave the comments I have so far. Most of them are small edits, but there are a couple of questions inline as well, plus two comments below.

  1. When I ran TestVault_BootstrapConsulGossipEncryptionKey test locally, it did not pass with vault-agent-init for the server erroring out with 2021-11-04T22:33:49.427Z [ERROR] auth.handler: error authenticating: error="context deadline exceeded" backoff=9.8s and because of it the server-acl-init job could never finish.

  2. I also have a larger UX question. I find it a bit counter-intuitive the way you need to set secretKey for a secret in vault that makes it inconsistent with how we do it in k8s.

    For example, in Kubernetes when I create a secret with kubectl create secret generic foo --from-literal=key=bar, I'd then set secretName to foo and secretKey to key. The same thing is quite different in Vault. If I create a similar secret in vault with vault kv put internal/foo key="bar", then I would need to set secretKey to .Data.data.key which is kind of confusing. I know this is how vault injector works under the hood, but I wonder if we should make the UX consistent since the fact that we're using vault injector is an implementation detail and the user should not care about it.

acceptance/tests/vault/vault_test.go Outdated Show resolved Hide resolved
acceptance/tests/vault/vault_test.go Show resolved Hide resolved
charts/consul/templates/client-daemonset.yaml Outdated Show resolved Hide resolved
charts/consul/templates/client-daemonset.yaml Outdated Show resolved Hide resolved
charts/consul/templates/client-daemonset.yaml Outdated Show resolved Hide resolved
Comment on lines 126 to 127
# Enabling the Vault secrets backend will cause Kubernetes secrets to no longer be used for the following Consul secrets:
# - gossip encryption key defined by `global.gossipEncryption.secretName/Key`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make this description more generic? I think this will a bit be hard to maintain going forward if for every secret we're enabling for vault, we'd also need to update this description

charts/consul/values.yaml Outdated Show resolved Hide resolved
charts/consul/values.yaml Outdated Show resolved Hide resolved
acceptance/framework/vault/vault_cluster.go Outdated Show resolved Hide resolved
acceptance/framework/vault/vault_cluster.go Outdated Show resolved Hide resolved
@kschoche
Copy link
Contributor Author

Great work on this, Kyle! I'm still reviewing and finishing up some testing but wanted to leave the comments I have so far. Most of them are small edits, but there are a couple of questions inline as well, plus two comments below.

  1. When I ran TestVault_BootstrapConsulGossipEncryptionKey test locally, it did not pass with vault-agent-init for the server erroring out with 2021-11-04T22:33:49.427Z [ERROR] auth.handler: error authenticating: error="context deadline exceeded" backoff=9.8s and because of it the server-acl-init job could never finish.
  2. I also have a larger UX question. I find it a bit counter-intuitive the way you need to set secretKey for a secret in vault that makes it inconsistent with how we do it in k8s.
    For example, in Kubernetes when I create a secret with kubectl create secret generic foo --from-literal=key=bar, I'd then set secretName to foo and secretKey to key. The same thing is quite different in Vault. If I create a similar secret in vault with vault kv put internal/foo key="bar", then I would need to set secretKey to .Data.data.key which is kind of confusing. I know this is how vault injector works under the hood, but I wonder if we should make the UX consistent since the fact that we're using vault injector is an implementation detail and the user should not care about it.

Hi @ishustava thanks for the awesome/helpful review!
Your question number 1 was a bug in Vault that is fixed by modifying the issuer field of the kube auth config for vault. Once we start running against Vault 1.9 this wont be needed.
For your second question, I've updated the values.yaml file to include language which describes how the user will interact with consul and vault, in doing so I also removed the need for any vault api-specific wording in the key fields so now the user just specifies "key" and not ".Data.data.key" as this is implicit. I was not able to figure out a way to remove the implicit "data" field of the secret path, but I think this is less of an issue as the vault reference guides and docs mention that you need to add "data/" after your kv2 mount point when reading. Thoughts?

Copy link
Contributor

@ishustava ishustava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, Kyle! I think everything looks good, I've added some minor edits.

My only other concern is about the issuer field for the vault auth method (see comment inline). I don't think it's blocking since for now it's passing on kind, but I think we should fix it soon because it'll cause problems when we merge the feature branch into main and run the nightly tests.

acceptance/framework/helpers/helpers.go Outdated Show resolved Hide resolved
acceptance/framework/vault/vault_cluster.go Outdated Show resolved Hide resolved
acceptance/framework/vault/vault_cluster.go Outdated Show resolved Hide resolved
acceptance/framework/vault/vault_cluster.go Outdated Show resolved Hide resolved
@@ -158,7 +154,7 @@ func (v *VaultCluster) bootstrap(t *testing.T, ctx environment.TestContext) {
}
// We need to kubectl exec this one on the vault server:
// This is taken from https://learn.hashicorp.com/tutorials/vault/kubernetes-google-cloud-gke?in=vault/kubernetes#configure-kubernetes-authentication
cmdString := fmt.Sprintf("VAULT_TOKEN=%s vault write auth/kubernetes/config token_reviewer_jwt=\"$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\" kubernetes_host=\"https://${KUBERNETES_PORT_443_TCP_ADDR}:443\" kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt", vaultRootToken)
cmdString := fmt.Sprintf("VAULT_TOKEN=%s vault write auth/kubernetes/config issuer=\"https://kubernetes.default.svc.cluster.local\" token_reviewer_jwt=\"$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\" kubernetes_host=\"https://${KUBERNETES_PORT_443_TCP_ADDR}:443\" kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt", vaultRootToken)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just ran into something today. It seems like the issuer on gke is different (it's kubernetes/serviceaccount). After looking into it for a bit, it seems that the issue is with the version of k8s: as of 1.21+ the tokens mounted to pods have a different format (this new format would require the issuer to be specified in the vault auth method config). The more interesting part is that the jwt token mounted into your pod is now a temporary token that expires after some time and is renewed by kube periodically, while the token that's in the k8s secret still has the old format.

I think we should change this to instead use the ca cert and the jwt token from the service account token secret instead of exec'ing into the pod. Using this method might not require us to provide an issuer and should hopefully work the same across k8s versions too.

Once we do that, I'd also suggest running nightlies with this test to confirm that it works for the versions of k8s we're testing with. This can be done after this PR is merged, but since we'll run into it eventually (i.e. when the feature branch is merged), it's probably better to deal with this now rather than later.

"global.acls.manageSystemACLs": "true",
"global.tls.enabled": "true",
"global.gossipEncryption.secretName": "consul/data/secret/gossip",
"global.gossipEncryption.secretKey": "gossip",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

acceptance/tests/vault/vault_test.go Outdated Show resolved Hide resolved
{{ "{{" }}- with secret "{{ .secretName }}" -{{ "}}" }}
{{ "{{" }}- {{ printf ".Data.data.%s" .secretKey }} -{{ "}}" }}
{{ "{{" }}- end -{{ "}}" }}
{{- end -}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ this refactor! makes it look much cleaner

charts/consul/values.yaml Outdated Show resolved Hide resolved
charts/consul/values.yaml Outdated Show resolved Hide resolved
kschoche and others added 3 commits November 12, 2021 08:59
Co-authored-by: Iryna Shustava <ishustava@users.noreply.github.com>
@kschoche kschoche merged commit 80102a5 into consul-vault-base Nov 12, 2021
@kschoche kschoche deleted the gossip-boostrapping branch November 12, 2021 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/chart-only Related to changes that simply require yaml Helm chart changes, e.g. exposing a new field vault
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants