Backport of Mw/net 4260 phase 2 automate the k8s sameness tests into release/1.2.x #2680


hc-github-team-consul-core

Backport

This PR is auto-generated from #2579 to be assessed for backporting due to the inclusion of the label backport/1.2.x.

🚨 Warning: automatic cherry-pick of commits failed. If the first commit failed,
you will see a blank no-op commit below. If at least one commit succeeded, you
will see the cherry-picked commits up to, not including, the commit where
the merge conflict occurred.

The person who merged in the original PR is:
@wilkermichael
This person should manually cherry-pick the original PR into a new backport PR,
and close this one when the manual backport PR is merged in.

merge conflict error: POST https://api.github.com/repos/hashicorp/consul-k8s/merges: 409 Merge conflict []

The below text is copied from the body of the original PR.


Note: I've broken up the PR into logical commits and the commit messages contain extra details. So the reviewer can go through this PR commit by commit to make it more manageable.

Changes proposed in this PR:

  • This PR adds acceptance tests for sameness failover
    • Tests:
      • service failover
      • prepared query failover (I also technically use DNS to test PQ failover)
      • dns failover
  • The sameness tests will only run in kind because they require 4 clusters to adequately test
  • This PR is part 1 of 2. For the most part it only covers the partition failover test case; I will add peering scenarios in the next PR.
  • To support this PR I need to make changes to the workflow. The commit marked "drop" will be dropped before merging.

How I've tested this PR:

  • Ran acceptance tests in pipeline and locally

How I expect reviewers to test this PR:
👀

Checklist:


Overview of commits

curtbushko and others added 30 commits June 7, 2023 19:51
* Add FIPS builds for linux amd64

* add version check

* fix CI labels and add local dev commands

* fix ci version tagging

* switch to ubuntu 20.04

* add CLI version tag

* add gcompat for alpine glibc cgo compatibility

* remove FIPS version check from connect-init

* address comments
- making this trigger nightly until after 1.2.0 GA
- leaving 0.49.x active until after 1.2.0 GA
* first run through, needs help

* still need to make secure pass

* left something uncommented

* it works and also cleanup

* fix acceptance tests
* [API Gateway] Add acceptance test for cluster peering

* Fix linter

* Fix random unrelated linter errors to get CI to run: revert later?

* one more linter fix to later probably revert

* more linter fixes

* Revert "more linter fixes"

This reverts commit 6210dff.

* Revert "one more linter fix to later probably revert"

This reverts commit 030c563.

* Revert "Fix random unrelated linter errors to get CI to run: revert later?"

This reverts commit fdeccab.
…ersion of kind and k8s 1.27 (#2304)

* update cloud tests to use 1.24, 1.25 and 1.26 version of kubernetes for more coverage

* updated readme for supported kubernetes versions

* added changelog
* [API Gateway] WAN Federation test and fixes

* Fix unit tests
* Fix when gateways are deleted before we get services populated into cache

* a bit of cleanup
…assConfig are obeyed (#2272)

* Add unit tests verifying that scaling parameters on GatewayClassConfig are obeyed

* Add test case for scaling w/ no min or max configured
* Rename GatewayClassController to prevent name collision

* Use gateway instead of gatewayclass in name

* Use the constant in ownership checks

* Change GatewayClass name to "consul"

* Change GatewayClass name in cases

* Change ApiGatewayClass back
* Fix SupportedKinds array to be what Conformance test expects

* Fix cert validation status condition for listeners

* Add programmed condition for listeners

* Fix unit test

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* first pass at halting: got httproute and api-gateway done

* clean up test

* Handle all set for infinite reconcile check

* Add table tests for minimal setup

* Added some odd field names to test normalization is handled correctly

* Use funky casing http routes
* Added helm inputs for managing audit logs
* Remove unwanted changes from values
* fix: use correct flag when translating namespaces

* Use non-normalized namespace when deregistering services

* Guard against namespace queries when namespaces not enabled in cache
* added imagePullPolicy for images in values.yaml

* fix: renamed pullPolicy key according to image

* fixed default always in tmpl

* changed structure of image in yaml

* revert changes

* added global imagePullPolicy

* fixed typo

* added changelog file
This brings consul-k8s in line with consul.
Most importantly, the backport assistant was updated to automatically assign created PRs to the author of the PR that is being backported.
* update changelog based on changes made to 1.2.x

* fixed test cases
- enterprise cases were in the OSS test cases
* trigger conformance tests nightly, squash

* remove extra line

* Update nightly-api-gateway-conformance.yml
making scripts more robust and removing changing helm chart
* Fix cache and service deletion issue

* Add comments

* add in acceptance test

* Fix indentation

* Fix unit test for deleting gateway w/ consul services

* Remove redundant service deregistration code

* Exit loop early once registration is found for service

* Fix import blocking

* Set status on pods added to test

* Apply suggestions from code review

* Reduce count of test gateways to 10 from 100

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
Co-authored-by: Sarah Alsmiller <sarah.alsmiller@hashicorp.com>
* Adding support for weighted k8s service

* Adding changelog

* if per-app weight is 0 then pull the weight to 1

* Addressing review comments

* Addressing review comments

* Addressing review comments

* Comment update

* Comment update

* Parameterized table test

* Parameterized table test

* fixing linting issue

* fixing linting issue

---------

Co-authored-by: srahul3 <rahulsharma@hashicorp.com>
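
The commit "if per-app weight is 0 then pull the weight to 1" in the list above describes a simple clamping rule. A minimal Go sketch of that rule follows; the function name `effectiveWeight` is hypothetical and is not taken from the consul-k8s code base.

```go
package main

import "fmt"

// effectiveWeight is a hypothetical helper illustrating the rule from the
// commit message: a per-app weight of 0 is pulled up to 1, and any other
// weight is returned unchanged.
func effectiveWeight(weight int) int {
	if weight == 0 {
		return 1
	}
	return weight
}

func main() {
	fmt.Println(effectiveWeight(0), effectiveWeight(5)) // prints "1 5"
}
```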
* Bumping go-discover to the latest version
skpratt and others added 23 commits July 20, 2023 23:43
* added make target for checking for hashicorppreview

* added check to prepare-release make target
This is meant to solve for recurrent timeouts in several steps,
particularly `golangci-lint-control-plane` and `golang-ci-lint-cli`.

An accompanying change in `consul-k8s-workflows` should disable caching
until the (unclear) root of the issue can be resolved, or we can disable
or clear cache in a more targeted way that solves for these cases.
* Fix TestAPIGateway_GatewayClassConfig
* Remove stray files from bad merge
Support restricted PSA enforcement in a basic setup. This is enough to get a basic setup with ACLs and TLS working and an acceptance test passing (but does not update every component).

On OpenShift, we have the option to set the security context or not. If the security context is unset, then it is set automatically by OpenShift SCCs. However, we prefer to set the security context to avoid useless warnings on OpenShift and to reduce the config difference between OpenShift and plain Kube. By default, OpenShift namespaces have the audit and warn PSA labels set to restricted, so we receive pod security warnings when deploying Consul to OpenShift even though the pods will be able to run.

Helm chart changes:

* Add a helper to the helm chart to define a "restricted" container security context (when pod security policies are not enabled)
* Update the following container securityContexts to use the "restricted" settings (not exhaustive)

  - gateway-cleanup-job.yaml
  - gateway-resources-job.yaml
  - gossip-encryption-autogenerate-job.yaml
  - server-acl-init-cleanup-job.yaml - only if `.Values.server.containerSecurityContext.server.acl-init` is unset
  - server-acl-init-job.yaml - only if `.Values.server.containerSecurityContext.server.acl-init` is unset
  - server-statefulset.yaml:
     - the locality-init container receives the restricted context
     - the consul container receives the restricted context only if `.Values.server.containerSecurityContext.server` is unset
  - tls-init-cleanup-job.yaml - only if `.Values.server.containerSecurityContext.server.tls-init` is unset
  - tls-init-job.yaml - only if `.Values.server.containerSecurityContext.server.tls-init` is unset
  - webhook-cert-manager-deployment.yaml
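
As a rough illustration of what a "restricted" container security context means, the sketch below models the container-level fields that the Kubernetes restricted Pod Security Standard constrains. The struct and function names are hypothetical stand-ins written for this example; they are not the Helm helper from this commit and not the Kubernetes API types.

```go
package main

import "fmt"

// ContainerSecurityContext is a simplified, hypothetical stand-in for the
// container-level fields relevant to the restricted Pod Security Standard.
type ContainerSecurityContext struct {
	AllowPrivilegeEscalation bool
	DropCapabilities         []string
	RunAsNonRoot             bool
	SeccompProfileType       string
}

// restrictedContext returns settings that satisfy the restricted standard:
// no privilege escalation, all capabilities dropped, non-root user, and the
// runtime's default seccomp profile.
func restrictedContext() ContainerSecurityContext {
	return ContainerSecurityContext{
		AllowPrivilegeEscalation: false,
		DropCapabilities:         []string{"ALL"},
		RunAsNonRoot:             true,
		SeccompProfileType:       "RuntimeDefault",
	}
}

// meetsRestricted reports whether a context satisfies those four checks.
func meetsRestricted(c ContainerSecurityContext) bool {
	dropsAll := false
	for _, capability := range c.DropCapabilities {
		if capability == "ALL" {
			dropsAll = true
		}
	}
	seccompOK := c.SeccompProfileType == "RuntimeDefault" || c.SeccompProfileType == "Localhost"
	return !c.AllowPrivilegeEscalation && dropsAll && c.RunAsNonRoot && seccompOK
}

func main() {
	fmt.Println(meetsRestricted(restrictedContext())) // prints "true"
}
```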

Acceptance test changes:

* When `-enable-openshift` and `-enable-cni` are set, configure the CNI
  settings correctly for OpenShift.
* Add the `-enable-restricted-psa-enforcement` test flag. When this is set,
  the tests assume the Consul namespace has restricted PSA enforcement enabled.
  The tests will deploy the CNI (if enabled) into the `kube-system` namespace.
  Compatible test cases will deploy applications outside of the Consul namespace.
* Update the ConnectHelper to configure the NetworkAttachmentDefinition
  required to be compatible with the CNI on OpenShift.
* Add fixtures for static-client and static-server for OpenShift. This
  is necessary because the deployment configs must reference the network
  attachment definition when using the CNI on OpenShift.
* Update tests in the `acceptance/tests/connect` directory to either
  run or skip based on `-enable-cni` and `-enable-openshift`
security: Upgrade Go and net/http

Upgrade to Go 1.20.6 and `net/http` 1.12.0 to resolve CVE-2023-29406.
The consul client always logs into the local datacenter
* Add support for requestTimeout in Service Resolver spec
* preserve serviceresolvers.yaml
Preserving yaml from main, only adding requesttimeout property.
* update generated.deepcopy.go
* Use latest controller-gen to generate CRDs
---------

Co-authored-by: Ashwin Venkatesh <ashwin.what@gmail.com>
… ms (#2656)

increase timeout for acl replication to 60 seconds and poll every 500 ms
…or for apiGateway (#2597)

* Support multiline nodeSelector arg

* Support multiline service annotations arg

* Update test assertions

* Add changelog entry
- These reflect the different test cases
- sameness.yaml defines the ordered list of failovers
- static-server responds with a unique name so we can track failover order
- static-client includes both DNS and CURL in the image used so we can exec in for testing
- We do a bunch of infra setup for peering and partitions, but after the initial setup only partitions are tested
- We test service failover, dns failover and PQ failover scenarios
- The sameness tests require 4 kind clusters, so the make target will now spin up 4 kind clusters
- not all tests need 4 kind clusters, but the entire suite of tests can be run with 4
- add variable for configuring timeout
- the timeout was triggering locally on an Intel Mac, so the new value should cover our devs' lowest-performing machines
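
The notes above say that sameness.yaml defines an ordered list of failovers and that static-server responds with a unique name so the failover order can be tracked. A minimal Go sketch of that idea follows. The function `firstHealthy` and the target names are hypothetical, chosen only to show how an ordered failover list selects the responder; Consul performs the real selection internally.

```go
package main

import "fmt"

// firstHealthy walks the ordered failover list and returns the first target
// that is currently healthy. The boolean is false when no target is healthy.
func firstHealthy(order []string, healthy map[string]bool) (string, bool) {
	for _, target := range order {
		if healthy[target] {
			return target, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical target names; each static-server would answer with its own.
	order := []string{"local", "partition-a", "peer-b"}

	// With the local instance down, the next entry in the list should answer.
	healthy := map[string]bool{"partition-a": true, "peer-b": true}
	target, ok := firstHealthy(order, healthy)
	fmt.Println(target, ok) // prints "partition-a true"
}
```

A test built this way can scale down each target in turn and assert that the name in the response matches the next entry in the list.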
@hc-github-team-consul-core hc-github-team-consul-core force-pushed the backport/mw/NET-4260-Phase-2-Automate-the-K8s-Sameness-Tests/informally-mutual-martin branch 2 times, most recently from 46f09c4 to a59fc75 Compare July 27, 2023 20:48
@hc-github-team-consul-core hc-github-team-consul-core force-pushed the backport/mw/NET-4260-Phase-2-Automate-the-K8s-Sameness-Tests/informally-mutual-martin branch from a59fc75 to 46f09c4 Compare July 27, 2023 20:48
@wilkermichael wilkermichael deleted the backport/mw/NET-4260-Phase-2-Automate-the-K8s-Sameness-Tests/informally-mutual-martin branch December 18, 2023 16:37