Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add KongIngress types for generated CRDs #1971

Merged
merged 24 commits into from
Dec 15, 2021
Merged

Add KongIngress types for generated CRDs #1971

merged 24 commits into from
Dec 15, 2021

Conversation

rainest
Copy link
Contributor

@rainest rainest commented Oct 26, 2021

What this PR does / why we need it:

  • Adds dedicated types for KongIngress fields rather than re-using the go-kong types. Transfers content from KongIngressUpstream into the kong.Upstream rather than clobbering the latter.
  • Re-enables CRD generation for all CRDs and removes the hack script to keep old CRDs around.
  • Adds annotations to fill out most of the remaining missing metadata from the original CRDs.
  • Adds an alias KongProtocol type to allow constraints on list members (Kubebuilder lacks support for doing this directly)

Two commits are unrelated to the main goal of this PR, but they were annoying the hell out of me during testing:

  • Removes a noisy, low info log line from integration tests.
  • Adds buffering config to the "default" route that we modify to create actual routes. These settings are auto-set by the admin API and not including defaults would trigger spurious deck diffs in DB-backed mode.

Which issue this PR fixes:
fixes #1778
fixes #1779

Special notes for your reviewer:
Review individual commits, there's a ton of boilerplate and identical fixes repeated ad nauseum in a few of them.

Missing healthcheck validations
This does not create an alias type for upstream healthchecks, which still use the go-kong type directly. This works since the healthcheck struct doesn't include fields we don't want users to set (name, ID, created at, etc.) like the others. This loses some validation that was present in the original CRD, but those validations seem fairly minor (common sense things like "don't set the interval between checks to a negative number").

It further added some fields we'd added in go-kong but never added to the KongIngress schema (and fixed a typo to boot!), so IMO a net win. I'm okay with leaving those minor validations out to reduce the type maintenance burden.

Upstream certificate mysteries
Edit: per the core team, no functional difference. Upstream certificates exist for users that reuse their upstreams and don't want to reconfigure certificates across a bunch of services. Irrelevant for us, since we generate upstreams automatically and do not reuse them across services.

While implementing KongIngressUpstream, I noted that we technically supported client certificates in the struct, but their actual support seems questionable. go-kong uses a complete certificate object (certificate, key, SNIs) here, but the actual admin API expectation is an ID reference to an existing certificate, so I'm unsure if that'd actually work if tried (maybe deck and dbless can handle it formatted that way and generate the reference on your behalf).

We have no mention of support for upstream client certificates in the schema or docs, so my best guess is that it was just there by virtue of the lazy type inclusion. To even try to use it as-was, you'd need to stick the complete raw certificate and key string directly into those fields, since there was no logic to perform a Secret lookup. On the offchance this did work, it'd be quite inadvisable.

Lastly, Services already support their own client certificate annotation, and it does have proper Secret lookup. I'm unsure whether setting a client cert on a service results in any functional difference versus setting it on an upstream, but I can't see where it would--both control the client certificate presented to the upstream application. Asking core team to see if anyone knows.

Ultimately, based on what I know so far, I recommend that we simply leave client certificates out of the new KongIngressUpstream and instruct users to use the Service client-certificate annotation instead.

PR Readiness Checklist:

Complete these before marking the PR as ready to review:

  • the CHANGELOG.md release notes have been updated to reflect any significant (and particularly user-facing) changes introduced by this PR

@rainest rainest requested a review from a team as a code owner October 26, 2021 22:52
@rainest rainest temporarily deployed to Configure ci October 26, 2021 22:52 Inactive
@rainest rainest temporarily deployed to Configure ci October 26, 2021 23:00 Inactive
@mflendrich mflendrich requested review from mflendrich and removed request for a team October 27, 2021 14:45
Copy link
Contributor

@mflendrich mflendrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spirit of the change LGTM, but the generated CRDs appear incorrect (spurious apiVersion and kind)

@rainest rainest temporarily deployed to Configure ci October 28, 2021 17:08 Inactive
@rainest rainest temporarily deployed to Configure ci October 28, 2021 17:12 Inactive
@rainest rainest temporarily deployed to Configure ci October 28, 2021 17:22 Inactive
Copy link
Contributor

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can remove the explicit preseriveUnknownFields attribute that was previous added, as per https://github.com/Kong/kubernetes-ingress-controller/blob/main/hack/generators/manifests/main.go#L30 has something changed that would make the upgrade succeed?

@rainest rainest temporarily deployed to Configure ci November 2, 2021 17:03 Inactive
@rainest rainest temporarily deployed to Configure ci November 2, 2021 17:15 Inactive
@rainest rainest temporarily deployed to Configure ci November 2, 2021 17:16 Inactive
@rainest
Copy link
Contributor Author

rainest commented Nov 2, 2021

Nope, didn't know CRD upgrades were weird like that, thought it was just something we'd added via the generate command but didn't need anymore after 0.7 removed that method of setting it. 19f810bb adds annotations instead.

@rainest rainest requested a review from shaneutt November 2, 2021 17:18
@rainest rainest temporarily deployed to Configure ci November 2, 2021 17:28 Inactive
@shaneutt
Copy link
Contributor

shaneutt commented Nov 2, 2021

Nope, didn't know CRD upgrades were weird like that, thought it was just something we'd added via the generate command but didn't need anymore after 0.7 removed that method of setting it. 19f810bb adds annotations instead.

We need to test manually upgrading these from the old ones from 1.3.x, I believe I tried with the annotation previously and it didn't work.

@rainest
Copy link
Contributor Author

rainest commented Nov 3, 2021

We need to test manually upgrading these from the old ones from 1.3.x, I believe I tried with the annotation previously and it didn't work.

Anything in particular we should be looking for? The newly-added annotation is just substituting for what we used to pass as an argument to controller-gen (CRD_OPTIONS ?= "crd:trivialVersions=true,preserveUnknownFields=false,allowDangerousTypes=true"). It's no longer available as a global option and needs to be set per-CRD, but the result is the same (the generated CRD gets preserveUnknownFields: false set explicitly).

I did a basic test configuring a cluster plugin, confirming it worked, overwriting the CRD, and confirming that the CRD apply didn't complain and that the plugin continued working:

test.txt

But I'm not sure what previous breakage was observed/what else I'd need to check.

@shaneutt
Copy link
Contributor

shaneutt commented Nov 4, 2021

We need to test manually upgrading these from the old ones from 1.3.x, I believe I tried with the annotation previously and it didn't work.

Anything in particular we should be looking for? The newly-added annotation is just substituting for what we used to pass as an argument to controller-gen (CRD_OPTIONS ?= "crd:trivialVersions=true,preserveUnknownFields=false,allowDangerousTypes=true"). It's no longer available as a global option and needs to be set per-CRD, but the result is the same (the generated CRD gets preserveUnknownFields: false set explicitly).

I did a basic test configuring a cluster plugin, confirming it worked, overwriting the CRD, and confirming that the CRD apply didn't complain and that the plugin continued working:

test.txt

But I'm not sure what previous breakage was observed/what else I'd need to check.

If you've verified that the older v1.3.x v1beta1 CRDs upgrade cleanly to these newer CRDs then that should be sufficient, it's possible that something was missed in the previous pass. We should also verify clean upgrading from the current CRDs in v2.0.x.

Ideally we should have an integration or e2e test that deploys the old CRDs, tests resources on them, upgrades to the v2.0.0 CRDs, tests again, then upgrades to these CRDs and tests once more.

@rainest rainest temporarily deployed to Configure ci December 1, 2021 22:03 Inactive
@rainest
Copy link
Contributor Author

rainest commented Dec 1, 2021

Returning here now that other test stuff has settled. The new TestDeployAndUpgradeAllInOneDBLESS() and associated helpers now run a test which pulls the previous minor version's manifest, deploys it, and then deploys the current manifest if initiated by a new tag with patch 0 (a new minor release).

This works, though I've since realized that it will break on new major releases (it will try to deploy X.-1.0, very unsuccessfully). Is there a simple way we can get the proper prior release for those, or simpler to disable it on minor=0 also and run manually (like below, you can fake it by tagging a new minor locally instead), since majors are infrequent?

13:35:43-0800 esenin $ git describe --tags
v2.0.6-66-gf8c2a37e
13:35:51-0800 esenin $ git tag v2.1.0
13:35:55-0800 esenin $ GOFLAGS="-tags=e2e_tests" go test -v \
        -race \                                                                                                                                                                                                                                        -parallel 8 \
        -timeout 30m \
        ./test/e2e/... -test.v | tee /tmp/testlog.txt

testlog.txt

The no-LB failure is expected until we release a version with #2005 unless you override the image. The Istio failure is a mystery, but looks unrelated:

14:02:28-0800 esenin $ kubectl logs -n kong-system ingress-controller-kong-9bfc5f6c7-s2zf4 -c istio-init 
...
2021-12-01T22:02:09.867104Z	info	Writing following contents to rules file: /tmp/iptables-rules-1638396129866989211.txt3530400109
* nat
-N ISTIO_INBOUND
-N ISTIO_REDIRECT
-N ISTIO_IN_REDIRECT
-N ISTIO_OUTPUT
-A ISTIO_INBOUND -p tcp --dport 15008 -j RETURN
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A ISTIO_INBOUND -p tcp --dport 22 -j RETURN
-A ISTIO_INBOUND -p tcp --dport 15090 -j RETURN
-A ISTIO_INBOUND -p tcp --dport 15021 -j RETURN
-A ISTIO_INBOUND -p tcp --dport 15020 -j RETURN
-A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_OUTPUT -o lo -s 127.0.0.6/32 -j RETURN
-A ISTIO_OUTPUT -o lo ! -d 127.0.0.1/32 -m owner --uid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -o lo ! -d 127.0.0.1/32 -m owner --gid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -d 127.0.0.1/32 -j RETURN
-A ISTIO_OUTPUT -j ISTIO_REDIRECT
COMMIT
2021-12-01T22:02:09.867127Z	info	Running command: iptables-restore --noflush /tmp/iptables-rules-1638396129866989211.txt3530400109
2021-12-01T22:02:09.869157Z	error	Command error output: xtables parameter problem: unknown option "--to-ports"
Error occurred at line: 7
Try `iptables-restore -h' or 'iptables-restore --help' for more information.
2021-12-01T22:02:09.869167Z	error	Failed to execute: iptables-restore --noflush /tmp/iptables-rules-1638396129866989211.txt3530400109, exit status 2

@rainest rainest temporarily deployed to Configure ci December 1, 2021 22:16 Inactive
@rainest rainest temporarily deployed to Configure ci December 1, 2021 23:34 Inactive
@rainest rainest temporarily deployed to Configure ci December 1, 2021 23:47 Inactive
Copy link
Contributor

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very glad to see we're adding proper tests for this. I think testing the previous minor release is good, but given the context and assuming that this patch is going to release in v2.1.0 I think we need to add an additional upgrade step so that we're more properly following current users upgrade path, e.g.:

  1. deploy v1.3.x (v1beta1 CRDs)
  2. upgrade to v2.0.x (v1 CRDs, first iter)
  3. upgrade to current (v1 CRDs, second iter)

In this way all three of our CRD iterations will be covered by the tests.

Additionally we need to support an upgrade workflow where someone skips the first iteration of the v1 CRDs, e.g.:

  1. deploy v1.3.x (v1beta1 CRDs)
  2. upgrade to v2.1.x (v1 CRDs, second iter)

Adding these test paths as well will help us cover all our bases.

At a more granular level of the test process itself, It looks like we're currently covering:

  1. deploy v1.3.x with legacy CRDs and deploy and verify KongIngress
  2. upgrade to v2.0.x with modern CRDs and deploy and verify KongIngress

It's occurred to me that we should add the following tests as well:

  1. after upgrade verify that the previously created Ingress and KongIngress are still functioning
  2. update the previously created Ingress and KongIngress resources after upgrade and verify

This may seem superfluous as we would basically expect that if the CRD is working then this would work but when things go wrong with CRDs in production everything grinds to a halt so any additional double checks we can add we should.

Copy link
Contributor

@mflendrich mflendrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As agreed in Zoom, blocking this on the excess TypeMeta fields in configFrom.

config/crd/bases/configuration.konghq.com_kongplugins.yaml Outdated Show resolved Hide resolved
@rainest rainest temporarily deployed to Configure ci December 9, 2021 01:28 Inactive
Copy link
Contributor

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a full review at this point, and I'm largely 👍 for this change. I also wanted to say I appreciate the extra effort to get the kubebuilder validation done right 🥂

My one remaining blocker is due to some uncertainty I feel about the preserveUnknownFields hack. I feel strongly that we should be doing one if not both of the following related to the preserveUnknownFields: false field:

  1. keeping the preserveUnknownFields: false hack in for now so that we can separately reason with that and resolve it in a separate more isolated iteration
  2. develop a comprehensive set of E2E tests which can validate upgrading in a variety of scenarios from older to newer releases

I'm worried that we'll inadvertently put the burden of some unforeseen issue on our end-users when we could spend a bit more time up front to be certain, and if needed provide some automation to resolve the issue (e.g. perhaps our own mutating webhook for v1beta1 to v1 upgrades). I also think there's some general value to be gained by having more tests for upgrade paths. I'm willing to contribute directly to such tests, pairing or just regular.

CHANGELOG.md Show resolved Hide resolved
internal/kongstate/upstream.go Outdated Show resolved Hide resolved
internal/parser/parser_test.go Show resolved Hide resolved
pkg/apis/configuration/v1/kongingress_types.go Outdated Show resolved Hide resolved
@rainest rainest temporarily deployed to Configure ci December 9, 2021 19:20 Inactive
@rainest rainest temporarily deployed to Configure ci December 9, 2021 20:46 Inactive
@rainest
Copy link
Contributor Author

rainest commented Dec 9, 2021

What do expect to learn further about preserveUnknownFields that will let us make that change after? "We don't know, but maybe there's something" is FUD that we can't ever resolve. I think the findings in #1971 (comment) characterize it sufficiently, and gathering more detail without an additional test case means diving into API server code.

What additional tests would we add? Adding, say, an upgrade from the previous major version would fail here with preserveUnknownFields: false removed. That's always going to be the case: if we defer removing it to some future version, we'll just fail on that upgrade instead.

@rainest rainest requested a review from shaneutt December 9, 2021 20:54
@rainest rainest temporarily deployed to Configure ci December 9, 2021 20:59 Inactive
shaneutt
shaneutt previously approved these changes Dec 15, 2021
mflendrich
mflendrich previously approved these changes Dec 15, 2021
@rainest rainest dismissed stale reviews from mflendrich and shaneutt via e2c98a6 December 15, 2021 18:19
@rainest rainest temporarily deployed to Configure ci December 15, 2021 18:19 Inactive
@rainest rainest enabled auto-merge (squash) December 15, 2021 18:32
@rainest rainest temporarily deployed to Configure ci December 15, 2021 18:32 Inactive
@shaneutt shaneutt self-requested a review December 15, 2021 19:02
@rainest rainest merged commit 9c2bc19 into main Dec 15, 2021
@rainest rainest deleted the feat/ki-types branch December 15, 2021 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants