KEP-3926: handling undecryptable resources #3927

stlaz · 2023-03-27T10:45:09Z

One-line PR description: Improve the identification of resources that won't decrypt. Make it possible to remove undecryptable resources by using the API.

Issue link: Handling undecryptable resources #3926

Other comments: Enable deleting API objects even when storage-level decryption is not working properly kubernetes#86489

stlaz · 2023-03-27T10:49:20Z

/cc @ibihim @deads2k @enj

stlaz · 2023-03-27T11:07:51Z

kubernetes/kubernetes#116943 shows a proof of concept for the list options proposed in this KEP.

liggitt · 2023-03-28T16:52:01Z

keps/sig-auth/3926-handling-undecryptable-resources/README.md

+  //          doing.
+  // WARNING: Vendors will most likely consider using this option to be breaking the
+  //          support of their product.
+  UnconditionalDeleteWithClusterBreakingPotential bool


it's unclear what this API lets you skip... I would not expect to be allowed to skip admission or finalizers if the existing persisted object can be decoded successfully

+1 - I think it should only allow deleting objects (with all the consequences) that we cannot decode. Nothing else should be allowed.

Just to get our lingo synchronized - I would consider decoding and transforming two separate cases (you'll see why in code ref below).

What about objects we cannot retrieve from storage for other reasons?

The generic storage Get implementation fails in one additional case, namely when the key cannot be constructed - https://github.com/kubernetes/kubernetes/blob/3cf9f66e90d560ac080687610933c712bcf37b39/staging/src/k8s.io/apiserver/pkg/registry/generic/registry/store.go#L756-L769

The etcd3 store implementation fails in 6 other cases:
https://github.com/kubernetes/kubernetes/blob/3cf9f66e90d560ac080687610933c712bcf37b39/staging/src/k8s.io/apiserver/pkg/storage/etcd3/store.go#L132-L166

Most of the cases indeed seem like errors we may not want to ignore. Should we focus solely on the transformation error, then?

Should we focus solely on the transformation error, then?

I don't think so... transformation and decoding failures have almost identical implications, symptoms, and considerations for deleting anyway. Solving both problems together and consistently makes sense to me

stlaz · 2023-04-20T08:28:48Z

@liggitt @wojtek-t @dgrisonnet The addressable comments were addressed, please take a look when you get a chance

liggitt · 2023-05-01T17:07:09Z

keps/sig-auth/3926-handling-undecryptable-resources/README.md

+Currently, removing a resource that causes such failures is not possible.
+A cluster administrator must access etcd directly and remove the malformed data manually.
+
+This KEP proposes a way to identify resources that fail to decrypt, and introduces


the mechanics of this seem to belong more to api-machinery (who owns the list and delete functionality that would be modified as part of this) ... there are multiple reasons a resource could fail to decode from storage, decryption is only one of them (e.g. kubernetes/kubernetes#69579)

liggitt · 2023-05-01T17:09:32Z

keps/sig-auth/3926-handling-undecryptable-resources/README.md

+  //          doing.
+  // WARNING: Vendors will most likely consider using this option to be breaking the
+  //          support of their product.
+  UnconditionalDeleteWithClusterBreakingPotential bool


Should we focus solely on the transformation error, then?

I don't think so... transformation and decoding failures have almost identical implications, symptoms, and considerations for deleting anyway. Solving both problems together and consistently makes sense to me

soltysh · 2023-10-05T09:13:44Z

keps/sig-auth/3926-handling-undecryptable-resources/README.md

+- [ ] (R) Production readiness review approved
+- [ ] "Implementation History" section is up-to-date for milestone
+- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes


Nit: make sure to check (R) items above.

soltysh · 2023-10-05T09:17:28Z

keps/sig-auth/3926-handling-undecryptable-resources/README.md

+      - Impact of its degraded performance or high-error rates on the feature:
+-->
+
+### Scalability


I mean more about the scalability impact overall, not only focusing on the feature. Imagine a cluster with hundreds of nodes where this feature is enabled, then look at the questions below and see if you can answer them in a reasonable fashion. It doesn't have to be perfect, it's not a required step for alpha, and based on the initial testing at alpha you'll be able to expand on it when promoting to beta 😄

soltysh · 2023-10-05T09:19:10Z

keps/sig-auth/3926-handling-undecryptable-resources/README.md

+      - Impact of its degraded performance or high-error rates on the feature:
+-->
+
+### Scalability


For example, David's question is a good way to answer the question "Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?". Limiting the number of values you handle at once is a potential safeguard to ensure that CPU and/or RAM usage doesn't increase.
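As a rough illustration of that safeguard, the Go sketch below caps how many values are handled at once by splitting a key list into bounded batches. The `batches` helper and the key names are hypothetical and not taken from the KEP; the sketch only shows the general idea of bounding the working set.

```go
package main

import "fmt"

// batches splits keys into chunks of at most limit items so that a caller
// never has to hold more than limit values at once.
// It returns nil when limit is not positive.
func batches(keys []string, limit int) [][]string {
	if limit <= 0 {
		return nil
	}
	var out [][]string
	for start := 0; start < len(keys); start += limit {
		end := start + limit
		if end > len(keys) {
			end = len(keys)
		}
		out = append(out, keys[start:end])
	}
	return out
}

func main() {
	// Hypothetical keys of objects that failed to decrypt.
	keys := []string{"secrets/a", "secrets/b", "secrets/c", "secrets/d", "secrets/e"}
	for i, b := range batches(keys, 2) {
		fmt.Printf("batch %d: %v\n", i, b)
	}
}
```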

soltysh · 2023-10-05T09:20:54Z

keps/sig-auth/3926-handling-undecryptable-resources/README.md

+      - Impact of its degraded performance or high-error rates on the feature:
+-->
+
+### Scalability


A different question from that set is about SLI/SLO: is it possible that the extra operations required to force delete all the resources will affect the guaranteed time to delete a resource? Will the increase be negligible or significant? I hope these examples will help 😉

stlaz · 2023-10-05T10:38:13Z

I squashed the previous changes and addressed all the most recent comments in the "address comments" commit.

soltysh

#prr-shadow
lgtm - thx :)

deads2k · 2023-10-05T13:18:48Z

/label tide-merge-method-squash
/approve

k8s-ci-robot · 2023-10-05T13:18:50Z

@deads2k: The label(s) /label tide-merge-method-squash cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor, lead-opted-in, tracked/no, tracked/out-of-tree, tracked/yes. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/label tide-merge-method-squash
/approve

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · 2023-10-05T13:18:58Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, stlaz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~keps/prod-readiness/OWNERS~~ [deads2k]
~~keps/sig-auth/OWNERS~~ [deads2k]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

stlaz · 2023-10-05T13:25:55Z

(I squashed the changes to a single commit in the latest force push)

deads2k · 2023-10-05T13:41:10Z

/label tide/merge-method-squash

sttts · 2023-10-05T15:39:19Z

keps/sig-auth/3926-handling-undecryptable-resources/README.md

+
+The unconditional deletion admission:
+1. checks if a "delete" request contains the `IgnoreStoreReadErrorWithClusterBreakingPotential` option
+2. if it does, it checks the RBAC of the request's user for the `delete-ignore-read-errors` verb of the given resource


can we align these two words, the option and the verb?

@tkashem this is still open in the implementation.
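The two admission steps quoted above can be sketched in Go as follows. This is a minimal illustration, assuming a simplified request type and a function-based authorizer; `deleteRequest`, `authorizer`, and `admitDelete` are hypothetical names, and only the option name and the `delete-ignore-read-errors` verb come from the KEP text.

```go
package main

import "fmt"

// deleteRequest is a hypothetical, simplified view of a delete request.
type deleteRequest struct {
	User     string
	Resource string
	// IgnoreStoreReadError mirrors the KEP's
	// IgnoreStoreReadErrorWithClusterBreakingPotential delete option.
	IgnoreStoreReadError bool
}

// authorizer is a hypothetical stand-in for an RBAC check.
type authorizer func(user, verb, resource string) bool

// unsafeVerb is the verb named in the KEP text quoted above.
const unsafeVerb = "delete-ignore-read-errors"

// admitDelete applies the two steps of the unconditional deletion admission.
func admitDelete(req deleteRequest, authz authorizer) error {
	// Step 1: requests without the option are not affected by this check.
	if !req.IgnoreStoreReadError {
		return nil
	}
	// Step 2: the option requires the dedicated verb on the resource.
	if !authz(req.User, unsafeVerb, req.Resource) {
		return fmt.Errorf("user %q may not %s %s", req.User, unsafeVerb, req.Resource)
	}
	return nil
}

func main() {
	// Toy policy: only "admin" holds the verb on secrets.
	authz := func(user, verb, resource string) bool {
		return user == "admin" && verb == unsafeVerb && resource == "secrets"
	}
	fmt.Println(admitDelete(deleteRequest{User: "dev", Resource: "secrets", IgnoreStoreReadError: true}, authz))
	fmt.Println(admitDelete(deleteRequest{User: "admin", Resource: "secrets", IgnoreStoreReadError: true}, authz))
}
```

The sketch also makes the naming question concrete: the option and the verb are spelled differently even though they gate the same behavior.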

sttts · 2023-10-05T15:43:00Z

keps/sig-auth/3926-handling-undecryptable-resources/README.md

+  //          doing.
+  // WARNING: Vendors will most likely consider using this option to be breaking the
+  //          support of their product.
+  IgnoreStoreReadErrorWithClusterBreakingPotential bool


could this be withoutReadingFromStorage, i.e. a mode of deletion that has nothing to do with errors but is just highly unconditional deletion? (disclaimer: have only read the kep briefly, maybe I missed it)

Looking at today's DeleteOptions, this could even be called unconditional or force.

xref old thread around this topic #3927 (comment)

worth to summarize as a non-goal:

give clients control over skipping other steps of a delete request flow than decoding errors

worth to summarize as a non-goal:

give clients control over skipping other steps of a delete request flow than decoding errors

Agree.

deads2k · 2023-10-05T18:29:28Z

/lgtm

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 27, 2023

k8s-ci-robot requested review from mikedanese and ritazh March 27, 2023 10:45

k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 27, 2023

stlaz mentioned this pull request Mar 27, 2023

Handling undecryptable resources #3926

Open

4 tasks

stlaz force-pushed the failed_transform branch from 512970f to f78f0e1 Compare March 27, 2023 10:48

k8s-ci-robot requested review from deads2k, enj and ibihim March 27, 2023 10:49

stlaz force-pushed the failed_transform branch from f78f0e1 to 3ff9a45 Compare March 27, 2023 10:58

k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 27, 2023

dgrisonnet reviewed Mar 28, 2023

liggitt reviewed Mar 28, 2023

liggitt reviewed Mar 28, 2023

k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 3, 2023

stlaz force-pushed the failed_transform branch from a80bf28 to 3b808a1 Compare April 3, 2023 11:11

liggitt reviewed May 1, 2023

liggitt self-assigned this Jul 5, 2023

enj added this to the v1.29 milestone Sep 7, 2023

soltysh reviewed Oct 5, 2023

stlaz force-pushed the failed_transform branch from c294537 to 0c0a5f6 Compare October 5, 2023 10:37

soltysh reviewed Oct 5, 2023

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 5, 2023

KEP-3926: handling undecryptable resources

c32d3dc

stlaz force-pushed the failed_transform branch from 0c0a5f6 to c32d3dc Compare October 5, 2023 13:24

k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Oct 5, 2023

sttts reviewed Oct 5, 2023

sttts reviewed Oct 5, 2023

k8s-ci-robot assigned deads2k Oct 5, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 5, 2023

k8s-ci-robot merged commit 7edd84b into kubernetes:master Oct 5, 2023

stlaz mentioned this pull request Oct 6, 2023

KEP-3926: add a non-goal of skipping different errors #4282

Merged

liggitt removed their assignment Oct 13, 2023

soltysh mentioned this pull request Apr 3, 2024

Add soltysh to prod-readiness-approvers #4566

Merged

tkashem mentioned this pull request Oct 18, 2024

KEP-3926: refactor: extend newTransformTest to enable RBAC kubernetes/kubernetes#128191

Closed
