-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC Resource Tracing #634
RFC Resource Tracing #634
Conversation
Signed-off-by: Scott Andrews <andrewssc@vmware.com>
✔️ Deploy Preview for elated-stonebraker-105904 canceled. 🔨 Explore the source changes: 955261d 🔍 Inspect the deploy log: https://app.netlify.com/sites/elated-stonebraker-105904/deploys/621e562e702a29000816df4f |
I think this is great! In terms of |
rfc/rfc-0000-supply-chain-tracing.md
Outdated
- `.status.resources[*].stampedRef` object reference to the Kubernetes resource created by Cartographer for this resource. | ||
- `.status.resources[*].inputs[*]` inputs model the relationships between resources within the SupplyChain graph. The value of an input can be read from the referenced resource's outputs. | ||
- `.status.resources[*].inputs[*].name` the name of the resource backing this input. | ||
- `.status.resources[*].outputs[*].name` the name of the output. Output names are fixed and defined by the template type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made me wonder if it'd be more clear if we instead named the field resource:
(for inputs)
and maybe (nvm, for outputs)type:
(for output), so that we're very explicit about it 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(or perhaps even resourceName
to really leave no room for other interpretations?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or ... what if we even removed the plural here (outputs) and went solely with output
? a resource can only export either one (cluster(source|image|config)template
) or no output (clustertemplate
)), and the type can already be inferred by looking up the templateref (.status.resources[*].templateRef.kind
)
nvm, each entry is a field of the output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or change inputs
to inputResources
?
I waffled a bit while drafting the RFCs whether .inputs[*].name
should be .inputs[*].resource
since the value is from the resource
field within the ResourceReference struct. I ended up going with name
here because it's the dominate part of the relationship and I was viewing this as a local object reference like resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a resource can only export either one (cluster(source|image|config)template) or no output (clustertemplate))
We certainly could make the output a mux. I didn't because I think it makes it harder for a generic tool to consume as it would need to use a switch to find the active bit of the config. This is a loosely held opinion.
Signed-off-by: Scott Andrews <andrewssc@vmware.com>
@emmjohnson and I took a stab at spiking this out to see what it might look like. It's on this branch, if anyone is interested in taking a look. |
All fair comments. I was trying to preserve a bit of the behavior from RFC 18 that included the ResourceVersion. I don't have a specific use case in mind. If we don't collectively have a use, I'll remove it. |
I can imagine uses of the lastTransitionTime, think it's a great addition. I don't think the observedGeneration is actionable. Even more, I think it would be often misinterpreted. I would expect users to infer far too much, e.g. that when the outputs report observedGeneration X then they are the result of the inputs of generation X (which is not necessarily true). In practice, the observedGeneration on its own just lets me know that the object's controller is 'on'. Even then, without exposing the |
rfc/rfc-0000-supply-chain-tracing.md
Outdated
- [First draft of RFC 0014](https://github.com/vmware-tanzu/cartographer/pull/274) | ||
- [Introduce RFC 18 Workload Report Artifact Provenance](https://github.com/vmware-tanzu/cartographer/pull/519) | ||
|
||
# Unresolved Questions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To point out a possible risk:
We have another RFC on tracing that would duplicate much of this information if adopted. The use cases for that RFC remain and I haven't heard progress on alternate work that would need to be done at the level of all of the choreographed components of a supply chain if that RFC were to be rejected. E.g. at the moment I would predict that we're going to move forward with some version of that RFC.
An important note about tracing is that it is artifact centric rather than resource centric. E.g. wherein this proposal reports 1 artifact per resource, there is good reason for the tracing RFC to report more than 1 artifact for each resource.
If we adopt this RFC and later adopt the tracing RFC we'll need to decide:
- if we want to have two longish fields on the workload that have much of the same information, or
- if we want to make a breaking change and alter what's proposed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a link to the RFC and ideally the parts of the RFC which you feel are relevant
Random request: Possible names:
|
Signed-off-by: Scott Andrews <andrewssc@vmware.com>
sure
I consider both of these to be tracing, but there's a difference in what is being traced. In this case what we're tracing is the supply chain defined steps for a workload. I agree that it's not artifact level tracing. |
Signed-off-by: Scott Andrews <andrewssc@vmware.com> Co-authored-by: Marty Spiewak <spiewakmarty@gmail.com>
rfc/rfc-0000-supply-chain-tracing.md
Outdated
- name: source-provider | ||
outputs: | ||
- name: image | ||
value: registry.example/supply-chain/my-workload@sha256:68f8e8fc6e8ede7a411db9182cd695eac7b3e7e19e4ff9dcb9ba21205c135697 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On RFC 18 @evankanderson raised a valid concern about the potential size of outputs causing issues.
There is also a 1MB limit on the size of a single resource, and the values might be repeated twice due to the addition of managedFields. I'd be careful about including large other documents directly for that reason. It seems like it would be better to include a resourceVersion or generation field referencing the resource's version, rather than the entire contents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could switch including the value to use a hash of the value. Clients would be required to fetch the stamped resource in order to resolve the actual value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to a digest
and path
instead of a value
for outputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely understand the argument about the config being too big, but I think capturing the value here is important. A lot of our modelling assumes that the workload is the only object that a developer needs to care about, now we're saying that they need to go and make these other associations to other objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we don't have value
anymore then we'd have to remove lastTransitionTime
as we won't be storing the old value so we won't know if it's changed. It'd also be misleading as to what lastTransitionTime
- it could be taken to mean the last time the output path changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cirocosta how do we know what value needs to be masked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can update the output digest
/value
/preview
to be whatever people collectively want.
There are two different dimensions of risk/customer impact to consider with the decision.
- [confidentiality] "leaking" outputs that an org views as sensitive
- [availability] a supply chain stops working in production, with an unclear path to recovery
field | confidentiality risk | availability risk |
---|---|---|
value |
medium | high |
preview |
medium | low |
digest |
low | low |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I get some thumbs up/thumbs down on moving forward with the following:
Add a preview
field with which holds the first 200 bytes of each value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scothis After today's chat, I'm pretty comfortable with adding the preview
field with <1000 bytes (200 seems fine to me). If you aren't opposed, would you mind adding it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
For `.status.resources[*].outputs[*]`, replace `value` with `digest` and `path` fields. This addresses two concerns: 1. the values may be too large, and exceed the size limit for a k8s resource 2. the values may be sensitive Now a consumer will need to resolve the raw value from the stamped resource using the output path specified. Signed-off-by: Scott Andrews <andrewssc@vmware.com>
rfc/rfc-0000-supply-chain-tracing.md
Outdated
- `.status.resources[*].inputs[*]` inputs model the relationships between resources within the SupplyChain graph. The value of an input can be read from the stamped resource based on the outputs for the referenced resource. | ||
- `.status.resources[*].inputs[*].name` the name of the resource backing this input. | ||
- `.status.resources[*].outputs[*].name` the name of the output. Output names are fixed and defined by the template type. | ||
- `.status.resources[*].outputs[*].digest` sha256sum of the raw output value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we need to be more specific here given how "sensitive" hashing is to any bit change, e.g., a javascript based client might end up interpreting the raw value differently from the controller due to a different json decoder? wondering that because we're effectively hashing the result of the jsonpath query (rather than straight byte stream)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you mentioned that we should try as much as possible to not expose potentially sensitive information in workload.status
, but storage-wise, made me wonder if it'd be too much of a crazy/bad idea considering a custom apiserver implementation to work around the etcd storage limitation
IIRC, because you can back the endpoints you register against with ... anything, we could back it up with """something else""", although it's clearly a large complexity add
somewhat related: tektoncd/community#606
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we need to be more specific here given how "sensitive" hashing is to any bit change, e.g., a javascript based client might end up interpreting the raw value differently from the controller due to a different json decoder?
That's a good point, the digest will only be stable against a string or byte array value, structs are ambiguous. The intent isn't to be cryptographically secure, but to provide a check of the value so a client can be reasonably sure that it has the "correct" value and that the value hasn't changed between when the c
We can drop the digest
field for now if you're not confident in it, and add it back once we have the semantics nailed.
made me wonder if it'd be too much of a crazy/bad idea considering a custom apiserver implementation to work around the etcd storage limitation
yes, crazy and bad :)
But on a serious note, there are other concerns that would be nice to have that will never end up on the Workload status. For example, a history of artifacts and how they interact in the supply chain. This kind of data should be captured into a different system that can support temporal (and other types of) queries. I wouldn't use that front an aggregated api, but to be used as a separate store.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how sensible this is, but it is possible to serialize with sorted keys in the go yaml package (unsure about the k8s one). That would at least produce repeatable digests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also say that the digest is primarily for Cartographer's internal use (to generate the lastTransitionTime corectly).
Signed-off-by: Scott Andrews <andrewssc@vmware.com>
renamed to "Resource Tracing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe folks still expect something like the digest as a part of this but everything else is fine as is.
Signed-off-by: Scott Andrews <andrewssc@vmware.com>
6ae72f0
to
53dd2a1
Compare
Readable
Changes proposed by this PR
closes #
Release Note
PR Checklist
Note: Please do not remove items. Mark items as done
[x]
or usestrikethroughif you believe they are not relevantFixes #123
orUpdates #123
wip
commits