Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC Resource Tracing #634

Merged
merged 8 commits into from
Mar 1, 2022

Conversation

scothis
Copy link
Contributor

@scothis scothis commented Feb 16, 2022

Readable

Changes proposed by this PR

closes #

Release Note

PR Checklist

Note: Please do not remove items. Mark items as done [x] or use strikethrough if you believe they are not relevant

  • Linked to a relevant issue. Eg: Fixes #123 or Updates #123
  • Removed non-atomic or wip commits
  • Filled in the Release Note section above
  • Modified the docs to match changes

Signed-off-by: Scott Andrews <andrewssc@vmware.com>
@netlify
Copy link

netlify bot commented Feb 16, 2022

✔️ Deploy Preview for elated-stonebraker-105904 canceled.

🔨 Explore the source changes: 955261d

🔍 Inspect the deploy log: https://app.netlify.com/sites/elated-stonebraker-105904/deploys/621e562e702a29000816df4f

@martyspiewak
Copy link
Contributor

I think this is great!

In terms of observedGeneration, it feels a bit odd for it to refer to the generation of the stamped object, but not be nested under the stampedRef. I also wonder what the intended use of this field is? You mentioned that lastTransitionTime could be useful for seeing that the supply chain is progressing but I'm not sure I see what observedGeneration is useful for.

- `.status.resources[*].stampedRef` object reference to the Kubernetes resource created by Cartographer for this resource.
- `.status.resources[*].inputs[*]` inputs model the relationships between resources within the SupplyChain graph. The value of an input can be read from the referenced resource's outputs.
- `.status.resources[*].inputs[*].name` the name of the resource backing this input.
- `.status.resources[*].outputs[*].name` the name of the output. Output names are fixed and defined by the template type.
Copy link
Contributor

@cirocosta cirocosta Feb 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made me wonder if it'd be more clear if we instead named the field resource: (for inputs)

and maybe type: (for output), so that we're very explicit about it 🤔 (nvm, for outputs)

Copy link
Contributor

@cirocosta cirocosta Feb 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(or perhaps even resourceName to really leave no room for other interpretations?)

Copy link
Contributor

@cirocosta cirocosta Feb 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or ... what if we even removed the plural here (outputs) and went solely with output? a resource can only export either one (cluster(source|image|config)template) or no output (clustertemplate)), and the type can already be inferred by looking up the templateref (.status.resources[*].templateRef.kind)

nvm, each entry is a field of the output

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or change inputs to inputResources?

I waffled a bit while drafting the RFCs whether .inputs[*].name should be .inputs[*].resource since the value is from the resource field within the ResourceReference struct. I ended up going with name here because it's the dominate part of the relationship and I was viewing this as a local object reference like resource.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a resource can only export either one (cluster(source|image|config)template) or no output (clustertemplate))

We certainly could make the output a mux. I didn't because I think it makes it harder for a generic tool to consume as it would need to use a switch to find the active bit of the config. This is a loosely held opinion.

Signed-off-by: Scott Andrews <andrewssc@vmware.com>
@martyspiewak
Copy link
Contributor

@emmjohnson and I took a stab at spiking this out to see what it might look like. It's on this branch, if anyone is interested in taking a look.

@scothis
Copy link
Contributor Author

scothis commented Feb 17, 2022

In terms of observedGeneration, it feels a bit odd for it to refer to the generation of the stamped object, but not be nested under the stampedRef. I also wonder what the intended use of this field is? You mentioned that lastTransitionTime could be useful for seeing that the supply chain is progressing but I'm not sure I see what observedGeneration is useful for.

All fair comments. I was trying to preserve a bit of the behavior from RFC 18 that included the ResourceVersion. I don't have a specific use case in mind. If we don't collectively have a use, I'll remove it.

@waciumawanjohi
Copy link
Contributor

I can imagine uses of the lastTransitionTime, think it's a great addition.

I don't think the observedGeneration is actionable. Even more, I think it would be often misinterpreted. I would expect users to infer far too much, e.g. that when the outputs report observedGeneration X then they are the result of the inputs of generation X (which is not necessarily true).

In practice, the observedGeneration on its own just lets me know that the object's controller is 'on'. Even then, without exposing the .metadata.generation the observedGeneration is simply a monotonic counter. Vote to remove the field.

- [First draft of RFC 0014](https://github.com/vmware-tanzu/cartographer/pull/274)
- [Introduce RFC 18 Workload Report Artifact Provenance](https://github.com/vmware-tanzu/cartographer/pull/519)

# Unresolved Questions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To point out a possible risk:

We have another RFC on tracing that would duplicate much of this information if adopted. The use cases for that RFC remain and I haven't heard progress on alternate work that would need to be done at the level of all of the choreographed components of a supply chain if that RFC were to be rejected. E.g. at the moment I would predict that we're going to move forward with some version of that RFC.

An important note about tracing is that it is artifact centric rather than resource centric. E.g. wherein this proposal reports 1 artifact per resource, there is good reason for the tracing RFC to report more than 1 artifact for each resource.

If we adopt this RFC and later adopt the tracing RFC we'll need to decide:

  • if we want to have two longish fields on the workload that have much of the same information, or
  • if we want to make a breaking change and alter what's proposed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a link to the RFC and ideally the parts of the RFC which you feel are relevant

@waciumawanjohi
Copy link
Contributor

Random request:
Can the title of this RFC change? We've been using the term tracing to discuss following artifact A as it propagates through a supply chain and creates downstream changes. This RFC does not attempt to achieve that.

Possible names:

  • Resource reporting
  • Exposing stamped resources to workload owners
  • ...

Signed-off-by: Scott Andrews <andrewssc@vmware.com>
@scothis
Copy link
Contributor Author

scothis commented Feb 22, 2022

Can the title of this RFC change?

sure

We've been using the term tracing to discuss following artifact A as it propagates through a supply chain and creates downstream changes. This RFC does not attempt to achieve that.

I consider both of these to be tracing, but there's a difference in what is being traced. In this case what we're tracing is the supply chain defined steps for a workload. I agree that it's not artifact level tracing.

Signed-off-by: Scott Andrews <andrewssc@vmware.com>

Co-authored-by: Marty Spiewak <spiewakmarty@gmail.com>
- name: source-provider
outputs:
- name: image
value: registry.example/supply-chain/my-workload@sha256:68f8e8fc6e8ede7a411db9182cd695eac7b3e7e19e4ff9dcb9ba21205c135697
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On RFC 18 @evankanderson raised a valid concern about the potential size of outputs causing issues.

#519 (comment)

There is also a 1MB limit on the size of a single resource, and the values might be repeated twice due to the addition of managedFields. I'd be careful about including large other documents directly for that reason. It seems like it would be better to include a resourceVersion or generation field referencing the resource's version, rather than the entire contents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could switch including the value to use a hash of the value. Clients would be required to fetch the stamped resource in order to resolve the actual value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to a digest and path instead of a value for outputs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely understand the argument about the config being too big, but I think capturing the value here is important. A lot of our modelling assumes that the workload is the only object that a developer needs to care about, now we're saying that they need to go and make these other associations to other objects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we don't have value anymore then we'd have to remove lastTransitionTime as we won't be storing the old value so we won't know if it's changed. It'd also be misleading as to what lastTransitionTime- it could be taken to mean the last time the output path changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cirocosta how do we know what value needs to be masked?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can update the output digest/value/preview to be whatever people collectively want.

There are two different dimensions of risk/customer impact to consider with the decision.

  1. [confidentiality] "leaking" outputs that an org views as sensitive
  2. [availability] a supply chain stops working in production, with an unclear path to recovery
field confidentiality risk availability risk
value medium high
preview medium low
digest low low

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I get some thumbs up/thumbs down on moving forward with the following:

Add a preview field with which holds the first 200 bytes of each value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scothis After today's chat, I'm pretty comfortable with adding the preview field with <1000 bytes (200 seems fine to me). If you aren't opposed, would you mind adding it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

For `.status.resources[*].outputs[*]`, replace `value` with `digest` and
`path` fields. This addresses two concerns:
1. the values may be too large, and exceed the size limit for a k8s
   resource
2. the values may be sensitive

Now a consumer will need to resolve the raw value from the stamped
resource using the output path specified.

Signed-off-by: Scott Andrews <andrewssc@vmware.com>
- `.status.resources[*].inputs[*]` inputs model the relationships between resources within the SupplyChain graph. The value of an input can be read from the stamped resource based on the outputs for the referenced resource.
- `.status.resources[*].inputs[*].name` the name of the resource backing this input.
- `.status.resources[*].outputs[*].name` the name of the output. Output names are fixed and defined by the template type.
- `.status.resources[*].outputs[*].digest` sha256sum of the raw output value.
Copy link
Contributor

@cirocosta cirocosta Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we need to be more specific here given how "sensitive" hashing is to any bit change, e.g., a javascript based client might end up interpreting the raw value differently from the controller due to a different json decoder? wondering that because we're effectively hashing the result of the jsonpath query (rather than straight byte stream)

Copy link
Contributor

@cirocosta cirocosta Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you mentioned that we should try as much as possible to not expose potentially sensitive information in workload.status, but storage-wise, made me wonder if it'd be too much of a crazy/bad idea considering a custom apiserver implementation to work around the etcd storage limitation

IIRC, because you can back the endpoints you register against with ... anything, we could back it up with """something else""", although it's clearly a large complexity add

somewhat related: tektoncd/community#606

Copy link
Contributor Author

@scothis scothis Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we need to be more specific here given how "sensitive" hashing is to any bit change, e.g., a javascript based client might end up interpreting the raw value differently from the controller due to a different json decoder?

That's a good point, the digest will only be stable against a string or byte array value, structs are ambiguous. The intent isn't to be cryptographically secure, but to provide a check of the value so a client can be reasonably sure that it has the "correct" value and that the value hasn't changed between when the c

We can drop the digest field for now if you're not confident in it, and add it back once we have the semantics nailed.

made me wonder if it'd be too much of a crazy/bad idea considering a custom apiserver implementation to work around the etcd storage limitation

yes, crazy and bad :)

But on a serious note, there are other concerns that would be nice to have that will never end up on the Workload status. For example, a history of artifacts and how they interact in the supply chain. This kind of data should be captured into a different system that can support temporal (and other types of) queries. I wouldn't use that front an aggregated api, but to be used as a separate store.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how sensible this is, but it is possible to serialize with sorted keys in the go yaml package (unsure about the k8s one). That would at least produce repeatable digests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also say that the digest is primarily for Cartographer's internal use (to generate the lastTransitionTime corectly).

@scothis scothis changed the title RFC Supply Chain Tracing RFC Supply Chain Resource Tracing Feb 24, 2022
Signed-off-by: Scott Andrews <andrewssc@vmware.com>
@scothis scothis changed the title RFC Supply Chain Resource Tracing RFC Resource Tracing Feb 24, 2022
@scothis
Copy link
Contributor Author

scothis commented Feb 24, 2022

renamed to "Resource Tracing"

Copy link
Member

@squeedee squeedee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe folks still expect something like the digest as a part of this but everything else is fine as is.

Signed-off-by: Scott Andrews <andrewssc@vmware.com>
@scothis scothis marked this pull request as ready for review March 1, 2022 15:44
@martyspiewak martyspiewak mentioned this pull request Mar 1, 2022
4 tasks
@martyspiewak martyspiewak enabled auto-merge (squash) March 1, 2022 17:23
@martyspiewak martyspiewak merged commit 4cf4554 into vmware-tanzu:main Mar 1, 2022
@scothis scothis deleted the rfc-supply-chain-tracing branch March 21, 2022 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rfc Requests For Comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants