Instrument Tekton resources for tracing #2814

Closed
afrittoli opened this issue Jun 12, 2020 · 24 comments
Labels
area/roadmap: Issues that are part of the project (or organization) roadmap (usually an epic)
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/feature: Categorizes issue or PR as related to a new feature.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@afrittoli
Member

Expected Behavior

I am able to analyse where the time is spent during an execution of a task or pipeline.
I can break down the execution time into time spent in reconcile logic, fetching resources, pulling images, running containers, and more.

Actual Behavior

Right now, with the OpenCensus metrics, we have data about the overall duration but no breakdown.

Additional Info

We could instrument Tekton according to the OpenTracing spec

/kind feature

@tekton-robot added the kind/feature label on Jun 12, 2020
@afrittoli added the good first issue and help wanted labels on Jun 12, 2020
@jlpettersson
Member

Yes, this is a great feature to have!

This has been discussed before in terms of observability, including in design docs where it was suggested that this problem was "tracy" for Tekton, but only the metrics part was implemented. Related issues: #540 and #164

Related article: Reducing Build Time with Observability in the Software Supply Chain

Related presentation: Observability in the SSC: Seeing Into Your Build System
(Tekton is named 38 min into that video)

OpenTracing has now become OpenTelemetry (it merged with OpenCensus).

@NavidZ
Member

NavidZ commented Jun 22, 2020

@afrittoli, I can take a look at this, starting with migrating the current metrics to the OpenTelemetry API and then adding more sub-metrics/tracing to the code.

@NavidZ
Member

NavidZ commented Jun 22, 2020

/assign

@afrittoli
Member Author

Thanks @NavidZ, looking forward to your contributions!

@hrishin implemented the existing metrics, so cc'ing him for interest.
@vdemeester do you think this would belong to a TEP?

@vdemeester
Member

> @vdemeester do you think this would belong to a TEP?

Yes 🙃

@afrittoli
Member Author

@NavidZ TEP (Tekton Enhancement Proposal) is a new process that we recently introduced. Have a look at the community repo for guidance, and feel free to ping me and @vdemeester if you have questions or need assistance with the process.

@NavidZ
Member

NavidZ commented Jun 23, 2020

@afrittoli sounds good. I will send a TEP with the suggested sub-metrics.

Just to give an update on what I have been up to so far: after looking deeper into the OpenTelemetry Go client SDK (which is still in beta), I think it might be better to delay until Knative has also migrated, since we use their libraries for metrics reporting and they have #3126 to track it. I'll create another issue of our own to follow the migration, maybe as a separate task.

@NavidZ
Member

NavidZ commented Jul 13, 2020

@vdemeester @afrittoli I sent a PR with the initial TEP. From the comments inside the TEP template I got the impression that it is better to merge things incrementally (for example, summary and motivation first) to get agreement on each part separately. But let me know if you would prefer that I also add the design, sub-metrics, and so on to that first PR.

@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot added the lifecycle/rotten label on Aug 15, 2020
@tekton-robot
Collaborator

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@NavidZ
Member

NavidZ commented Aug 15, 2020

/reopen

@tekton-robot
Collaborator

@NavidZ: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

@vdemeester
Member

/remove-lifecycle rotten
/remove-lifecycle stale
/reopen

@tekton-robot reopened this on Aug 17, 2020
@tekton-robot
Collaborator

@vdemeester: Reopened this issue.

In response to this:

/remove-lifecycle rotten
/remove-lifecycle stale
/reopen

@tekton-robot removed the lifecycle/rotten label on Aug 17, 2020
@bobcatfish removed the good first issue label on Sep 1, 2020
@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot added the lifecycle/stale label on Nov 30, 2020
@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Dec 30, 2020
@vdemeester
Member

Putting this into the "frozen" box, as this is something we need to do at some point.

/lifecycle frozen
/remove-lifecycle rotten

@tekton-robot added the lifecycle/frozen label and removed the lifecycle/rotten label on Jan 5, 2021
@bobcatfish added the area/roadmap label on Feb 24, 2021
@afrittoli
Member Author

@mattmoor I see that Knative has support for tracing (through OpenCensus for now), but it doesn't look as if any of the tracing capabilities are exported in the knative/pkg generated sharedmain. Is there any plan for that?

@mattmoor
Member

@afrittoli that's mostly for dataplane components, where sharedmain is mostly for controlplane components.

I don't believe Knative does anything to try and build up a trace/spans with these kinds of constituent parts, and some of them may be hard to stitch together (and thread the trace-id through) since what you describe is not an HTTP request flow.

You'd probably want the entrypoint to expose this stuff, and you'd have to plumb through a trace-id for it to use. Knative likely has stuff for passing a tracing config through to a dataplane component, since those are often provisioned by the controlplane elements.

@afrittoli
Member Author

Thanks @mattmoor - that makes sense. Indeed it's not an HTTP request flow, but I think it would be valuable to have tracing information from the Tekton control-plane components. One of the major difficulties I see is creating one (and only one) span per resource, e.g. one span for a PipelineRun, and passing that context across reconcile cycles.
I guess if we can rely on having the same controller process always reconciling the same key, that would help. If I'm not mistaken, sharding of resources across horizontally scaled controllers is based on the resource key, so that helps.

> You'd probably want the entrypoint to expose this stuff, and you'd have to plumb through a trace-id for it to use.

Yeah, it would be nice to have the entrypoint exposing the trace-id in case steps would like to pick it up and continue a trace within the step.
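
As a rough sketch of what "exposing the trace-id to a step" could look like, the snippet below formats and parses a W3C Trace Context `traceparent` value (`version-traceid-parentid-flags`) and hands it over through an environment variable. The variable name `TRACEPARENT` and the function names are assumptions for illustration, not something Tekton's entrypoint is known to provide, and the validation is deliberately simplified.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"os"
	"strings"
)

// traceparentEnv is an assumed environment variable name, used here
// only to illustrate handing the trace context to a step.
const traceparentEnv = "TRACEPARENT"

// FormatTraceparent builds a version-00 traceparent value from a
// 16-byte trace ID, an 8-byte parent span ID and the sampled flag.
func FormatTraceparent(traceID [16]byte, spanID [8]byte, sampled bool) string {
	flags := "00"
	if sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s",
		hex.EncodeToString(traceID[:]), hex.EncodeToString(spanID[:]), flags)
}

// ParseTraceparent extracts the trace ID, parent span ID and sampled
// flag. Simplified: it checks field count, lengths and hex encoding,
// but not every rule of the specification (for example all-zero IDs).
func ParseTraceparent(v string) (string, string, bool, error) {
	parts := strings.Split(v, "-")
	if len(parts) != 4 || parts[0] != "00" ||
		len(parts[1]) != 32 || len(parts[2]) != 16 || len(parts[3]) != 2 {
		return "", "", false, errors.New("malformed traceparent")
	}
	if _, err := hex.DecodeString(parts[1] + parts[2]); err != nil {
		return "", "", false, err
	}
	flags, err := hex.DecodeString(parts[3])
	if err != nil {
		return "", "", false, err
	}
	return parts[1], parts[2], flags[0]&0x01 == 0x01, nil
}

func main() {
	// Controller/entrypoint side: publish the context for the step.
	traceID := [16]byte{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
	spanID := [8]byte{1, 2, 3, 4, 5, 6, 7, 8}
	if err := os.Setenv(traceparentEnv, FormatTraceparent(traceID, spanID, true)); err != nil {
		panic(err)
	}

	// Step side: pick the context up and continue the same trace.
	tid, sid, sampled, err := ParseTraceparent(os.Getenv(traceparentEnv))
	if err != nil {
		panic(err)
	}
	fmt.Println(tid, sid, sampled)
}
```

A step that uses a tracing SDK would normally let the SDK's propagator do the parsing; the manual version above only shows what information has to be plumbed through.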

@mattmoor
Member

> One of the major difficulties I see is creating one (and only one) span for resource

One option here is to put it into the resource status. You don't even really need a new field if you use Knative's duckv1.Status since we support status.annotations, which could hold a trace-id annotation used to stitch things together.
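
A minimal sketch of that idea, using a local `Status` type that mimics only the annotations map rather than importing Knative's `duckv1.Status`. The annotation key `tekton.dev/trace-id` and the helper `EnsureTraceID` are hypothetical names for illustration. The first reconcile generates a trace ID and stores it; later reconciles, possibly in another controller process, find it in the status and reuse it.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// traceIDAnnotation is a hypothetical annotation key, not an existing
// Tekton or Knative convention.
const traceIDAnnotation = "tekton.dev/trace-id"

// Status mimics only the annotations part of a Knative-style status.
type Status struct {
	Annotations map[string]string
}

// EnsureTraceID returns the trace ID recorded in the status, creating
// and storing a new random 16-byte ID (hex-encoded) if none exists yet.
func EnsureTraceID(s *Status) (string, error) {
	if id := s.Annotations[traceIDAnnotation]; id != "" {
		return id, nil
	}
	var raw [16]byte
	if _, err := rand.Read(raw[:]); err != nil {
		return "", err
	}
	id := hex.EncodeToString(raw[:])
	if s.Annotations == nil {
		s.Annotations = map[string]string{}
	}
	s.Annotations[traceIDAnnotation] = id
	return id, nil
}

func main() {
	status := &Status{}

	// First reconcile cycle: no trace ID yet, so one is generated.
	first, err := EnsureTraceID(status)
	if err != nil {
		panic(err)
	}

	// Later reconcile cycle: the stored ID is reused.
	second, err := EnsureTraceID(status)
	if err != nil {
		panic(err)
	}
	fmt.Println(first == second)
}
```

Because the ID lives on the resource itself, this approach would not depend on the same controller process always reconciling the same key.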

@kmjayadeep
Contributor

I have proposed a TEP for this: tektoncd/community#839

@jerop
Member

jerop commented Feb 17, 2023

This feature request was addressed in TEP-0124: Distributed tracing for Tasks and Pipelines

Projects
Status: Done
Development

No branches or pull requests

9 participants