Instrument Tekton resources for tracing #2814
Comments
Yes, this is a great feature to have! This has been discussed before in terms of observability, including in design docs where it was suggested that this problem was "tracy" for Tekton, but only the metrics part was implemented. Related issues: #540 and #164. Related article: Reducing Build Time with Observability in the Software Supply Chain. Related presentation: Observability in the SSC: Seeing Into Your Build System. OpenTracing has now become OpenTelemetry. |
@afrittoli, I can take a look at this and start with migrating the current metrics to the OpenTelemetry API and then add more sub-metrics/tracing to the code. |
/assign |
Thanks @NavidZ, looking forward to your contributions! @hrishin implemented the existing metrics, so cc'ing him for interest. |
Yes 🙃 |
@NavidZ TEP is a new process that we recently introduced - have a look at the community repo for guidance and feel free to ping me and @vdemeester if you have queries / need assistance with the process. |
@afrittoli sounds good. I will send a TEP with the suggested sub-metrics. To give an update on what I have been up to so far: after looking deeper into the OpenTelemetry Go client SDK (which is still in beta), I thought it might be better to delay until Knative has also migrated, since we use their libraries for metrics reporting and they have #3126 to track it. I'll create an issue of our own to follow the migration, maybe as a separate task. |
@vdemeester @afrittoli I sent a PR with the initial TEP. From the comments inside the TEP template I got the impression that it is better to merge things gradually (for example, summary and motivation first) to get agreement on each part separately. But let me know if you would prefer that I also add the design, sub-metrics, and so on to that first PR. |
Stale issues rot after 30d of inactivity. /lifecycle rotten Send feedback to tektoncd/plumbing. |
Rotten issues close after 30d of inactivity. /close Send feedback to tektoncd/plumbing. |
@tekton-robot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/reopen |
@NavidZ: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
/remove-lifecycle rotten |
@vdemeester: Reopened this issue. In response to this:
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing. |
Stale issues rot after 30d of inactivity. /lifecycle rotten Send feedback to tektoncd/plumbing. |
Putting this into "frozen" box as this is something we need to do at some point /lifecycle frozen |
@mattmoor I see that Knative has support for tracing (through OpenCensus for now), but it doesn't look as if any of the tracing capabilities are exported in the |
@afrittoli that's mostly for dataplane components, whereas sharedmain is mostly for controlplane components. I don't believe Knative does anything to try and build up a trace/spans with these kinds of constituent parts, and some of them may be hard to stitch together (and thread the trace-id through) since what you describe is not an HTTP request flow. You'd probably want the entrypoint to expose this stuff, and you'd have to plumb through a trace-id for it to use. Knative likely has stuff for passing a tracing config through to a dataplane component, since those are often provisioned by the controlplane elements. |
Thanks @mattmoor - that makes sense. Indeed it's not an HTTP request flow, but I think it would be valuable to have tracing information from the Tekton control-plane components. One of the major difficulties I see is creating one (and only one) span per resource, e.g. one span for a
Yeah, it would be nice to have the entrypoint exposing the trace-id in case steps would like to pick it up and continue a trace within the step. |
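As a rough illustration of what "plumbing a trace-id through" to a step could look like, here is a minimal, stdlib-only sketch that renders a span context in the W3C Trace Context `traceparent` form (`version-traceid-parentid-flags`) and parses it back. The `SpanContext` type, the function names, and the use of a `TRACEPARENT` environment variable are assumptions made for this example; none of them are existing Tekton or Knative APIs, and the parser does only minimal validation.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"os"
	"strings"
)

// SpanContext holds the identifiers a step needs to continue a trace.
// Hypothetical type for this sketch, not a Tekton API.
type SpanContext struct {
	TraceID string // 32 lowercase hex characters
	SpanID  string // 16 lowercase hex characters
	Sampled bool
}

// NewSpanContext generates random trace and span identifiers.
func NewSpanContext() (SpanContext, error) {
	buf := make([]byte, 24)
	if _, err := rand.Read(buf); err != nil {
		return SpanContext{}, err
	}
	return SpanContext{
		TraceID: hex.EncodeToString(buf[:16]),
		SpanID:  hex.EncodeToString(buf[16:]),
		Sampled: true,
	}, nil
}

// Traceparent renders the context as version-traceid-parentid-flags.
func (sc SpanContext) Traceparent() string {
	flags := "00"
	if sc.Sampled {
		flags = "01"
	}
	return "00-" + sc.TraceID + "-" + sc.SpanID + "-" + flags
}

// ParseTraceparent is the inverse of Traceparent. It checks only the
// field count, field lengths, and that each field is valid hex.
func ParseTraceparent(s string) (SpanContext, error) {
	malformed := errors.New("malformed traceparent")
	parts := strings.Split(s, "-")
	if len(parts) != 4 || parts[0] != "00" ||
		len(parts[1]) != 32 || len(parts[2]) != 16 || len(parts[3]) != 2 {
		return SpanContext{}, malformed
	}
	for _, p := range parts[1:] {
		if _, err := hex.DecodeString(p); err != nil {
			return SpanContext{}, malformed
		}
	}
	flags, _ := hex.DecodeString(parts[3]) // validated as hex in the loop above
	return SpanContext{
		TraceID: parts[1],
		SpanID:  parts[2],
		Sampled: flags[0]&1 == 1,
	}, nil
}

func main() {
	sc, err := NewSpanContext()
	if err != nil {
		panic(err)
	}
	// Controller/entrypoint side: expose the context to the step.
	// The variable name TRACEPARENT is an assumption of this sketch.
	if err := os.Setenv("TRACEPARENT", sc.Traceparent()); err != nil {
		panic(err)
	}
	// Step side: pick the context up and continue the trace.
	got, err := ParseTraceparent(os.Getenv("TRACEPARENT"))
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip ok:", got == sc)
}
```

A step that finds the variable could create its own spans as children of `SpanID` within `TraceID`; a step that ignores it is unaffected.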
One option here is to put it into the resource status. You don't even really need a new field if you use Knative's |
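One way to read this suggestion, sketched under assumptions: if the span context is recorded on the resource itself, every reconcile pass over the same resource can reuse it, which gives the "one (and only one) span per resource" behaviour discussed above. In the sketch below, `Resource`, its `Status` map, the key `example.dev/traceparent`, and `EnsureSpanContext` are all hypothetical stand-ins; the actual field that would hold the value is not specified here.

```go
package main

import "fmt"

// spanContextKey is a hypothetical key, not an existing Tekton or Knative name.
const spanContextKey = "example.dev/traceparent"

// Resource stands in for a TaskRun or PipelineRun. Status is a plain
// string map standing in for whichever status field would hold the value.
type Resource struct {
	Name   string
	Status map[string]string
}

// EnsureSpanContext returns the traceparent recorded on the resource.
// It calls newTraceparent and stores the result only when nothing is
// recorded yet; created reports whether that happened.
func EnsureSpanContext(r *Resource, newTraceparent func() string) (tp string, created bool) {
	if existing, ok := r.Status[spanContextKey]; ok && existing != "" {
		return existing, false
	}
	if r.Status == nil {
		r.Status = map[string]string{}
	}
	tp = newTraceparent()
	r.Status[spanContextKey] = tp
	return tp, true
}

func main() {
	calls := 0
	// Illustrative generator; a real one would produce random identifiers.
	gen := func() string {
		calls++
		return fmt.Sprintf("00-%032x-%016x-01", 1, calls)
	}

	run := &Resource{Name: "build-run"}
	for pass := 1; pass <= 3; pass++ { // three reconcile passes over the same resource
		tp, created := EnsureSpanContext(run, gen)
		fmt.Printf("pass %d: created=%v traceparent=%s\n", pass, created, tp)
	}
	fmt.Println("generator calls:", calls)
}
```

In a real controller the write would have to go through a status update, and two reconcilers racing on the first pass would need the usual conflict handling; the sketch ignores both.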
I have proposed a TEP for this: tektoncd/community#839 |
This feature request was addressed in TEP-0124: Distributed tracing for Tasks and Pipelines |
Expected Behavior
I am able to analyse where the time is spent during an execution of a task or pipeline.
I can break down the execution time into time spent reconciling logic, fetching resources, pulling images, running containers, and more.
Actual Behavior
Right now, with OpenCensus metrics, we have data about the overall duration but no breakdown view.
Additional Info
We could instrument Tekton according to the OpenTracing spec.
/kind feature