Make Tracing Great Again #17693
Comments
cc @qw4990 I think it’s a considerable reimplementation for the entries in slow log.
It would be great if we can show the tracing result in DataDog. DataDog is popular and I heard from several users that they expect to see metrics or tracing in DataDog.
@breeswish @IANTHEREAL I moved it to the P1 priority, please confirm if that's correct.
@zz-jason yes, do it
Feature Request
Is your feature request related to a problem? Please describe:
TiDB already supports the TRACE statement, but it is rarely useful for several reasons:
Tracing is in fact a very useful pillar of observability (logs, metrics, traces). We have done pretty well on logs and metrics, but pretty badly on tracing.
For now it is a bit hard to discover what causes a SQL statement to run slowly in some scenarios. Existing facilities are:
The new tracing implementation is supposed to provide diagnostic capability for such scenarios.
Describe the feature you'd like:
Improve the tracing feature.
Stage 1.
The result of tracing can be further displayed in TiDB Dashboard in a nice way, for example, like what DataDog did:
Stage 2.
Users usually think in terms of executors rather than spans. Tracing should integrate with execution plans, so that users can easily map each span to its executor, or otherwise find out why a specific executor is slow.
However, an overall tracing view is still necessary, since some spans do not belong to any executor.
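As a rough illustration of the mapping described above, here is a minimal Python sketch. It assumes each span carries an optional executor tag; the `Span` type, the `executor_id` field, and the sample span names are all hypothetical and not part of TiDB or minitrace. Spans without an executor are kept under `None`, which is why an overall view is still needed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    name: str
    duration_us: int
    # Hypothetical tag linking a span to an executor in the execution plan.
    # None means the span does not belong to any executor.
    executor_id: Optional[str] = None


def group_spans_by_executor(spans):
    """Group spans by executor; executor-less spans land under the None key."""
    grouped = defaultdict(list)
    for span in spans:
        grouped[span.executor_id].append(span)
    return dict(grouped)


# Made-up sample data, for illustration only.
spans = [
    Span("parse", 40),
    Span("compile", 120),
    Span("TableReader_5.next", 900, "TableReader_5"),
    Span("coprocessor.request", 850, "TableReader_5"),
    Span("HashAgg_7.next", 300, "HashAgg_7"),
]
grouped = group_spans_by_executor(spans)
```

With this shape, a plan view could show `grouped["TableReader_5"]` next to that executor, while `grouped[None]` would only be visible in the overall trace.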
Describe alternatives you've considered:
Teachability, Documentation, Adoption, Migration Strategy:
The tracing facility implementation in TiKV is already finished: https://github.com/pingcap-incubator/minitrace. It is super-efficient and can trace a span within 20ns. Even for the shortest requests like point get, tracing 100 spans only introduces a 6% performance loss (note that in real life we are likely to trace < 10 spans for point get, so the performance loss is negligible).
Tracing spans can be the data source for the corresponding fields of the slow log and metrics, to avoid measuring the same duration repeatedly (and ending up with different durations).
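The idea of a single data source can be sketched as follows, in Python for brevity. The span names, the `_time_us` field suffix, and the helper function are assumptions made for illustration; they do not reflect the actual slow log or metrics schema. The point is only that each duration is measured once, in a span, and every consumer reads the same number.

```python
def durations_from_spans(spans):
    """Sum recorded span durations per span name (microseconds)."""
    totals = {}
    for name, duration_us in spans:
        totals[name] = totals.get(name, 0) + duration_us
    return totals


# Made-up (name, duration_us) pairs standing in for collected spans.
spans = [
    ("parse", 40),
    ("compile", 120),
    ("cop_request", 850),
    ("cop_request", 430),
]

totals = durations_from_spans(spans)

# Both consumers derive their values from the same totals, so the slow log
# and the metrics cannot disagree about how long a phase took.
slow_log_fields = {f"{name}_time_us": total for name, total in totals.items()}
metrics_observations = dict(totals)
```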