-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RUMM-1744 Collect Kronos telemetry in Internal Monitoring #709
RUMM-1744 Collect Kronos telemetry in Internal Monitoring #709
Conversation
@@ -7,22 +7,45 @@ | |||
@testable import Datadog | |||
|
|||
class KronosE2ETests: E2ETests { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is to also collect this telemetry from our E2E tests and run full KronosMonitor
logic in CI environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm but i left a comment regarding usage of DispatchGroup
I know this is not the scope of this PR, but could there also be a way to disable clock sync in the API/build settings? This might help mitigate this issue in the mean time for our users. The alternative for me would be to fork the repo and do this on my own, but I'd much prefer to stay on the official Cocoapod dependency. |
Hello @sergiocampama 🙂👋. Unfortunately, while such API / build setting might work for you it won't work for all our users. If an app uses Datadog Distributed Tracing or tracks 1st party RUM resources, disabling NTP will lead to serious and immediate misalignments in client to server spans causing serious issues in data consistency. We totally understand that some apps might be not collecting distributed telemetry but the set of APIs and configuration options we provide must be equally functional for everyone. This is why we're not considering it as viable solution to mitigate the problem. Instead, the last sequence of actions we took in
I wish I could have better answer for you and I clearly understand why this is important for your users. Unfortunately, a tailor-made fork is currently the only option to disable NTP in |
What and why?
📦🔬 This PR adds
KronosMonitor
for collecting internal telemetry on Kronos execution (NTP sync). This is to gather more data and have more observability on #647.Note: this additional telemetry will only be collected in apps that enable our Internal Monitoring feature, which means our own dogfood 🐶 projects. It won't be collected from customer apps.
How?
KronosMonitor
traces 3 phases of Kronos execution:Additionally, it uses
NWConnection
to checkUDP
connection reachability for each resolved IP on port123
. It then aggregates all results (3 Kronos phases + connection checks to all IPs) and reports single telemetry event (Log
), e.g.:Review checklist