-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RUMM-1890 Fix tests flakiness #711
RUMM-1890 Fix tests flakiness #711
Conversation
notably: `testWhenDataIsBeingUploaded_itPrintsUploadProgressInformationAndSendsErrorsThroughInternalMonitoring` The flakiness was caused by `userLogger` reference leaked in some other tests which use `DateCorrector` (e.g. all tests in `DatadogTests`). NTP sync completion block was executed no matter of `self` existence, making it send logs to current (global) `userLogger` arbitrarily. This was causing some other tests asserting `userLogger` output to receive false data, coming not from their execution.
by running each measure to fixed number of samples, instead of using time-based condition.
precisely in `testWhenCurrentValueIsObtainedFromNetworkConnectionInfoProvider_thenCrashContextProviderNotifiesNewContext`.
by ensuring autorelease VC deallocation with `autoreleasepool {}`
…ntsAreSent` by increasing number of samples to 400 (with 200 it was failing 4/500 repetitions, with 300 1/500, with 400 it's 100% success).
completion: { offset in | ||
completion: { [weak self] offset in | ||
guard let _ = self else { | ||
return | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is to address flaky execution of
dd-sdk-ios/Tests/DatadogTests/Datadog/Core/Upload/DataUploadWorkerTests.swift
Lines 230 to 235 in 181359b
func testWhenDataIsBeingUploaded_itPrintsUploadProgressInformationAndSendsErrorsThroughInternalMonitoring() { | |
let previousUserLogger = userLogger | |
defer { userLogger = previousUserLogger } | |
let mockUserLoggerOutput = LogOutputMock() | |
userLogger = .mockWith(logOutput: mockUserLoggerOutput) |
Problem was that DateCorrector
is expected to perform real NTP sync in many high-level tests (e.g. most tests in DatadogTests
). This closure might leak from test execution and send NTP sync log to global userLogger
, causing bad value being recorded in other tests that depend on global userLogger
mock, e.g.:
The fix is to not leak this closure - and return it early when SDK was deallocated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this work?
completion: { [weak userLogger] offset in
Not a change request, I'm not sure about lifecycle of the logger vs. DateCorrector 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DateCorrector
is instantiated only once for a given SDK instance (from Datadog.initialize()
).
I think more correct solution is to avoid doing any work in this DateCorrector
closure if self == nil
. From self == nil
we know that DateCorrector
was deallocated → so the parent Datadog
object was deallocated → so no-one is interested in result of this work.
Capturing weak reference to userLogger
might work too but it will be ignoring only a portion of the completion work related to userLogger
. As no-one will use the result of completion
work, I find this odd.
How does it sound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the root cause is that Kronos.Clock
is static as well as userLogger
. I think it's best to fix it in ServerDateProvider
and call the completion only if self
exist, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's best to fix it in
ServerDateProvider
and call the completion only ifself
exist, WDYT?
Indeed! It depends on self
already and clearly it runs its completion inconsitently when self
is nil
. Will do 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ✅
XCTAssertEqual(initialContext.lastNetworkConnectionInfo, initialNetworkConnectionInfo) | ||
XCTAssertEqual(updatedContext?.lastNetworkConnectionInfo, currentNetworkConnectionInfo) | ||
let updatedNetworkConnectionInfo = try XCTUnwrap(updatedContext?.lastNetworkConnectionInfo) | ||
XCTAssertEqual(initialContext.lastNetworkConnectionInfo, initialNetworkConnectionInfo, "It must store initial network info") | ||
XCTAssertEqual(updatedNetworkConnectionInfo, currentNetworkConnectionInfo, "It must store updated network info") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It failed 2 times over 1 month, but no clue on what's happening wrong in this test. I'm just adding more verbosity to these assertions, so we can better understand it.
let simulatedSessionsCount = 200 | ||
let simulatedSessionsCount = 400 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using tests repetition in local:
- with 200 samples - 4 failures in 500 runs
- with 300 samples - 1 failures in 500 runs
- with 400 samples - 0 failures in 500 runs
With 400 samples it runs in ~180ms
.
let identity = try XCTUnwrap(vc?.asRUMViewIdentity()) | ||
try autoreleasepool { | ||
var vc: UIViewController? = UIViewController() | ||
identity = try XCTUnwrap(vc?.asRUMViewIdentity()) | ||
XCTAssertNotNil(identity.identifiable, "Reference should be available while `vc` is alive.") | ||
vc = nil | ||
} | ||
|
||
XCTAssertNotNil(identity.identifiable, "Reference should be available while `vc` is alive.") | ||
vc = nil | ||
XCTAssertNil(identity.identifiable, "Reference should not be available after `vc` was deallocated.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 failures in last 1 month. It was failing on:
XCTAssertNotNil(identity.identifiable, "Reference should be available while `vc` is alive.")
I assume that autoreleasepool {}
should help as VC is autorelease Objective-C object and now we clean it up in different scope than the assertion.
func testHighAndLowRefreshRates() { | ||
func testWhenMainThreadOverheadGoesUp_itMeasuresLowerRefreshRate() throws { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 failures in last 1 month. This is quite old issue, I changed the approach for this test:
- Instead of observing main thread for certain amount of time...
- ... now it observes it until
VitalRefreshRateReader
records certain number of samples.
It passes 500x repetition in local. From my observation, time-based constraint was very flaky (sometimes recording just 1 sample, sometimes many). Current approach of recording 30 samples seems stable and still executes under 2s
in local (like before).
if #available(iOS 13.0, *) { | ||
encodedValue1 = try prettyEncoder.encode(value1) | ||
encodedValue2 = try prettyEncoder.encode(value2) | ||
} else { | ||
encodedValue1 = try prettyEncoder.encode(EncodingContainer(value1)) | ||
encodedValue2 = try prettyEncoder.encode(EncodingContainer(value2)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes the biggest problem in last 1 month - 14 failures in:
dd-sdk-ios/Tests/DatadogTests/Datadog/CrashReporting/CrashContext/CrashContextTests.swift
Lines 49 to 65 in e3468ed
func testGivenContextWithLastRUMSessionStateSet_whenItGetsEncoded_thenTheValueIsPreservedAfterDecoding() throws { | |
let randomRUMSessionState: RUMSessionState? = Bool.random() ? .mockRandom() : nil | |
// Given | |
var context: CrashContext = .mockRandom() | |
context.lastRUMSessionState = randomRUMSessionState | |
// When | |
let serializedContext = try encoder.encode(context) | |
// Then | |
let deserializedContext = try decoder.decode(CrashContext.self, from: serializedContext) | |
try AssertEncodedRepresentationsEqual( | |
value1: deserializedContext.lastRUMSessionState, | |
value2: randomRUMSessionState | |
) | |
} |
When fuzzy input is resolved to nil
, serialization fails below iOS 13. To fix it I'm using the helper we added exactly for this case:
dd-sdk-ios/Tests/DatadogTests/Helpers/Encoding.swift
Lines 9 to 13 in e3468ed
/// Prior to `iOS13.0`, the `JSONEncoder` supports only object or array as the root type. | |
/// Hence we can't test encoding `Encodable` values directly and we need to wrap it inside this `EncodingContainer` container. | |
/// | |
/// Reference: https://bugs.swift.org/browse/SR-6163 | |
struct EncodingContainer<Value: Encodable>: Encodable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌
if #available(iOS 13.0, *) { | ||
encodedValue1 = try prettyEncoder.encode(value1) | ||
encodedValue2 = try prettyEncoder.encode(value2) | ||
} else { | ||
encodedValue1 = try prettyEncoder.encode(EncodingContainer(value1)) | ||
encodedValue2 = try prettyEncoder.encode(EncodingContainer(value2)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌
What and why?
🔬 This PR addresses tests flakiness from last month - collected from 189K test runs on the main branch:
How?
Each case explained in PR comments. There's no general conclusion.
Review checklist