Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUM-5447 Fix the racing condition in the RotatingDnsResolver logic #2127

Merged

Conversation

mariusc83
Copy link
Member

What does this PR do?

In this PR we are fixing the race condition problem that was caused by a concurrent access to an unprotected list in the RotatingDnsResolver class. This was raised by the Support team in this ticket: https://datadog.zendesk.com/agent/tickets/1761693

Motivation

What inspired you to submit this pull request?

Additional Notes

Anything else we should know when reviewing?

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Make sure you discussed the feature or bugfix with the maintaining team in an Issue
  • Make sure each commit and the PR mention the Issue number (cf the CONTRIBUTING doc)

@mariusc83 mariusc83 self-assigned this Jul 18, 2024
@mariusc83 mariusc83 force-pushed the mconstantin/rum-5447/fix-dnsresolver-concurrency-issue branch from 818f959 to 289993a Compare July 18, 2024 12:50
@codecov-commenter
Copy link

codecov-commenter commented Jul 18, 2024

Codecov Report

Attention: Patch coverage is 87.50000% with 1 line in your changes missing coverage. Please review.

Project coverage is 70.02%. Comparing base (fc53808) to head (3967404).
Report is 3 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2127      +/-   ##
===========================================
- Coverage    70.03%   70.02%   -0.01%     
===========================================
  Files          721      721              
  Lines        26773    26785      +12     
  Branches      4499     4501       +2     
===========================================
+ Hits         18749    18754       +5     
- Misses        6771     6779       +8     
+ Partials      1253     1252       -1     
Files Coverage Δ
...d/core/internal/data/upload/RotatingDnsResolver.kt 95.83% <87.50%> (+0.60%) ⬆️

... and 28 files with indirect coverage changes

@mariusc83 mariusc83 marked this pull request as ready for review July 18, 2024 13:40
@mariusc83 mariusc83 requested review from a team as code owners July 18, 2024 13:40
@@ -111,6 +114,30 @@ internal class RotatingDnsResolverTest {
assertThat(result2).containsExactlyElementsOf(fakeInetAddresses)
}

@RepeatedTest(30)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to repeat it 30 times? won't it be too long? we are already spinning 100 threads

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but it's randomly re-produceable. I am afraid that if we lower it it might be a false positive

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it runs no more than ~5s for these 30 repetitions, than it is fine

Comment on lines 130 to 136
Thread {
assertThatNoException().isThrownBy {
Thread.sleep(forge.aLong(min = 0, max = 100))
testedDns.lookup(fakeHostname)
}
countDownLatch.countDown()
}.start()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I'm not wrong, this will never throw in the end, the thread spinned will be killed, but error won't be propagated further and won't stop the test.

Moreover, since countDownLatch.countDown() won't happen in case of the error, countDownLatch.await() will be waiting forever.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm not sure based on my tests, theoretically assertThatNoException should stop the test with a fail. In that case the countDownLatch will not wait anymore. I will double check this though.

@mariusc83 mariusc83 force-pushed the mconstantin/rum-5447/fix-dnsresolver-concurrency-issue branch from 289993a to 3967404 Compare July 18, 2024 14:58
@mariusc83 mariusc83 requested a review from 0xnm July 18, 2024 14:59
@@ -111,6 +114,30 @@ internal class RotatingDnsResolverTest {
assertThat(result2).containsExactlyElementsOf(fakeInetAddresses)
}

@RepeatedTest(30)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it runs no more than ~5s for these 30 repetitions, than it is fine

@datadog-datadog-staging-us1-all

Code Vulnerabilities

Error badge  Notice badge

Code Quality Violations

Error badge  Warning badge  Notice badge

}
}

private fun safeCopy(list: List<InetAddress>): List<InetAddress> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

Is this a function that might be useful in other places in the project ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked myself the same question but for now I could not see any other place. My rule of thumb is that an extension function should have at least 2 different classes where is being used in order to be extracted so that's the reason why I decided to keep it private, non extension here.

@mariusc83 mariusc83 merged commit 2085c28 into develop Jul 19, 2024
19 checks passed
@mariusc83 mariusc83 deleted the mconstantin/rum-5447/fix-dnsresolver-concurrency-issue branch July 19, 2024 08:29
@xgouchet xgouchet added this to the 2.12.x milestone Jul 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants