-
Notifications
You must be signed in to change notification settings - Fork 381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEBUG-3533 increase UDS transport timeout to 30 seconds #4411
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the issue is in receive timeout (maybe the payload and latency). Do this setting increases SO_RCVTIMEO
or all type of timeouts?
Question: this value is already configurable from user, right? Does that really need to change the default? |
Datadog ReportBranch report: ✅ 0 Failed, 20407 Passed, 1374 Skipped, 3m 19.42s Total Time |
BenchmarksBenchmark execution time: 2025-02-19 22:10:24 Comparing candidate commit a10f126 in PR branch Found 0 performance improvements and 1 performance regressions! Performance is the same for 30 metrics, 2 unstable metrics. scenario:profiler - gvl benchmark samples
|
@Strech It is the read timeout on the socket. Set in Ruby so probably affects Ruby IO layer + everything under it. I also found out while answering this question that the timeout is defined in two places for some reason, I changed the other one also. @tonyhsu The timeout is user-configurable but the default value for UDS does not actually work for remote config (as discovered on shopist). |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #4411 +/- ##
==========================================
- Coverage 97.71% 97.70% -0.02%
==========================================
Files 1366 1366
Lines 83402 83402
Branches 4231 4231
==========================================
- Hits 81497 81488 -9
- Misses 1905 1914 +9 ☔ View full report in Codecov by Sentry. |
* master: [🤖] Lock Dependency: https://github.com/DataDog/dd-trace-rb/actions/runs/13416256659 Skip mongo-min [🤖] Lock Dependency: https://github.com/DataDog/dd-trace-rb/actions/runs/13416008881 Extract mongo group Fix mongo test when >= 2.15.0
I confirmed that the agent may issue a synchronous request to the backend (over the internet) when remote configuration is requested by the tracer. Therefore the unix domain socket timeout should be the same as the HTTP timeout, which currently is 30 seconds for Ruby. |
What does this PR do?
Increases timeout for unix domain socket transport from 1 to 30 seconds.
Motivation:
Shopist uses UDS and dd-trace-rb is able to submit traces to the agent via UDS with the current 1 second timeout but is not able to receive remote configuration with this timeout:
Increasing UDS timeout fixes the error and permits remote configuration to be retrieved.
Change log entry
Yes: increase timeout when unix domain socket is used for datadog agent
Additional Notes:
I investigated the timeout values and discovered the following: initially, there was no timeout set at all. In 2021 a timeout was added to both HTTP and UDS transports and it was set to 1 second for both. In 2024 the HTTP timeout was increased to 30 seconds with UDS timeout remaining unchanged at 1 second.
I confirmed that the agent may issue a synchronous request to the backend (over the internet) when remote configuration is requested by the tracer. Therefore the unix domain socket timeout should be the same as the HTTP timeout, which currently is 30 seconds for Ruby.
I think good timeout values for UDS are 5 or 10 seconds. We expect to not need to deal with a network however the agent does need time to process the request. I think if the HTTP timeout is left at 30 seconds, a 10 second timeout is reasonable for UDS; alternatively, a sensible combination in my mind would be 5 second timeout for UDS and 15 or 20 second timeout for HTTP.Note that Heroku for example used to have (still has?) a fixed 30 second timeout for processing the entire request, and exceeding this time generally is a Bad Thing therefore in production having a 30 second timeout for any single component operation in a web app is often not ideal. Hence my thought to reduce the HTTP timeout from 30 to 15 or 20 seconds.How to test the change?
Tested manually against shopist