Make UdpSender lazy to be able to recover from early DNS issues #726

pschichtel · 2020-07-02T21:21:47Z

this should close #369

Which problem is this PR solving?

Issue getTracer() fails when the UDP socket hostname isn't resolvable #369

Short description of the changes

Made the UdpSender create the ThriftUdpTransport instance lazily, so that early DNS errors no longer cause initialization errors. This requires locking, which is implemented with the double-checked locking pattern to reduce lock overhead for the (usual) majority of send calls.
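
For readers unfamiliar with the pattern, here is a minimal sketch of lazy initialization with double-checked locking. It is not the code from this PR: `LazyTransport` and its `factory` are hypothetical names, and the factory merely stands in for whatever creates the ThriftUdpTransport.

```java
import java.util.function.Supplier;

/** Hypothetical holder, for illustration only; not a class from the Jaeger client. */
final class LazyTransport<T> {
    private final Supplier<T> factory;
    private final Object lock = new Object();
    // volatile is needed so that double-checked locking is safe under the Java memory model.
    private volatile T instance;

    LazyTransport(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;            // first check, without taking the lock
        if (local == null) {
            synchronized (lock) {
                local = instance;      // second check, now under the lock
                if (local == null) {
                    // If the factory throws (for example on a DNS failure),
                    // instance stays null and the next call tries again.
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

Once the instance exists, callers only pay for a volatile read. A failed creation is not cached, which is what allows recovery from an early DNS error.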

pschichtel · 2020-07-02T22:05:27Z

Not entirely sure what would be the right action for that failing test. It seems to have relied on the UdpSender failing during initialization. If we want to keep it that way, we'd have to eagerly trigger the instantiation of the thrift transport. I can think of several ways to do that, but I'd prefer not to unless necessary.

pschichtel · 2020-07-28T14:36:27Z

@yurishkuro @pavolloffay it seems like the project is a little inactive right now, or is this simply the wrong place for such a change request?

objectiser · 2020-07-28T16:28:05Z

@pschichtel Sorry for the delay in looking at this.

> Not entirely sure what would be the right action for that failing test. It seems to have relied on the UdpSender failing during initialization. If we want to keep it that way, we'd have to eagerly trigger the instantiation of the thrift transport. I can think of several ways to do that, but I'd prefer not to unless necessary.

I think it would be fine to change the test to check that the return value is a UdpSender.

yurishkuro · 2020-07-28T18:33:17Z

I wonder if this is going to fix the more fundamental issue where the agent address may simply change, e.g. due to agent restart.

There was a recent change in the Go client that implemented periodic reconnects: jaegertracing/jaeger-client-go#520

codecov · 2020-07-28T18:52:51Z

Codecov Report

Merging #726 into master will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master     #726   +/-   ##
=========================================
  Coverage     88.76%   88.76%           
  Complexity      596      596           
=========================================
  Files            73       73           
  Lines          2242     2242           
  Branches        289      289           
=========================================
  Hits           1990     1990           
  Misses          159      159           
  Partials         93       93

Last update 9b0ed16...d42e192.

pschichtel · 2020-07-28T18:54:19Z

Updated the commit to fix the test.

@yurishkuro In its current state, this PR will not handle an address change. Once the client has been created successfully, that instance stays the same. However, adding some kind of periodic rechecking and/or error handling in UdpSender#send should not be hard as a follow-up.
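
To illustrate what such follow-up error handling could look like (this PR does not do it), here is a rough sketch that drops the cached transport when a send fails, so that the next call creates a fresh one. `ResettableSender`, `Transport` and `factory` are all hypothetical names; a real implementation would re-resolve the agent address inside the factory.

```java
import java.util.function.Supplier;

/** Hypothetical sketch; these names are illustrative and not part of the Jaeger client. */
final class ResettableSender {
    interface Transport {
        void write(byte[] payload) throws Exception;
    }

    private final Supplier<Transport> factory;
    private Transport transport; // guarded by "this"

    ResettableSender(Supplier<Transport> factory) {
        this.factory = factory;
    }

    synchronized void send(byte[] payload) throws Exception {
        if (transport == null) {
            // Assumption: creating the transport is where the hostname gets resolved.
            transport = factory.get();
        }
        try {
            transport.write(payload);
        } catch (Exception e) {
            // Forget the broken instance so the next send starts from scratch.
            transport = null;
            throw e;
        }
    }
}
```

This trades the double-checked locking fast path for a plain synchronized method to keep the sketch short. Whether UDP send errors are a reliable signal of an address change is a separate question that the sketch does not answer.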

yurishkuro · 2020-07-28T18:58:16Z

@jpkrohling do you still think we should introduce a configuration option?

pschichtel · 2020-07-28T19:08:58Z

The change in that unit test makes me wonder... Would this change break some kind of fallback logic somewhere? It seems that was being tested there.

jpkrohling

What happens when the agent isn't available and the client keeps trying to send data? Will there be multiple stacktraces in the logs, one per batch? If so, then we definitely need to make this opt-in.

jaeger-thrift/src/main/java/io/jaegertracing/thrift/internal/senders/UdpSender.java

jaeger-thrift/src/test/java/io/jaegertracing/thrift/internal/senders/UdpLazinessTest.java

pschichtel · 2020-07-29T14:41:21Z

> What happens when the agent isn't available and the client keeps trying to send data? Will there be multiple stacktraces in the logs, one per batch? If so, then we definitely need to make this opt-in.

What about only printing the first stacktrace, or every n-th one, or one every n minutes?

jpkrohling · 2020-07-29T14:53:29Z

First stacktrace + positive confirmation once it succeeds sounds good to me.
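
A small sketch of that idea, assuming a hypothetical `FailureLogGate` helper: it logs the first failure of a streak, stays quiet for the following ones, and logs once more when a send succeeds again. The `Consumer<String>` is only a stand-in for a real logger.

```java
import java.util.function.Consumer;

/** Hypothetical helper, for illustration only; not part of the Jaeger client. */
final class FailureLogGate {
    private final Consumer<String> log; // stand-in for a real logger
    private boolean failing = false;

    FailureLogGate(Consumer<String> log) {
        this.log = log;
    }

    /** Logs only the first failure of a streak. */
    synchronized void onFailure(Exception e) {
        if (!failing) {
            failing = true;
            log.accept("Send failed, suppressing further errors until recovery: " + e);
        }
    }

    /** Logs a confirmation only if the previous send had failed. */
    synchronized void onSuccess() {
        if (failing) {
            failing = false;
            log.accept("Send succeeded again");
        }
    }
}
```

A caller would invoke `onFailure` in its catch block and `onSuccess` after each successful send. In this sketch, three failures followed by two successes produce exactly two log lines.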

pschichtel · 2020-07-29T20:23:40Z

I incorporated the suggestions. However, I'm not entirely clear on where to handle the logging. I don't think the UdpSender should just swallow the exceptions; I'd say this should be handled at a higher level. I looked at the HttpSender as well, and with this PR both of them now fail in the same way: an exception on every call. So any solution should be at a common code path. Either way, I'm not sure this is something that should be done in this PR.

yurishkuro · 2020-07-30T01:31:07Z

I expect the Reporter to catch the exceptions, since it processes items off a queue from a background thread, so there's nowhere to re-throw them.
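
The general shape of that argument, as a hedged sketch rather than the actual reporter code: a background thread drains a queue, and anything the sender throws has to be caught on that thread because no caller is left to receive it. `QueueDrainSketch`, `SpanSender` and the `failures` counter are hypothetical names.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical names throughout; this is not the Jaeger reporter's code. */
final class QueueDrainSketch {
    interface SpanSender {
        void send(String span) throws Exception;
    }

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final SpanSender sender;
    final AtomicInteger failures = new AtomicInteger();

    QueueDrainSketch(SpanSender sender) {
        this.sender = sender;
    }

    /** Called from application threads; only enqueues, so it cannot see send errors. */
    void report(String span) {
        queue.offer(span);
    }

    /** Handles one item without letting an exception escape. */
    void handle(String span) {
        try {
            sender.send(span);
        } catch (Exception e) {
            // Nobody to re-throw to on this thread: count (or log) and carry on.
            failures.incrementAndGet();
        }
    }

    /** Starts the background loop that takes items off the queue. */
    Thread start() {
        Thread worker = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    handle(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and exit
            }
        }, "queue-drain-sketch");
        worker.setDaemon(true);
        worker.start();
        return worker;
    }
}
```

Any logging policy for repeated failures would naturally live in the catch block of `handle`, which is the common code path for every sender.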

jpkrohling · 2020-07-30T07:09:15Z

> I looked at the HttpSender as well, and with this PR both of them now fail in the same way: an exception on every call. So any solution should be at a common code path. Either way, I'm not sure this is something that should be done in this PR.

Agree that this shouldn't be done in this PR. Would you please open an issue with this idea?

pschichtel · 2020-07-30T19:38:11Z

So I think we are done here, right?

pschichtel · 2020-07-31T12:20:40Z

The follow-up about the logging can be found at #729.

pschichtel mentioned this pull request Jul 2, 2020

TRACER-#369: Added a logger property logr which will handle the excep… #668

Closed

yurishkuro approved these changes Jul 28, 2020

jpkrohling reviewed Jul 29, 2020

Make UdpSender lazy to be able to recover from early DNS issues

d42e192

this should close #369 Signed-off-by: Phillip Schichtel <phillip@schich.tel>

pschichtel mentioned this pull request Jul 30, 2020

Log errors that occur during ThriftSender's flush #728

Closed

jpkrohling approved these changes Jul 31, 2020

jpkrohling merged commit 27059eb into jaegertracing:master Jul 31, 2020

yurishkuro mentioned this pull request Nov 9, 2020

How to watch jaeger-agent disconnection jaegertracing/jaeger#2619

Closed

lopezzlaura mentioned this pull request Oct 14, 2021

Services don't resume sending spans after an agent outage #808

Closed

naveedyahyazadeh mentioned this pull request Dec 20, 2021

Large percentage of spans captured by jaeger_tracer_reporter_spans_total metric are resulting in error #821

Closed

yurishkuro mentioned this pull request Feb 1, 2022

io.jaegertracing.jaeger-client 1.7.0 ICMP port unreachable if agent daemonset restarts #827

Closed
