-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle DNS SERVFAIL differently than NXDOMAIN #2776
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few cleanup related PRs but in general makes sense!
...talk-dns-discovery-netty/src/test/java/io/servicetalk/dns/discovery/netty/TestDnsServer.java
Outdated
Show resolved
Hide resolved
...lk-dns-discovery-netty/src/test/java/io/servicetalk/dns/discovery/netty/TestRecordStore.java
Outdated
Show resolved
Hide resolved
...lk-dns-discovery-netty/src/test/java/io/servicetalk/dns/discovery/netty/TestRecordStore.java
Outdated
Show resolved
Hide resolved
251b47a
to
ea5237d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert here but it seems good to me.
...k-dns-discovery-netty/src/main/java/io/servicetalk/dns/discovery/netty/DefaultDnsClient.java
Outdated
Show resolved
Hide resolved
...lk-dns-discovery-netty/src/test/java/io/servicetalk/dns/discovery/netty/TestRecordStore.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. I'll give it the 👍 once we can get the required netty changes landed and this PR building.
27b9716
to
b9d58d8
Compare
@@ -1,5 +1,5 @@ | |||
/* | |||
* Copyright © 2018-2020 Apple Inc. and the ServiceTalk project authors | |||
* Copyright © 2018-2020, 2023 Apple Inc. and the ServiceTalk project authors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2024?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems good to me. The apparently failing test in #2882 is curious.
ensurePositive(srvHostNameRepeatInitialDelay, "srvHostNameRepeatInitialDelay"); | ||
ensurePositive(srvHostNameRepeatJitter, "srvHostNameRepeatJitter"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, it seems reasonable to have a zero jitter. It's common for testing purposes to make things predicable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that RepeatStreategies.repeatWithConstantBackoffDeltaJitter
uses ThreadLocalRandom.nextLong(long, long)
that does not allow the same values for upper/lower bounds. So, jitter should be at least 1ns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like a bug in repeatWithConstantBackoffDeltaJitter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put up a PR to address it: #2888
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR was merged and we should now support zero length jitters.
...k-dns-discovery-netty/src/main/java/io/servicetalk/dns/discovery/netty/DefaultDnsClient.java
Outdated
Show resolved
Hide resolved
...s-discovery-netty/src/test/java/io/servicetalk/dns/discovery/netty/DefaultDnsClientTest.java
Outdated
Show resolved
Hide resolved
ensurePositive(srvHostNameRepeatInitialDelay, "srvHostNameRepeatInitialDelay"); | ||
ensurePositive(srvHostNameRepeatJitter, "srvHostNameRepeatJitter"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that RepeatStreategies.repeatWithConstantBackoffDeltaJitter
uses ThreadLocalRandom.nextLong(long, long)
that does not allow the same values for upper/lower bounds. So, jitter should be at least 1ns
...tty/src/main/java/io/servicetalk/dns/discovery/netty/DefaultDnsServiceDiscovererBuilder.java
Outdated
Show resolved
Hide resolved
...k-dns-discovery-netty/src/main/java/io/servicetalk/dns/discovery/netty/DefaultDnsClient.java
Outdated
Show resolved
Hide resolved
...k-dns-discovery-netty/src/main/java/io/servicetalk/dns/discovery/netty/DefaultDnsClient.java
Outdated
Show resolved
Hide resolved
...k-dns-discovery-netty/src/main/java/io/servicetalk/dns/discovery/netty/DefaultDnsClient.java
Outdated
Show resolved
Hide resolved
...s-discovery-netty/src/test/java/io/servicetalk/dns/discovery/netty/DefaultDnsClientTest.java
Outdated
Show resolved
Hide resolved
...s-discovery-netty/src/test/java/io/servicetalk/dns/discovery/netty/DefaultDnsClientTest.java
Outdated
Show resolved
Hide resolved
...k-dns-discovery-netty/src/main/java/io/servicetalk/dns/discovery/netty/DefaultDnsClient.java
Outdated
Show resolved
Hide resolved
…ns/discovery/netty/DefaultDnsClient.java Co-authored-by: Bryce Anderson <bryce_anderson@apple.com>
...tty/src/main/java/io/servicetalk/dns/discovery/netty/DefaultDnsServiceDiscovererBuilder.java
Show resolved
Hide resolved
...k-dns-discovery-netty/src/main/java/io/servicetalk/dns/discovery/netty/DefaultDnsClient.java
Outdated
Show resolved
Hide resolved
...s-discovery-netty/src/test/java/io/servicetalk/dns/discovery/netty/DefaultDnsClientTest.java
Outdated
Show resolved
Hide resolved
servicetalk-dns-discovery-netty/gradle/spotbugs/test-exclusions.xml
Outdated
Show resolved
Hide resolved
servicetalk-dns-discovery-netty/gradle/checkstyle/suppressions.xml
Outdated
Show resolved
Hide resolved
@@ -62,6 +62,9 @@ public final class DefaultDnsServiceDiscovererBuilder implements DnsServiceDisco | |||
*/ | |||
@Deprecated // FIXME: 0.43 - consider removing this system property | |||
private static final String SKIP_BINDING_PROPERTY = "io.servicetalk.dns.discovery.netty.skipBinding"; | |||
private static final String NX_DOMAIN_INVALIDATES_PROPERTY = "io.servicetalk.dns.discovery.nxdomain.invalidation"; | |||
@SuppressWarnings("PMD.MutableStaticState") | |||
static boolean NX_DOMAIN_INVALIDATES = parseProperty(NX_DOMAIN_INVALIDATES_PROPERTY, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing that worries me is that if DNS server had a blip or if someone removed entries before adding new entries, DNS client could receive NXDOMAIN
, then invalidate all entries, and then end up in an awkward state without any addresses until another resolution happens. Could you please elaborate a bit more on why it is important to invalidate addresses in this case? Can we just wait for another result set (keep using pre-existing IPs) while we retry UnknownHostException
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NXDOMAIN
is not / should not, be a blip. Its a valid answer that declares that DNS has no knowledge of that domain. The resolver makes every effort to resolve from the nameservers before it gives up. DNS misbehavior (such as SERVFAIL or timeouts) will still be handled more gracefully. Every other case (e.g., removed entries before adding entries) are flows that are beyond the scope of a resolver to heuristically work around, we should propagate the correct state at all times, which makes it easier to debug and reason about.
...tty/src/main/java/io/servicetalk/dns/discovery/netty/DefaultDnsServiceDiscovererBuilder.java
Outdated
Show resolved
Hide resolved
...s-discovery-netty/src/test/java/io/servicetalk/dns/discovery/netty/DefaultDnsClientTest.java
Outdated
Show resolved
Hide resolved
// string matching is done on purpose to avoid the hard Netty dependency | ||
(t.getCause() != null && t.getCause().getClass().getName() | ||
.equals("io.netty.resolver.dns.DnsErrorCauseException")) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting work around.
Motivation:
An
UnknownHostException
can have multiple root causes, each with a different application logic.By far the most prominent is due to an NXDOMAIN from the nameserver, meaning that the domain in the query doesn't not exist. Other reasons could be a SERVFAIL or timeouts during resolutions. Due to insufficient information in the error itself, the dns-client has no obvious way of distinctly handling the various causes.
Modification:
Netty will expose that information as an
init-cause
. ST can leverage the enrichment to handle the different scenarios.Results:
Better behavior.
As part of this effort a behavior change is expected. NXDOMAIN errors will now cause a DNS state invalidation. To retain previous behavior use the property
io.servicetalk.dns.discovery.nxdomain.invalidation
set tofalse
.Depends on netty/netty#13721
Depends on netty/netty#13850