Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using an IP still invokes DNS lookup due to callback being defined on datagram send #198

Open
peeter-tomberg opened this issue Mar 16, 2021 · 12 comments

Comments

@peeter-tomberg
Copy link

Hello!

We recently integrated datadog ontop of our existing monitoring solution and discovered a lot of DNS lookups for our statsd agent IP. We were specifically passing an IP so we wouldn't have to resolve a host name.

It seems to be the fact that for UDP we are using: https://nodejs.org/docs/latest-v12.x/api/dgram.html#dgram_socket_send_msg_offset_length_port_address_callback

Which is being utilised here: https://github.com/brightcove/hot-shots/blob/master/lib/transport.js#L46-L87 through this API: https://github.com/brightcove/hot-shots/blob/master/lib/statsd.js#L314-L373

Since we are always passing a callback the following kicks in:

An optional callback function may be specified to as a way of reporting DNS errors or for determining when it is safe to reuse the buf object. DNS lookups delay the time to send for at least one tick of the Node.js event loop.

which seems to cause a DNS lookup: nodejs/node#35130

By that post, there should be a quick path for DNS Lookup for IPs but I couldn't find it. We're seeing A LOT of requests for resolving DNS for an IP. Any recommendations on how/if we should cut these down?

@dackland
Copy link

We're seeing the exact same problem. Also wanted to note that we tried using cacheDns which unfortunately did not help.

@bdeitte
Copy link
Collaborator

bdeitte commented Mar 21, 2021

What versions are you both using? Note we have had issues with cacheDns in the past, as noted in https://github.com/brightcove/hot-shots/blob/master/CHANGES.md. So unfortunately some older versions may not work with this setting as expected.

Thank you for all of the looking into this issue @peeter-tomberg in case it's not this. It's not clear to me from nodejs/node#35130 if that would mean this would be a real DNS lookup however? The comment in that issue says " "dns.lookup() has a fast path for IP addresses" which I would assume mean it wouldn't really do a DNS lookup.

@dackland
Copy link

We've been on 8.3 (latest) for two months now.

This post on dd-trace-js describes what we're seeing - a huge volume of DNS lookups on either the dd-agent or DogStatsD service. The only difference being that we use a Docker IP, 172.17.0.1, to connect to both. DataDog seems to think it couldn't be the agent since they don't trace the tracer.

However, the "fast path" for DNS lookups leaves me equally confused as to what's causing this.

@matetukacs
Copy link

We're experiencing the exact same problem

@peeter-tomberg
Copy link
Author

Reproducing this is pretty straightforward on my macbook:

const dns = require('dns');

const originalDnsLookup = dns.lookup;

dns.lookup = (...args) => {
  console.log('dns lookup called with', args);
  const result = originalDnsLookup(...args);
  return result;
};

const hotshots = require('hot-shots');
const statsDClient = new hotshots.StatsD({
  prefix: 'prefix',
  host: 'google.com',
  cacheDns: true,
});
statsDClient.increment('test');
statsDClient.increment('test');
statsDClient.increment('test');
statsDClient.increment('test');
statsDClient.increment('test');
statsDClient.increment('test');
statsDClient.increment('test');

produces the output:

dns lookup called with [ 'google.com', [Function] ]
dns lookup called with [ 'google.com', [Function] ]
dns lookup called with [ 'google.com', [Function] ]
dns lookup called with [ 'google.com', [Function] ]
dns lookup called with [ 'google.com', [Function] ]
dns lookup called with [ 'google.com', [Function] ]
dns lookup called with [ 'google.com', [Function] ]
dns lookup called with [ '0.0.0.0', 4, [Function] ]
dns lookup called with [ '142.250.74.206', 4, [Function: afterDns] ]
dns lookup called with [ '142.250.74.206', 4, [Function: afterDns] ]
dns lookup called with [ '142.250.74.206', 4, [Function: afterDns] ]
dns lookup called with [ '142.250.74.206', 4, [Function: afterDns] ]
dns lookup called with [ '142.250.74.206', 4, [Function: afterDns] ]
dns lookup called with [ '142.250.74.206', 4, [Function: afterDns] ]
dns lookup called with [ '142.250.74.206', 4, [Function: afterDns] ]

@bdeitte
Copy link
Collaborator

bdeitte commented Mar 26, 2021

@peeter-tomberg It's not clear to me that this reproduction really shows that DNS is in use. As I noted above, the issues says that "dns.lookup() has a fast path for IP addresses" which I would assume mean it wouldn't really do a DNS lookup. That would make a lot of sense to me, that the code would be smart enough to return an IP if it thought that's what was seen.

To be clear, I'm not saying there isn't an issue here. I'm just not sure it's clear what exactly is wrong (and how to fix it). Is this a new issue for people?

@bdeitte
Copy link
Collaborator

bdeitte commented Mar 26, 2021

Ok so I looked at this a little bit more tonight, and the example can be changed to actually cache by waiting a bit. Something like this:

const dns = require('dns');

const originalDnsLookup = dns.lookup;

dns.lookup = (...args) => {
  console.log('dns lookup called with', args);
  const result = originalDnsLookup(...args);
  return result;
};

const hotshots = require('hot-shots');
const statsDClient = new hotshots.StatsD({
  prefix: 'prefix',
  host: 'google.com',
  cacheDns: true,
});
statsDClient.increment('test');
setTimeout(() => {
  statsDClient.increment('test');
  statsDClient.increment('test');
  statsDClient.increment('test');
  statsDClient.increment('test');
  statsDClient.increment('test');
  statsDClient.increment('test');
}, 500);

This is because that gives time for the resolved address to get in place for future calls, which you can see in https://github.com/brightcove/hot-shots/blob/master/lib/transport.js#L64

I also now remember that we do have tests in place for this exact case: https://github.com/brightcove/hot-shots/blob/master/test/udpDnsCacheTransport.js#L88

One thing to note is that yes you could see some DNS calls as the caching kicks in, and you will see more DNS calls when cacheDnsTtl expires (which you can set to be higher). I am wondering if these two things, along with the cache setting just being broken in a much of releases, is what everyone is seeing altogether. I don't think I know for sure though, so interested in more data.

I should note though that I have not had a lot of time to dig into things lately- today was an exception. So I very much appreciate all digging in (and PRs, which I do still make sure get looked at promptly). Thanks all!

@peeter-tomberg
Copy link
Author

Referencing DataDog/dd-trace-js#1187

@zacharyliu
Copy link

zacharyliu commented Jun 11, 2022

This behavior still seems to occur in the latest version, but the recent discussion in nodejs/node#39468 might provide an explanation for the behavior. Node’s UDP send() code will always calls the socket's configured lookup function, which is initialized to dns.lookup by default but can be overridden. dns.lookup does return early if the input looks like an IP address, without sending an actual DNS request. But APM implementations like dd-trace instrument all calls of the dns.lookup function itself.

One workaround is to provide a no-op lookup function when initializing the socket (if the hostname is an IP address), so that the dns.lookup function is never called at all. This is technically a very minor perf optimization too, since it avoids the process.nextTick that dns.lookup runs to execute the callback.

@hjr3
Copy link
Contributor

hjr3 commented Oct 14, 2022

Since a399dda landed in v9.2.0 you can do:

const client = new StatsD({
  host,
  port,
  udpSocketOptions: {
    type: 'udp4',
    lookup: (hostname, options, callback) => {
      // if IP address, bypass dns.lookup
      if (isIP(hostname)) {
        callback(null, hostname, 4);
        return;
      }

      // if a real hostname, then resolve using dns.lookup
      dns.lookup(hostname, options, callback);
    },
  },
});

@meetmatt
Copy link

meetmatt commented Apr 5, 2024

Is this issue still actual?

arvid220u added a commit to anysphere/priompt that referenced this issue Oct 12, 2024
added the workaround from
brightcove/hot-shots#198 (comment)

Co-authored-by: Aman Karmani <aman@tmm1.net>
@ebachle
Copy link

ebachle commented Nov 13, 2024

Yeah, this issue is still real @meetmatt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants