Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lookup function not working? #1044

Closed
kingjerod opened this issue Jul 27, 2020 · 13 comments
Closed

Lookup function not working? #1044

kingjerod opened this issue Jul 27, 2020 · 13 comments
Labels
bug Something isn't working community core

Comments

@kingjerod
Copy link

Describe the bug
I've added a custom lookup function with a cache for the DNS but it doesn't appear to be working. This is my code:

interface IDNSEntry {
  address: string;
  family: number;
  expires: number;
}
const dnsCache: Map<string, IDNSEntry> = new Map();

function lookup(hostname, options, callback) {
  if (dnsCache.has(hostname)) {
    const entry: IDNSEntry = dnsCache.get(hostname);
    if (Date.now() < entry.expires) {
      return callback(null, entry.address, entry.family);
    }
  }
  dns.lookup(hostname, (error, address: string, family: number) => {
    if (error) {
      callback("Failed to lookup " + hostname);
    } else {
      dnsCache.set(hostname, {address, family, expires: (Date.now() + (60 * 1000))});
      return callback(null, address, family);
    }
  });
}
tracer.init({profiling: true, analytics: true, hostname: "dd-agent", lookup: lookup});

Maybe I'm not understanding what the lookup function does. When I go into my Datadog console, I see 100 req/s for "dd-agent", and 3 req/s for other services. I would expect these numbers to be < 1 req/s since they should be cached for 1 minute.

  • Environment
    AWS C5.Large Docker Node image

  • Operation system:
    Docker

  • Node version:
    10.14.1

  • Tracer version:
    0.23.2

  • Agent version:
    Latest

@kingjerod kingjerod added the bug Something isn't working label Jul 27, 2020
@rochdev
Copy link
Member

rochdev commented Aug 12, 2020

Does your code run at all or is it never called? Is this something you could reproduce with a small snippet?

@kingjerod
Copy link
Author

kingjerod commented Aug 12, 2020

It is called, because if there is an error in there it causes issues. I'm not sure what other code I can give you. My goal is to reduce the amount of DNS look ups done, most other services are less than 1 per second, but since adding APM I can see it's 10+ req/s for looking up the dd-agent hostname. Must be every time it wants to send APM stat? Since dd-agent is just another Docker container on the host it doesn't need to be looking it up 100 times per second.

@kingjerod
Copy link
Author

firefox_HQV6gdjbYX

Maybe this image will help to show what I'm seeing. I've erased the other host names below dd-agent. This is the dns service. I'm trying to reduce that 33 million DNS look ups for dd-agent.

@rochdev
Copy link
Member

rochdev commented Aug 12, 2020

Since it's the dns module that calls the lookup function and not the tracer, if the above code is called it means it was properly passed. Is it possible that the code does not result in caching for some reason? If you use the above code directly with the http or dns module do you get the same behavior?

How many instances of the service are there? The default setting for calls to the agent is once every 2 seconds, so to get 100 reqs/s that would mean 200 instances, unless they get very high traffic which causes early flushes.

In any case, one thing that I don't understand is that we now use HTTP keep alive by default, which would mean that once the connection is established, there should be no further DNS lookups even without a cache.

@kingjerod
Copy link
Author

There should be only one instance of the dd-tracer per running server process. The posted code is in a file by itself and imported before anything else in the index.ts file. There are multiple servers running, but at the most it should only be 8 instances total.

I have not tried directly with the http or dns module.

I put the above code into JSFiddle and the caching logic appears to be working properly: https://jsfiddle.net/qrtuv8x7/

In the console:
image

@rochdev
Copy link
Member

rochdev commented Aug 12, 2020

Both keep-alive and the lookup option work for me locally. I'm afraid I will really need a reproduction snippet. Any way you can reproduce the issue in isolation?

@kingjerod
Copy link
Author

Is there a way to reproduce it locally without sending it to Datadog? Rather not have to incur the APM costs to figure this out.

@rochdev
Copy link
Member

rochdev commented Aug 14, 2020

One thing I've just realized is that the DNS lookups are to the agent but these should not show in the UI at all since we don't trace the tracer. Is it possible that something else unrelated to the tracer sends requests to the agent? For example a DogStatsD client? Also, internally the lookup function is cached but that would only work if the tracer is initialized before any other imports in the app. Can you validate that the tracer initialization is the first thing that happen in the entry point of the app?

Reproducing without sending to Datadog is definitely possible but it would be a bit involved since you'd either have to parse the log output or implement a mock agent to intercept and introspect the requests.

@kingjerod
Copy link
Author

Hmm it could definitely be the DogStatsD client! I'll have to set that to use DNS cache. Next time I have to enable APM I will report back to see if that fixes things.

@rochdev
Copy link
Member

rochdev commented Oct 7, 2020

@kingjerod Were you able to validate if the issue was caused by the DogStatsD client?

@rochdev
Copy link
Member

rochdev commented Oct 21, 2020

Closing for now as from the above discussion it sounds like it's caused by the DogStatsD client and we were not able to reproduce with the tracer. Please feel free to re-open if that was not actually the cause.

@rochdev rochdev closed this as completed Oct 21, 2020
@dackland
Copy link

@kingjerod Were you able to fix this issue? We're seeing the exact same problem, and using the DogStatsD client's cacheDns option has no effect.

@Raynos
Copy link

Raynos commented Apr 17, 2023

Seeing the same issue still happening.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working community core
Projects
None yet
Development

No branches or pull requests

4 participants