This issue was moved to a discussion.

Parallel DNS lookups sometimes fail #49727

Closed
dylanlan opened this issue Sep 19, 2023 · 2 comments
Labels
dns Issues and PRs related to the dns subsystem.

Comments

dylanlan commented Sep 19, 2023

Version

v18.14.2

Platform

Darwin N61M66G4LW.lan 22.6.0 Darwin Kernel Version 22.6.0: Wed Jul 5 22:22:05 PDT 2023; root:xnu-8796.141.3~6/RELEASE_ARM64_T6000 arm64

Subsystem

dns

What steps will reproduce the bug?

Running many dns.lookup() calls in parallel on my Apple Silicon MacBook sometimes fails with the classic getaddrinfo ENOTFOUND error. It seems very similar to #28292.

I first ran into this issue when using certain packages in parallel (e.g. @aws-sdk/client-ssm), but then found I could reproduce the error with direct DNS lookups alone.

Sample script:

const dns = require('node:dns');
const util = require('util');

const promiseLookup = util.promisify(dns.lookup);

const host = 'github.com';
// const host = 'google.com'; // Certain hosts seem to succeed

async function test() {
    try {
        const requests = [];
        for (let i = 0; i < 1000; i++) {
            const options = {};
            // options.family = 4; // Specifying family: 4 seems to succeed
            // await promiseLookup(host, options); // Awaiting the lookups in sequence seems to succeed
            requests.push(promiseLookup(host, options));
        }
        await Promise.all(requests);
        console.log('success');
    } catch (err) {
        console.log(err);
    }
}

test();

The script contains three commented-out lines. Enabling any one of them seems to fix the issue for me, but I'm not sure how to apply the same fix inside the package dependencies I'm using.

How often does it reproduce? Is there a required condition?

I can reproduce it around 50% of the time with the sample script.

But it seems to reliably get fixed for me if any of these are true:

  1. Running the dns.lookup() in sequence
  2. Using { family: 4 } for the options argument
  3. Looking up certain hosts (eg: google.com)
  4. Using a different OS or machine (e.g. Windows desktop, Ubuntu AWS EC2 instance, Intel MacBook)
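
Condition 1 above can be generalized into a rough sketch that caps how many lookups are in flight at once. The helper name lookupInBatches, the batch size of 10, and the injectable lookupFn parameter are all assumptions made for illustration. Only fully sequential lookups (a batch size of 1) were reported above as reliable, so larger batches are untested guesses.

```javascript
const dns = require('node:dns');

// Hypothetical helper (not from the original report): run lookups in
// fixed-size batches instead of starting all of them at once.
// `lookupFn` is injectable so the batching logic can be tried without a network.
async function lookupInBatches(hosts, batchSize = 10, lookupFn = dns.promises.lookup) {
    const results = [];
    for (let i = 0; i < hosts.length; i += batchSize) {
        const batch = hosts.slice(i, i + batchSize);
        // At most `batchSize` lookups are pending while this batch is awaited.
        const settled = await Promise.all(batch.map((host) => lookupFn(host)));
        results.push(...settled);
    }
    return results;
}
```

With the script's setup, this could be called as lookupInBatches(Array(1000).fill('github.com'), 1) to reproduce the sequential case, then with larger batch sizes to see where failures start.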

Here is everything else I've tried so far, none of which fixed it:

  • Restarting my Macbook
  • Flushing DNS cache with sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
  • Upgrading to latest node 18.18.0 or 20.7.0
  • Downgrading to node 16.18.1 (previous version I used)
  • Disabling IPv6 (System Settings -> Network -> TCP/IP -> Configure IPv6, set to Link-Local Only)
  • Disconnecting from my VPN
  • Using only Ethernet
  • Using only Wifi
  • Disabling firewall and antivirus
  • Using --dns-result-order=ipv4first or NODE_OPTIONS=--dns-result-order=ipv4first
  • Using require('node:dns/promises')
  • Changing configured DNS server from Mac default to Google's 8.8.8.8 or Cloudflare's 1.1.1.1

What is the expected behavior? Why is that the expected behavior?

The DNS lookups should consistently succeed when run in parallel.

What do you see instead?

Error: getaddrinfo ENOTFOUND github.com

> node sample-script.js
Error: getaddrinfo ENOTFOUND github.com
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26) {
  errno: -3008,
  code: 'ENOTFOUND',
  syscall: 'getaddrinfo',
  hostname: 'github.com'
}

Additional information

I haven't been able to reproduce it with parallel DNS lookups in a similar golang sample program on the same machine. I also haven't been able to reproduce it using dig, nslookup, or ping.

I've been using an Apple Silicon MacBook since 2021, but only started seeing this error during summer 2023. I wasn't testing parallel DNS lookups before then, so I can't tell whether the issue has been present for the whole two years.

My coworkers on other Apple Silicon MacBooks can also reproduce the issue. Interestingly, those on Intel MacBooks cannot seem to reproduce it.

I understand that there are potentially many similar issues and workarounds from #5436, and that it might be an OS-specific issue related to libc versions.

I'm confused why I'm not seeing any other recent issues for this other than from 2019 - maybe I'm missing something.

I'm wondering if there are any workarounds in node that I can use in the meantime. Since setting { family: 4 } seems to fix it for me, is there a way to set that option globally for the node process? Otherwise I'm not sure how to cleanly override it when the DNS lookup happens inside nested package dependencies.

preveen-stack added the net and dns labels and removed the net label Sep 20, 2023

aduh95 (Contributor) commented Sep 20, 2023

@nodejs/dns

bnoordhuis (Member) commented

This is not actionable as a bug report; the linked issue explains why it's not under node's control. Use dns.resolve() if dns.lookup() is giving you trouble.
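
As a rough illustration of that suggestion, the sample script's loop could be switched to dns.promises.resolve4(), which queries DNS servers over the network rather than calling getaddrinfo(). The helper name resolveManyTimes and the injectable resolveFn parameter are assumptions for this sketch. Note that resolve4() returns an array of IPv4 address strings and does not consult the hosts file, so it is not a drop-in replacement for dns.lookup().

```javascript
const dns = require('node:dns');

// Hypothetical helper (not from the thread): issue `count` parallel A-record
// queries with resolve4() instead of getaddrinfo()-based dns.lookup().
// `resolveFn` is injectable so the helper can be tried without a network.
async function resolveManyTimes(host, count, resolveFn = dns.promises.resolve4) {
    const requests = [];
    for (let i = 0; i < count; i++) {
        requests.push(resolveFn(host));
    }
    // Each entry of the result is an array of IPv4 address strings.
    return Promise.all(requests);
}
```

Mirroring the original script, resolveManyTimes('github.com', 1000) would start 1000 queries at once. Whether that avoids the failure on the affected machines has not been verified here.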

Setting {family:4} probably means Apple's libc uses one fewer file descriptor per lookup, since it's not querying A and AAAA records in parallel.
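
One possible way to apply {family:4} process-wide is to wrap dns.lookup so that calls without an explicit family default to 4. This is a workaround sketch, not something node provides: the helper name withDefaultFamily is hypothetical, and patching the shared dns module is an assumption that only helps for dependencies that read dns.lookup at call time.

```javascript
const dns = require('node:dns');

// Hypothetical wrapper (not a node API): returns a lookup function that
// defaults `family` to the given value unless the caller sets one.
function withDefaultFamily(lookup, family = 4) {
    return function lookupWithDefaultFamily(hostname, options, callback) {
        if (typeof options === 'function') {
            // Called as lookup(hostname, callback).
            callback = options;
            options = {};
        } else if (typeof options === 'number') {
            // Called as lookup(hostname, family, callback): keep the caller's family.
            options = { family: options };
        }
        // Caller-supplied fields override the default family.
        return lookup(hostname, { family, ...options }, callback);
    };
}

// Assumption: run this once at startup, before any dependency performs a lookup.
// Left commented out because it mutates the shared module.
// dns.lookup = withDefaultFamily(dns.lookup);
```

The wrapper leaves an explicit family (for example 6) untouched, so code that deliberately asks for IPv6 keeps working.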

I'll convert this to a discussion.

nodejs locked and limited conversation to collaborators Sep 20, 2023
bnoordhuis converted this issue into discussion #49734 Sep 20, 2023
