reverse lookup broken on Mac OS runners #8649

Closed
3 of 10 tasks
oliver-sanders opened this issue Oct 24, 2023 · 20 comments
Assignees
Labels
bug report, investigate (Collect additional information, like space on disk, other tool incompatibilities etc.), OS: macOS

Comments

@oliver-sanders

oliver-sanders commented Oct 24, 2023

Description

Reverse lookup of the host name is not working on the Mac OS runner.

ubuntu-latest:

$ nslookup fv-az955-853
...
Name:	fv-az955-853.mlkcatuscfmejm4ctfapoghrmg.cx.internal.cloudapp.net

macos-latest:

$ nslookup $(hostname -f)
...
** server can't find Mac-1698147376508.local: NXDOMAIN

For an example, see the nslookup and python.socket steps of this workflow run:

https://github.com/oliver-sanders/actions-dns-test/actions/runs/6626432376/job/17999359243

First spotted a couple of weeks ago.

For context, see these two similar instances where reverse DNS stopped working on the Linux images:

Platforms affected

  • Azure DevOps
  • GitHub Actions - Standard Runners
  • GitHub Actions - Larger Runners

Runner images affected

  • Ubuntu 20.04
  • Ubuntu 22.04
  • macOS 11
  • macOS 12
  • macOS 13
  • Windows Server 2019
  • Windows Server 2022

Image version and build link

Image: macos-12
Version: 20230921.1

Image: macos-13
Version: 20231204.4

Is it regression?

Yes, seen on runners with macOS version 12.7.1 or above.

Expected behavior

Reverse lookup should return the hostname.

Actual behavior

Reverse lookup results in error.

Repro steps

To reproduce, see this workflow:

https://github.com/oliver-sanders/actions-dns-test/actions/runs/6626432376/job/17999359243

@shamil-mubarakshin shamil-mubarakshin self-assigned this Oct 24, 2023
@shamil-mubarakshin shamil-mubarakshin added the OS: macOS and investigate labels and removed the needs triage label Oct 24, 2023
@shamil-mubarakshin
Contributor

Hi @oliver-sanders,
Thanks for reporting. We are investigating the issue.

@shamil-mubarakshin
Contributor

@oliver-sanders, after poking around, nslookup doesn't seem to be the right tool for DNS lookups on macOS, which is also mentioned on the tool's man page. It also leaves me wondering whether this behavior has always been the case.
Using dscacheutil gives more stable results and honours local files (a similar hack was used for Ubuntu in the past, although the issue there was IP inconsistency). For example, the commands below should return the host IPs:

echo -e "$(ipconfig getifaddr en0) $(hostname -f) $(hostname -s)" | sudo tee -a /etc/hosts 
dscacheutil -q host -a name $(hostname -f)

We will continue investigating and see if something else can be done.

@oliver-sanders
Author

oliver-sanders commented Oct 25, 2023

@shamil-mubarakshin, thanks for looking in.

Didn't know there were issues with nslookup on Mac OS, interesting.

I also used Python's socket bindings in my tests, which show similar failures for reverse lookups that had worked previously:

socket.gethostname()                              : Mac-1698147376508.local
socket.getfqdn()                                  : Mac-1698147376508.local
socket.getfqdn(socket.gethostname())              : Mac-1698147376508.local
socket.getfqdn(socket.getfqdn())                  : Mac-1698147376508.local
socket.gethostbyname_ex(socket.gethostname())[0]  : [Errno 8] nodename nor servname provided, or not known
socket.gethostbyname_ex(socket.getfqdn())[0]      : [Errno 8] nodename nor servname provided, or not known
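
For reference, a table like the one above could be produced with a short script along these lines. This is a minimal sketch, not the script used in the linked workflow; `describe`, `CHECKS` and `report` are hypothetical names introduced here, and only standard-library `socket` calls are used.

```python
import socket


def describe(label, fn):
    """Run fn() and return a 'label : result' line.

    Resolver failures (socket.gaierror, socket.herror) are subclasses of
    OSError, so they are reported as text instead of aborting the run.
    """
    try:
        result = fn()
    except OSError as exc:
        result = str(exc)
    return f"{label:<50}: {result}"


# Each entry pairs the text shown in the table with the call it describes.
CHECKS = [
    ("socket.gethostname()", lambda: socket.gethostname()),
    ("socket.getfqdn()", lambda: socket.getfqdn()),
    ("socket.getfqdn(socket.gethostname())",
     lambda: socket.getfqdn(socket.gethostname())),
    ("socket.getfqdn(socket.getfqdn())",
     lambda: socket.getfqdn(socket.getfqdn())),
    ("socket.gethostbyname_ex(socket.gethostname())[0]",
     lambda: socket.gethostbyname_ex(socket.gethostname())[0]),
    ("socket.gethostbyname_ex(socket.getfqdn())[0]",
     lambda: socket.gethostbyname_ex(socket.getfqdn())[0]),
]


def report():
    """Return one formatted line per check, in table order."""
    return [describe(label, fn) for label, fn in CHECKS]
```

Calling `print("\n".join(report()))` in a workflow step prints the table; because errors are caught per row, one failing lookup does not hide the results of the others.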

I managed to dig out an example of a workflow where the Mac OS job failed the first two times and passed on the third: https://github.com/cylc/cylc-flow/actions/runs/6634707075

With this message in the failed runs:

socket.gaierror: [Errno 8] nodename nor servname provided, or not known: 'Mac-1698197657674.local'

# attempt 1 - fail
  Image: macos-12
  Version: 20230921.1

# attempt 2 - fail
  Image: macos-12
  Version: 20231017.6

# attempt 3 - pass
  Image: macos-12
  Version: 20230921.4

rail added a commit to rail/cockroach that referenced this issue Oct 31, 2023
Previously, bincheck started a single node database instance without
specifying the address/port it listens on. In this case the server code
tried to resolve the hostname and use it. See
https://github.com/cockroachdb/cockroach/blob/d498a59cc2afc9778af6f7e0120206ab1ee56bc2/pkg/base/addr_validation.go#L128
for the details.

At some point this method stopped working on MacOS GitHub workers. There
is an upstream issue open: actions/runner-images#8649

This PR explicitly sets the `--listen-addr` parameter to skip the
problematic code.

Epic: none
Release note: None
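
The commit above avoids the hostname lookup by passing an explicit address. For projects that cannot pass such a flag, a fallback along the following lines is one option. This is only a sketch under stated assumptions: `pick_listen_host` is a hypothetical helper, it is not how the referenced change works, and the loopback fallback is only appropriate for single-node test setups.

```python
import socket


def pick_listen_host(hostname=None, resolve=socket.gethostbyname,
                     fallback="127.0.0.1"):
    """Return hostname if it resolves, otherwise the fallback address.

    `resolve` is injectable so the behaviour can be exercised without
    touching the real resolver.
    """
    name = hostname if hostname is not None else socket.gethostname()
    try:
        resolve(name)
    except OSError:  # covers socket.gaierror and socket.herror
        return fallback
    return name
```

On an affected runner, `pick_listen_host()` would be expected to return `127.0.0.1`, since resolving the `Mac-….local` name raises `socket.gaierror`.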
craig bot pushed a commit to cockroachdb/cockroach that referenced this issue Oct 31, 2023
113502: bincheck: bind 127.0.0.1 r=celiala a=rail


Co-authored-by: Rail Aliiev <rail@iqchoice.com>
blathers-crl bot pushed a commit to cockroachdb/cockroach that referenced this issue Oct 31, 2023
blathers-crl bot pushed a commit to cockroachdb/cockroach that referenced this issue Oct 31, 2023
blathers-crl bot pushed a commit to cockroachdb/cockroach that referenced this issue Oct 31, 2023
oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Nov 15, 2023
oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Nov 16, 2023
@oliver-sanders
Author

Unfortunately the workaround isn't quite enough for my use case due to other interactions which require additional workarounds. We still occasionally get test runners where reverse lookup works.

rickystewart pushed a commit to cockroachdb/cockroach that referenced this issue Nov 27, 2023
MetRonnie pushed a commit to MetRonnie/cylc-flow that referenced this issue Dec 8, 2023
@MetRonnie

MetRonnie commented Dec 14, 2023

Getting some funky behaviour with Python 3.7 socket library (with @shamil-mubarakshin's above patch applied).

Runner: macOS 12.6.9:

>>> socket.gethostname()                           
'Mac-1702490668849.local'

>>> socket.gethostbyname_ex('Mac-1702490668849.local')                
('mac-1702490668849.local', [], ['192.168.64.23'])

>>> socket.getfqdn()                               
'Mac-1702490668849.local'

>>> socket.gethostbyname_ex('Mac-1702490668849.local')                
('Mac-1702490668849.local', ['Mac-1702490668849'], ['192.168.64.23'])

This does not happen with the macOS 12.7.1 runner (see #8642):

>>> socket.gethostname()                           
'Mac-1702490723337.local'

>>> socket.gethostbyname_ex('Mac-1702490723337.local')                
('mac-1702490723337.local', [], ['10.213.1.225'])

>>> socket.getfqdn()                               
1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa

>>> socket.gethostbyname_ex('Mac-1702490723337.local')                
('mac-1702490723337.local', [], ['10.213.1.225'])
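
Two details in the output above can trip up calling code: `socket.getfqdn()` returning an `ip6.arpa` reverse-pointer name, and the canonical name coming back lower-cased. A minimal sketch of defensive handling follows; `is_reverse_pointer`, `usable_fqdn` and `same_host` are hypothetical helpers, and falling back to `gethostname()` is an assumption about what a caller would want.

```python
import socket


def is_reverse_pointer(name):
    """True for names in the reverse-lookup zones in-addr.arpa and ip6.arpa."""
    lowered = name.rstrip(".").lower()
    return lowered.endswith(".in-addr.arpa") or lowered.endswith(".ip6.arpa")


def usable_fqdn(getfqdn=socket.getfqdn, gethostname=socket.gethostname):
    """Return getfqdn(), or gethostname() if the FQDN is a reverse-pointer name."""
    name = getfqdn()
    return gethostname() if is_reverse_pointer(name) else name


def same_host(a, b):
    """Compare two host names case-insensitively, ignoring a trailing dot."""
    return a.rstrip(".").lower() == b.rstrip(".").lower()
```

With these helpers, `same_host('Mac-1702490723337.local', 'mac-1702490723337.local')` is true, and `usable_fqdn()` never hands an `.arpa` name to later lookups.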

MetRonnie pushed a commit to cylc/cylc-flow that referenced this issue Dec 18, 2023
oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Dec 19, 2023
@oliver-sanders
Author

The macOS 11 runner image will be removed by 6/28/24. To raise awareness of the upcoming removal, jobs using macOS 11 will temporarily fail during scheduled time periods defined below:

The workaround of falling back to macOS 11 is about to expire, but DNS remains problematic on all newer images.

@oliver-sanders
Author

@shawnnapora, @shamil-mubarakshin (apologies for the poke)

The workaround of using macOS 11 to avoid this DNS configuration bug is about to expire. Do you know if this issue is likely to be resolved in later macOS images?

@vieiro

vieiro commented Jul 16, 2024

Here's a reproducer of the problem in case it's of any help: https://github.com/vieiro/gha-macos-resolve-hostname

@sarathrajsrinivasan
Contributor

Hi @oliver-sanders ,

Please find the update below:

1.) Successful run for macOS 12, macOS 13 and macOS 14:
https://github.com/sarathrajsrinivasan/macos-test/actions/runs/9949103379/job/27484814461

2.) Use the following to update "/etc/hosts":

  # Note: "$host" is not used in the loop body, so the same entry is appended once per iteration.
  for host in "$(hostname)" "$(hostname -f)"; do
      echo -e "$(ipconfig getifaddr en0) $(hostname -f) $(hostname -s)" | sudo tee -a /etc/hosts
      dscacheutil -q host -a name $(hostname -f)
  done

Updated "/etc/hosts" value:

  127.0.0.1          localhost
  255.255.255.255    broadcasthost
  ::1                localhost
  192.168.64.19      Mac-1721092163886.local     Mac-1721092163886
  192.168.64.19      Mac-1721092163886.local     Mac-1721092163886
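
The last two lines of the listing above are identical because the same entry was appended twice. An idempotent append could look like the following. This is a sketch that works on the file contents as a string; `hosts_line`, `has_entry` and `with_entry` are hypothetical helpers, and actually writing the result back to "/etc/hosts" still requires elevated privileges.

```python
def hosts_line(ip, fqdn, short):
    """Format one hosts entry: address, canonical name, then alias."""
    return f"{ip} {fqdn} {short}"


def has_entry(hosts_text, name):
    """True if a non-comment line already lists name (case-insensitive)."""
    wanted = name.lower()
    for raw in hosts_text.splitlines():
        # Drop any trailing comment, then split into whitespace-separated fields.
        fields = raw.split("#", 1)[0].split()
        # fields[0] is the address; the rest are host names and aliases.
        if wanted in (field.lower() for field in fields[1:]):
            return True
    return False


def with_entry(hosts_text, ip, fqdn, short):
    """Return hosts_text with the entry appended, unchanged if fqdn is present."""
    if has_entry(hosts_text, fqdn):
        return hosts_text
    # Make sure the new entry starts on its own line.
    sep = "" if (not hosts_text or hosts_text.endswith("\n")) else "\n"
    return f"{hosts_text}{sep}{hosts_line(ip, fqdn, short)}\n"
```

Applying `with_entry` twice with the same arguments yields the same text as applying it once, so repeated workflow steps would not accumulate duplicate lines.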

3.) To get the IP address from the hostname:

  (a.) We can use dscacheutil to get the IP address of the host:

       dscacheutil -q host -a name $(hostname -f)

       name      : mac-1721092163886.local
       ip_address: 192.168.64.19

  (b.) Use the PowerShell code below:

      $hostName = [System.Net.Dns]::GetHostName()
      [System.Net.Dns]::GetHostEntry($hostName)

      HostName                  Aliases   AddressList
      --------                  -------   -----------
      mac-1721092163886.local   {}        {192.168.64.19, fe80::1424:f824:ec93:644d%7, f…

4.) After the above fix, we were able to ping the host by its hostname:

 ping -c 4 Mac-1721092163886.local

    PING mac-1721092163886.local (192.168.64.19): 56 data bytes
    64 bytes from 192.168.64.19: icmp_seq=0 ttl=64 time=0.046 ms
    64 bytes from 192.168.64.19: icmp_seq=1 ttl=64 time=0.206 ms
    64 bytes from 192.168.64.19: icmp_seq=2 ttl=64 time=0.273 ms
    64 bytes from 192.168.64.19: icmp_seq=3 ttl=64 time=0.250 ms

5.) Regarding Python's socket bindings:

  Before fix:
  socket.gethostname()                              : Mac-1721092163886.local
  socket.getfqdn()                                  : Mac-1721092163886.local
  socket.getfqdn(socket.gethostname())              : Mac-1721092163886.local
  socket.getfqdn(socket.getfqdn())                  : Mac-1721092163886.local
  socket.gethostbyname_ex(socket.gethostname())[0]  : [Errno 8] nodename nor servname provided, or not known
  socket.gethostbyname_ex(socket.getfqdn())[0]      : [Errno 8] nodename nor servname provided, or not known

  After fix:
  socket.gethostname()                              : Mac-1721092163886.local
  socket.getfqdn()                                  : Mac-1721092163886.local
  socket.getfqdn(socket.gethostname())              : Mac-1721092163886.local
  socket.getfqdn(socket.getfqdn())                  : Mac-1721092163886.local
  socket.gethostbyname_ex(socket.gethostname())[0]  : Mac-1721092163886.local
  socket.gethostbyname_ex(socket.getfqdn())[0]      : Mac-1721092163886.local

6.) Please check the above and let us know if it helps. We are working on adding the "/etc/hosts" change as part of the image. Will keep you posted.

@MetRonnie

@sarathrajsrinivasan we are successfully using the patch

echo -e "$(ipconfig getifaddr en0) $(hostname -f) $(hostname -s)" | sudo tee -a /etc/hosts
dscacheutil -q host -a name $(hostname -f)

but ideally this would be fixed in the image

@sarathrajsrinivasan
Contributor

@MetRonnie Yes, we are working on adding it as part of the image itself. We will update once the change is rolled out.

@oliver-sanders
Author

Thanks for the update.

@sarathrajsrinivasan
Contributor

Hi @oliver-sanders @MetRonnie,

We have added the above change to "/etc/hosts" as part of the image itself. Please check.
Closing the issue now. Please let us know in case of any questions.

@MetRonnie

I have tested this and still got the DNS problems on

Runner Image Provisioner
  2.0.374.1+4097a9592d27ce71de414581a65bffbda888dd1b

But I ran again a few times and everything worked on

Runner Image Provisioner
  2.0.382.1+d27903c82fd0a98a6c4ff2ea9e193b4413f3d608

In both cases, the other runner version information was identical

Current runner version: '2.319.1'
Operating System
  macOS
  14.6.1
Runner Image
  Image: macos-14-arm64
  Version: 20240811.1

@sarathrajsrinivasan
Contributor

Hi @MetRonnie ,

Could you please check now? This should be resolved 👍🏼

7 participants