No Internet Connection inside VM / macOS m1 / 0.4.2 Nix version #344

Closed
1 of 3 tasks
tricktron opened this issue Jun 21, 2022 · 31 comments · Fixed by #352

Comments

@tricktron
Contributor

tricktron commented Jun 21, 2022

Describe the Issue

The DNS resolver inside the Colima VM does not work and I thus have no internet connection at all.

Everything works with Colima 0.3.4 / limactl 0.11.0 / qemu 7.0.0.

Version

Colima Version:

What is the output of colima version
0.4.2

Lima Version:

What is the output of limactl --version
0.11.0

Qemu Version

What is the output of qemu-img --version
7.0.0

Operating System

  • macOS Intel
  • macOS m1 12.4
  • Linux

To Reproduce

Steps to reproduce the behavior:

  1. docker pull node:16-alpine -> Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on 192.168.107.1:53: read udp 192.168.5.15:40660->192.168.107.1:53: i/o timeout
  2. colima ssh
  3. ping google.com -> bad address 'google.com'

Expected behavior

DNS resolver and internet connections work.

Additional context

Edit: I use the nix package manager to install colima.

Content of /etc/resolv.conf:
nameserver 192.168.107.1

Starting Colima with colima start --dns 1.1.1.1 solves the dns resolution problem and connections work again but VPN does not. On 0.3.4 VPN connections work perfectly.
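For anyone reproducing this, the resolver check can be scripted. The following is a minimal sketch, not part of Colima: the helper names `first_nameserver` and `same_slash24` are made up for illustration, and the addresses are the ones from the error message above.

```shell
# Print the first nameserver listed in resolv.conf-style text read from stdin.
first_nameserver() {
  awk '$1 == "nameserver" { print $2; exit }'
}

# Succeed when two IPv4 addresses share their first three octets (same /24).
same_slash24() {
  [ "${1%.*}" = "${2%.*}" ]
}

# Values taken from this report: the VM address and the configured resolver.
vm_addr=192.168.5.15
ns=$(printf 'nameserver 192.168.107.1\n' | first_nameserver)

if same_slash24 "$vm_addr" "$ns"; then
  echo "resolver $ns is on the VM's own /24"
else
  echo "resolver $ns is outside the VM's /24"
fi
```

Inside the VM, the same function can read the real file with `first_nameserver < /etc/resolv.conf`.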

@abiosoft
Owner

Does starting with colima start --dns 192.168.5.3 work with your VPN connection?

@tricktron
Contributor Author

tricktron commented Jun 22, 2022

@abiosoft Unfortunately not. I still get the same errors:

  • docker pull node:16-alpine -> Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on 192.168.5.3:53: read udp 192.168.5.15:50251->192.168.5.3:53: i/o timeout
  • ping google.com inside vm -> bad address 'google.com'

Edit: If you need more help/info, I can quickly switch versions using nix and provide it.

@tricktron
Contributor Author

@aaschmid Pinging you since you use nix to install Colima as well. Do you experience the same problems?

@abiosoft
Owner

@tricktron are you on any slack channel, can we get to troubleshoot this interactively? Or is there a way I can simulate your VPN environment?

@tricktron
Contributor Author

tricktron commented Jun 22, 2022

@abiosoft

Or is there a way I can simulate your VPN environment?

I don't think so. It is a VPN using the Citrix Secure Access App to connect to one of my customers.

However, I think we need to solve the issues step by step:

  1. colima start without the --dns option results in having no connection at all as well. I don't see any address when running colima ls either:
    [Screenshot 2022-06-22 at 14 30 13: colima ls output with no address shown]

Is that a bug or expected?

are you on any slack channel, can we get to troubleshoot this interactively?

I am not using slack anymore but if you point me to the right direction/channel I can quickly reactivate my account if it helps.

@abiosoft
Owner

abiosoft commented Jun 22, 2022

@tricktron I have been able to reproduce the issue, it seems to be specific to Nix version. There is indeed no internet 😕 .

On it! Thanks for reporting.

Side Note: I know what is happening, I only do not know why it is happening on Nix.

@aaschmid
Contributor

@abiosoft: Can I still help to reproduce?

@abiosoft
Owner

abiosoft commented Jun 22, 2022

@abiosoft: Can I still help to reproduce?

@aaschmid yeah you can, just to ascertain. Thanks.

@aaschmid
Contributor

Have the same issue w/o any VPN. Strangely, I cannot ping any of the IP addresses from nslookup registry-1.docker.io, whereas other pings work just fine. Don't know if this is normal. Just tell me if I can help further...

@rfay
Contributor

rfay commented Jun 23, 2022

Not all hosts on the internet accept or respond to pings. registry-1.docker.io does not. curl -IL registry-1.docker.io is a better test for this one.
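A small sketch of that advice, separating the DNS check from the HTTP check so it is clear which layer fails. The `probe` and `check_connectivity` helpers are hypothetical names; only `nslookup` and `curl` with standard flags are assumed to be available inside the VM.

```shell
# Run a command quietly and report whether it succeeded.
probe() (
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok $label"
  else
    echo "FAIL $label"
  fi
)

# Checks meant to be run inside `colima ssh`.
# Wrapped in a function so nothing touches the network until it is called.
check_connectivity() {
  probe "dns lookup" nslookup registry-1.docker.io
  probe "http request" curl -sIL --max-time 10 registry-1.docker.io
}
```

Calling `check_connectivity` prints one line per check. A failing DNS line with a passing request to a raw IP address would point at the resolver rather than at routing.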

@aaschmid
Contributor

aaschmid commented Jun 23, 2022

You are absolutely right (I should have been more precise), curl works just fine on console but not from within colima ssh (same as any other connection to the "outside" world).

Edit:
From within colima ssh I can ping to LAN addresses and internet addresses, strangely with a very high time, e.g. time=189853.729 ms (DUP!). However, the DNS is not working such that nslookup fails:

colima:/Users/xxxx$ nslookup google.com
;; connection timed out; no servers could be reached

Repository owner deleted a comment from Guwancsayidaliyew Jun 24, 2022
@abiosoft
Owner

@tricktron do you also have a non-Nix installation of colima on your machine?

@tricktron
Contributor Author

@abiosoft
I also tried to do the following manually:

  • Install go 1.18, lima 0.11 via nix
  • run make build
  • run make install

But I get the same problem without any internet connection.

I have noticed that the following changed from 0.3.4 -> 0.4.1

  • go 1.17 -> go 1.18
  • alpine-lima-0.3.4-1 -> alpine-lima-0.4.2-1
  • colima 0.3.4 -> 0.4.1

Where should we search for the error?

Do we know that it works with brew on macOS m1?

@abiosoft
Owner

abiosoft commented Jun 26, 2022

@tricktron the issue I discovered can only be triggered by a mix of Nix and non-Nix installed versions of colima. I am just trying to ascertain if that is your scenario as well.

Basically some symlinks are generated at ~/.colima/_wrapper that can become inaccurate when colima binary resides in multiple locations (which is always the case for Nix).

It was an oversight from me as I mostly build and run from source.

A fix will soon be out for you to test.
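One way to spot such inaccurate symlinks is to compare each link target with the colima binary currently in use. This is only a sketch: `check_links` is a hypothetical helper, and the exact layout under `~/.colima/_wrapper` is an assumption, since the thread only says that symlinks are generated there.

```shell
# check_links DIR EXPECTED
# For every symlink in DIR, report whether it points at EXPECTED.
check_links() (
  dir=$1
  expected=$2
  for link in "$dir"/*; do
    [ -L "$link" ] || continue          # skip regular files and directories
    target=$(readlink "$link")
    if [ "$target" = "$expected" ]; then
      echo "ok ${link##*/}"
    else
      echo "stale ${link##*/} -> $target"
    fi
  done
)

# Possible invocation (directory path is an assumption):
#   check_links "$HOME/.colima/_wrapper" "$(command -v colima)"
```

With Nix, the path returned by `command -v colima` may itself be a wrapper or a profile symlink, so a mismatch is a hint to investigate rather than proof of a problem.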

@tricktron
Contributor Author

Basically some symlinks are generated at ~/.colima/_wrapper that can become inaccurate when colima binary resides in multiple locations (which is always the case for Nix).

@abiosoft Ah I see, your qemu wrapping here:

func main() {
    _, cmd := filepath.Split(os.Args[0])
    switch cmd {
    case "qemu-system-x86_64", "qemu-system-aarch64":
        qemuWrapper(cmd)
    default:
        root.Execute()
    }
}

does somehow interfere with nix and its symlinks. Great catch. I'll gladly help you test the fix when it is out 😃.
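The dispatch in that snippet depends only on the file name in argv[0]. A shell restatement of the same decision, as an illustrative sketch (the `dispatch` function is hypothetical and merely echoes the branch that the Go code would take):

```shell
# Decide which code path a multi-call binary would take, based only on argv[0].
dispatch() {
  case ${1##*/} in                       # strip the directory part, like filepath.Split
    qemu-system-x86_64|qemu-system-aarch64) echo "qemuWrapper" ;;
    *) echo "root.Execute" ;;
  esac
}

dispatch "$HOME/.colima/_wrapper/bin/qemu-system-aarch64"   # prints qemuWrapper
dispatch /usr/local/bin/colima                              # prints root.Execute
```

Because only the name matters, anything that changes which file a `qemu-system-*` name resolves to also changes whether the wrapper branch is ever reached.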

@abiosoft
Owner

@tricktron can you kindly build from source and try out the nix-troubleshooting branch?

@tricktron
Contributor Author

@abiosoft wow that was quick. I'll test it tomorrow.

@tricktron
Contributor Author

@abiosoft
With the current nix code I get the following warning/error during colima start:

WARN[0000] error setting up network dependencies: error at 'preparing network': error occured installing dependencies for 'gvproxy': error locating qemu binaries in PATH: exec: "qemu-system-aarch64": executable file not found in $PATH

I quickly added the qemu binary to the path and the warning disappeared. However, in both scenarios I still don't have any internet inside the vm.

ls -la $HOME/.colima/_wrapper/bin:

  • 699aeb7a4c23553fd0a613dfe210dc4ab93f5b59

So there are no more qemu-* executables that point to colima.

What is the intended dependency of qemu for colima? Should it be on the path? What about gvproxy? What is the meaning of CGO_ENABLED=0?

@abiosoft
Owner

abiosoft commented Jun 27, 2022

@tricktron did you attempt deleting and starting afresh? Or use another profile colima start <profile-name>.
Also, can you colima ssh and share the output of ip route?

ls -la $HOME/.colima/_wrapper/bin

  • 699aeb7a4c23553fd0a613dfe210dc4ab93f5b59

699aeb7a4c23553fd0a613dfe210dc4ab93f5b59 is actually a directory.

What is the intended dependency of qemu for colima? Should it be on the path? What about gvproxy? What is the meaning of CGO_ENABLED=0?

Thanks for pointing that out. It shouldn't need to be in PATH (for Nix); I'll look at that.

As for gvproxy, it is an alternate network provider for qemu.

CGO_ENABLED=0 is nothing serious as well, it makes the binary free of C related dependencies, should have been there from the onset.

@tricktron
Contributor Author

@abiosoft

did you attempt deleting and starting afresh? Or use another profile colima start <profile-name>.

Yes I have always run the following before each test run:

  • colima delete
  • rm -rf $HOME/.lima
  • rm -rf $HOME/.colima

Also, can you colima ssh and share the output of ip route?

ip route:
default via 192.168.5.2 dev eth0  metric 202 
172.17.0.0/16 dev docker0 scope link  src 172.17.0.1 
192.168.5.0/24 dev eth0 scope link  src 192.168.5.15
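To read that routing table programmatically, something like the following could be used. It is a sketch with hypothetical helper names (`default_gateway`, `attached_subnets`), fed with the output quoted above.

```shell
# Print the gateway of the default route from `ip route` output on stdin.
default_gateway() {
  awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Print the directly attached subnets (routes marked "scope link").
attached_subnets() {
  awk '/scope link/ { print $1 }'
}

# The routing table reported in this comment.
routes='default via 192.168.5.2 dev eth0  metric 202
172.17.0.0/16 dev docker0 scope link  src 172.17.0.1
192.168.5.0/24 dev eth0 scope link  src 192.168.5.15'

printf '%s\n' "$routes" | default_gateway     # prints 192.168.5.2
printf '%s\n' "$routes" | attached_subnets    # prints the two attached subnets, one per line
```

Neither attached subnet contains 192.168.107.1, the nameserver from /etc/resolv.conf in the original report, so queries to it would have to leave through the default gateway. Inside the VM, `ip route | default_gateway` gives the live value.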

@tricktron
Contributor Author

tricktron commented Jun 27, 2022

@abiosoft
Maybe this helps:

The created colima executable by nix is not the colima binary but the following shell wrapper file, which then calls the .colima-wrapped binary:

#! /nix/store/l81df76j5jxr8lymk9zp9af94llkir94-bash-5.1-p16/bin/bash -e
PATH=${PATH:+':'$PATH':'}
PATH=${PATH/':''/nix/store/1rpmalspdzssrh6165q0wv262vwafhdd-qemu-7.0.0/bin'':'/':'}
PATH='/nix/store/1rpmalspdzssrh6165q0wv262vwafhdd-qemu-7.0.0/bin'$PATH
PATH=${PATH#':'}
PATH=${PATH%':'}
export PATH
PATH=${PATH:+':'$PATH':'}
PATH=${PATH/':''/nix/store/3ls9sgrz6sq2gx8hpmz9s5h021jhxdrg-lima-0.11.1/bin'':'/':'}
PATH='/nix/store/3ls9sgrz6sq2gx8hpmz9s5h021jhxdrg-lima-0.11.1/bin'$PATH
PATH=${PATH#':'}
PATH=${PATH%':'}
export PATH
exec -a "$0" "/nix/store/h0mv4wmk51rbm6nssadg0iinlfdmiw79-colima-0.4.2/bin/.colima-wrapped"  "$@"

So I think you can debug this locally if you just replace all nix-paths in this file with your local paths. You can also remove qemu or lima from the path if you want.

My guess is that some env variables for gvproxy get lost on the long indirection way from nix-colima -> .colima-wrapped -> colima -> lima -> qemu.
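The PATH lines in that wrapper follow one pattern: remove any existing copy of a directory, then put it at the front. A POSIX sh restatement of the pattern, as a sketch only (`prepend_path` is a hypothetical helper; the real wrapper inlines bash expansions instead):

```shell
# prepend_path DIR PATHVALUE
# Print PATHVALUE with DIR moved (or added) to the front.
prepend_path() (
  dir=$1
  path=$2
  if [ -z "$path" ]; then
    printf '%s\n' "$dir"
    exit 0                               # leaves only this subshell function
  fi
  # Wrap in colons so every entry, first and last included, reads ":entry:".
  wrapped=":$path:"
  case $wrapped in
    *":$dir:"*)
      # Cut out the first existing copy of DIR.
      before=${wrapped%%":$dir:"*}
      after=${wrapped#*":$dir:"}
      wrapped="$before:$after"
      ;;
  esac
  result=$dir$wrapped
  printf '%s\n' "${result%:}"            # drop the trailing colon
)

prepend_path /opt/qemu/bin "/usr/bin:/opt/qemu/bin:/bin"
# prints /opt/qemu/bin:/usr/bin:/bin
```

The consequence for this issue is that whatever the wrapper prepends always ends up ahead of directories that were placed in PATH earlier.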

@abiosoft
Owner

@tricktron thanks for this, it is quite relevant.

I am still pretty new to Nix. Does this mean that it is better to work off a derivation, as that is the only guaranteed way to get this behaviour?

Or can I achieve this in a nix-shell as well?

Thanks.

@tricktron
Contributor Author

tricktron commented Jun 27, 2022

@abiosoft

I am still pretty new to Nix. Does this mean that it is better to work off a derivation, as that is the only guaranteed way to get this behaviour?

A nix derivation is a package. The nix-shell just provides you with the dependencies of the package.

Developing/debugging a package needs a fork of nixpkgs. See https://nixos.wiki/wiki/Nixpkgs/Create_and_debug_packages.

The result of the colima derivation is the above bash file.

So you can work with that. In other words, if colima works with the above bash file, it works with nix. That should also mean that if you replace all nix paths in the bash file with local paths, you should be able to reproduce the no-internet error. Could you try that?

abiosoft changed the title from "No Internet Connection inside VM / DNS host resolver does not work on macOS m1 / 0.4.2" to "No Internet Connection inside VM / macOS m1 / 0.4.2 Nix version" on Jun 28, 2022
@abiosoft
Owner

abiosoft commented Jun 28, 2022

Thanks @tricktron for the heads up, I have successfully reproduced and identified the issue.

Colima wraps the qemu binaries in order to use gvproxy. It does this by overriding, via PATH, the Qemu binaries that Lima uses. However, Nix overrides this behaviour with its own wrapped binaries, so Lima always uses the Nix-provided Qemu binaries.

There are three approaches.

  • Use a custom Lima derivation for Colima on Nix. I am leaning towards this.
  • Handle this Nix scenario within Colima. I am not a fan of this as I feel the code should be distribution-unaware.
  • Get Lima to support passing additional arguments to Qemu, so no workarounds will be needed. This is better, no magic involved. However, it is slower as it will take a while to get the contribution to Lima approved, merged, released and updated on Nix.
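The PATH precedence problem described above can be reproduced without Colima, Lima or Nix. The sketch below uses throwaway directories with fake `qemu-system-aarch64` scripts; the directory names and the `resolve_in` helper are made up, and it assumes the Nix-built limactl prepends its own qemu directory in the same way as the colima wrapper quoted earlier in the thread.

```shell
# Throwaway directories standing in for the two qemu locations (names are made up).
demo=$(mktemp -d)
mkdir -p "$demo/colima-wrapper" "$demo/nix-qemu"
for d in colima-wrapper nix-qemu; do
  printf '#!/bin/sh\necho "qemu from %s"\n' "$d" > "$demo/$d/qemu-system-aarch64"
  chmod +x "$demo/$d/qemu-system-aarch64"
done

# resolve_in PATHVALUE NAME: print the file that NAME resolves to under PATHVALUE.
resolve_in() (
  PATH=$1
  command -v "$2"
)

# Colima puts its wrapper directory first, expecting Lima to pick it up ...
resolve_in "$demo/colima-wrapper:$demo/nix-qemu" qemu-system-aarch64
# ... but a later prepend of another qemu directory takes precedence again.
resolve_in "$demo/nix-qemu:$demo/colima-wrapper:$demo/nix-qemu" qemu-system-aarch64
```

The first call resolves to the file under `colima-wrapper`, the second to the one under `nix-qemu`, which matches the symptom that the gvproxy-aware wrapper is never invoked.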

@tricktron
Contributor Author

@abiosoft
Great that you could reproduce it👍.

I am a strong believer in the golden rule of open source: push fixes upstream instead of hacking locally around them. So I would go for:

Get Lima to support passing additional arguments to Qemu, so no workarounds will be needed. This is better, no magic involved. However, it is slower as it will take a while to get the contribution to Lima approved, merged, released and updated on Nix.

In the meantime we could have a look at:

Use a custom Lima derivation for Colima on Nix. I am leaning towards this.

But before getting into that discussion: Everything worked on nix in version 0.3.4. What new features in 0.4.1 need gvproxy? I see that you introduced a daemon.

Daemons are not handled directly in nix with a derivation but with a nix module (handled by other projects such as nix-darwin and home-manager), which needs separate integration.

So I would be perfectly fine if there were a flag --no-daemon which just ignores the daemon and gvproxy and uses the old functionality. But that only works if gvproxy is only used for your daemon feature. Is that the case?

@abiosoft
Owner

@tricktron can you update the nix-troubleshooting branch and see if the binary from the derivation has internet access?
You can also simply run make nix-derivation-shell to build and drop into the resulting nix-shell.

Thanks as usual.

But before getting into that discussion: Everything worked on nix in version 0.3.4. What new features in 0.4.1 need gvproxy? I see that you introduced a daemon.

There were complaints of intermittent internet issues in previous versions and gvproxy seems to have provided a better experience. You were probably not affected.

So I would be perfectly fine if there were a flag --no-daemon which just ignores the daemon and gvproxy and uses the old functionality. But that only works if gvproxy is only used for your daemon feature. Is that the case?

Yeah, this is being looked at as well. Not the flag per se but a fallback mechanism.

@tricktron
Contributor Author

tricktron commented Jun 29, 2022

@abiosoft
It works🎉.

Direct access to the limactl binary fixes the no internet issue. Note: This already works with v0.4.1, so all the changes in your nix-troubleshooting branch have no directly visible impact.

I created two prs in nixpkgs to fix this:

I also want to use the new Makefile in the pr. Could you create a new tag/version for it so that I can use it?

@aaschmid Could you help reviewing the prs in nixpkgs?

@abiosoft
Owner

abiosoft commented Jun 29, 2022

Yeah, I would still like to merge in the nix-troubleshooting branch as more edge cases are now handled and more importantly the tests are now fixed for nix.

Thanks for the ride, I think I learnt a bit more about Nix thanks to this issue.

@tricktron
Contributor Author

@abiosoft Sure, go for it👍🏼

abiosoft added a commit that referenced this issue Jun 29, 2022
* chore: move CGO_ENABLE arg to makefile

* chore: use actual current executable path

* chore: fix generated binary on M1 mac

* net: fix #344 qemu process missing gvproxy config

* chore: update gitignore

* chore: refactor, mock filesystem in tests

* chore: update nix environment

* chore: refactor Makefile (#354)

* refactor: extract build logic from build.sh to Makefile

* chore: add test rule to Makefile

* apply review suggestion

Co-authored-by: Abiola Ibrahim <git@abiosoft.com>

* fix: use defined `OUTPUT_DIR` variable

* chore: remove -race flag from test as it needs CGO_ENABLED=1

* chore: generate sha in binaries directory

* chore: propagate Go build environment variables

Co-authored-by: Abiola Ibrahim <git@abiosoft.com>

* chore: disable CGO

* chore: remove empty file

Co-authored-by: tricktron <tgagnaux@gmail.com>
@abiosoft
Owner

abiosoft commented Jun 29, 2022

I also want to use the new Makefile in the pr. Could you create a new tag/version for it so that I can use it?

@tricktron here it is https://github.com/abiosoft/colima/releases/tag/v0.4.3.
Release and release notes will be generated after the CI build is done.

@abiosoft
Owner

Also initiated the proper fix upstream. lima-vm/lima#932
