"FATA[0000] subnet 10.4.0.0/24 overlaps with other one on this address space" error when starting containers #3365

Open
astrostl opened this issue Nov 8, 2022 · 14 comments

Comments

@astrostl

astrostl commented Nov 8, 2022

Actual Behavior

I'm on an M1 Mac. Every release after 1.5.0 has been unable to run a container and returns a subnet overlap error. Reverting to 1.5.0 resolves the issue.

Steps to Reproduce

Do a clean install of any release after 1.5.0, then run nerdctl run -it alpine /bin/sh (or any other Linux container).

Result

FATA[0000] subnet 10.4.0.0/24 overlaps with other one on this address space

Expected Behavior

The container runs.

Additional Information

No response

Rancher Desktop Version

1.6.2

Rancher Desktop K8s Version

1.25.3

Which container engine are you using?

containerd (nerdctl)

What operating system are you using?

macOS

Operating System / Build Version

macOS Ventura 13.0

What CPU architecture are you using?

arm64 (Apple Silicon)

Linux only: what package format did you use to install Rancher Desktop?

No response

Windows User Only

No response

@astrostl added the kind/bug label on Nov 8, 2022
@jandubois
Member

This seems to be a duplicate of #2935, except this one is on macOS. I'm leaving it open as a separate issue for now until we can verify, but wanted to link the issues already.

@astrostl
Author

astrostl commented Nov 9, 2022

Nod. I debated adding to the existing issue, but they seemed sufficiently unalike, as this one required no upgrade/downgrade steps. Thanks!

@jandubois
Member

@Nino-K why would reverting to 1.5.x avoid this issue? I wonder if this is specific to the nerdctl version.

@astrostl
Author

astrostl commented Nov 9, 2022

To be specific, the issue seems to have been introduced at exactly 1.5.1.

@jandubois
Member

> To be specific, the issue seems to have been introduced at exactly 1.5.1.

That would be switching from nerdctl 0.22.0 to 0.22.2.

I suspect containerd/nerdctl#1245 is the root cause of the issue.

The problem is that nerdctl is running inside the VM, so it can't actually see which subnets are available.

@jandubois
Member

I just tested it, and you can change the address for the nerdctl0 network:

$ rdctl shell sudo vi /etc/cni/net.d/nerdctl-bridge.conflist

The file probably doesn't exist until you have run nerdctl run at least once. After that, you can see the gateway/subnet definition in that file:

      "ipam": {
        "ranges": [
          [
            {
              "gateway": "10.4.0.1",
              "subnet": "10.4.0.0/24"
            }
          ]
        ],

I changed mine to 10.7.0.0/24 and restarted Rancher Desktop, and then started a container. I can see that the network is now using the new subnet:

$ rdctl shell ip a show nerdctl0
3: nerdctl0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 32:c9:95:a8:e0:d5 brd ff:ff:ff:ff:ff:ff
    inet 10.7.0.1/24 brd 10.7.0.255 scope global nerdctl0
       valid_lft forever preferred_lft forever
    inet6 fe80::30c9:95ff:fea8:e0d5/64 scope link
       valid_lft forever preferred_lft forever
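
For anyone who would rather script the edit than do it by hand in vi, here is a minimal sketch of the same change. It only assumes the "ipam" / "ranges" / "gateway" / "subnet" keys shown in the fragment above; the function name set_bridge_subnet and the "plugins" wrapper in the sample are illustrative assumptions, not something taken from nerdctl or Rancher Desktop. It walks the parsed JSON and rewrites every range it finds, using the first host address as the gateway (as in 10.4.0.1 for 10.4.0.0/24).

```python
import ipaddress
import json


def set_bridge_subnet(node, new_subnet):
    """Recursively rewrite every ipam range in a parsed conflist to new_subnet."""
    network = ipaddress.ip_network(new_subnet)
    gateway = str(network.network_address + 1)  # first host address in the subnet
    if isinstance(node, dict):
        ipam = node.get("ipam")
        if isinstance(ipam, dict):
            for range_set in ipam.get("ranges", []):
                for entry in range_set:
                    entry["subnet"] = str(network)
                    entry["gateway"] = gateway
        # Keep walking so nested plugin entries are covered too.
        for value in node.values():
            set_bridge_subnet(value, new_subnet)
    elif isinstance(node, list):
        for item in node:
            set_bridge_subnet(item, new_subnet)
    return node


# Hypothetical sample: only the ipam part mirrors the fragment quoted above.
sample = json.loads("""
{"plugins": [{"type": "bridge",
              "ipam": {"ranges": [[{"gateway": "10.4.0.1",
                                    "subnet": "10.4.0.0/24"}]]}}]}
""")
updated = set_bridge_subnet(sample, "10.7.0.0/24")
print(json.dumps(updated, indent=2))
```

The result would still have to be written back to /etc/cni/net.d/nerdctl-bridge.conflist inside the VM, followed by a Rancher Desktop restart as described above.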

@jandubois
Member

I tried to modify the network config with just nerdctl, but it looks like there is no way to edit an existing network, and I can't delete the bridge network; it is being recreated right away:

$ nerdctl network rm bridge
bridge
$ nerdctl network create bridge --gateway 10.10.10.1 --subnet 10.10.10.0/24
FATA[0000] network with name bridge already exists
$ nerdctl network inspect bridge
[
    {
        "Name": "bridge",
        "Id": "17f29b073143d8cd97b5bbe492bdeffec1c5fee55cc1fe2112c8b9335f8b6121",
        "IPAM": {
            "Config": [
                {
                    "Subnet": "10.4.0.0/24",
                    "Gateway": "10.4.0.1"
                }
            ]
        },
        "Labels": {}
    }
]

@AkihiroSuda Is there a way to modify the bridge network configuration?

More generally, how can we get nerdctl to pick a free subnet when it is running inside the VM in Lima?

Would we duplicate the GetFreeSubnet code in Lima itself, and then write the network config inside the VM as part of the bootstrap?
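
To make the question concrete, here is a rough sketch of what "pick a free subnet" could look like if the list of networks already in use were collected on the host and passed in. This is not nerdctl's GetFreeSubnet code; pick_free_subnet, its parameters, and the candidate pools are assumptions for illustration only.

```python
import ipaddress


def pick_free_subnet(pool, in_use, prefixlen=24):
    """Return the first /prefixlen block in pool that overlaps nothing in in_use."""
    used = [ipaddress.ip_network(net) for net in in_use]
    for candidate in ipaddress.ip_network(pool).subnets(new_prefix=prefixlen):
        if not any(candidate.overlaps(net) for net in used):
            return candidate
    return None  # the whole pool is taken


# Example host networks; in practice these would come from the host, not the VM.
host_networks = ["192.168.1.0/24", "10.0.0.0/8"]

print(pick_free_subnet("10.4.0.0/16", host_networks))    # None: all of it is inside 10.0.0.0/8
print(pick_free_subnet("172.16.0.0/12", host_networks))  # 172.16.0.0/24
```

The chosen subnet could then be written into the network config inside the VM during bootstrap.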

@AkihiroSuda

> Is there a way to modify the bridge network configuration?

vi /etc/cni/net.d/nerdctl-bridge.conflist may work

@jandubois
Member

> vi /etc/cni/net.d/nerdctl-bridge.conflist may work

Yes, it does, but you have to shell inside the VM to do it, as I've shown above:

$ rdctl shell sudo vi /etc/cni/net.d/nerdctl-bridge.conflist

So maybe that is good enough, but I would still like to find a way to configure it automatically with a non-conflicting subnet range.

Given that we'll need to do this for Windows/WSL2 as well, I guess we have to do this in RD, but I was wondering if Lima should handle this automatically too.

@jandubois
Member

@astrostl Can you confirm that 10.4.0.0/24 is conflicting with another network on your host?

I'm now wondering why/how nerdctl run could detect this, when it originally picks this network range as free. At first I thought it would be a conflict, e.g. with a VPN network, but now I think it should be a conflict inside the VM.

Can you run rdctl shell ip a and share the output, to see if there are any other conflicting networks defined?

Did you manually create additional networks via nerdctl network create?

@astrostl
Author

This system has two networks: 192.168.1.0/24 (WiFi) and 10.0.0.0/8 (wired). The latter would conflict with 10.4.0.0/24. No networks were manually created with nerdctl. A clean install had no conflicts prior to RD 1.5.1.
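
A quick way to double-check that overlap, using only the Python standard library and the networks named in this thread:

```python
import ipaddress

# Host networks reported above.
host_networks = [
    ipaddress.ip_network("192.168.1.0/24"),  # WiFi
    ipaddress.ip_network("10.0.0.0/8"),      # wired
]
# Subnet from the FATA[0000] error message.
bridge_subnet = ipaddress.ip_network("10.4.0.0/24")

conflicts = [str(net) for net in host_networks if net.overlaps(bridge_subnet)]
print(conflicts)  # ['10.0.0.0/8']
```

Only the wired /8 overlaps; the WiFi network does not.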

@astrostl
Author

In terms of avoidance, methinks that 172.16.0.0/12 would be radically less likely to pose a conflict as it seems to be the least-popular of the RFC 1918 private address spaces.

@astrostl
Author

My workaround was to reconfigure the Ubiquiti EdgeRouter X that had defaulted (??) to a /8 to use a /24 instead. That personally gets me out of the woods, but it is still a workaround for a collision that I've seen others encounter.

@jsoref
Contributor

jsoref commented Mar 7, 2023

fwiw, i tripped on this. we actually use 10.4.0.0/... intentionally leaving space for things to use 10.0.0.0..10.3.255.255.
