netlink returns an error on the latest GKE node-version #215

Closed
glazychev-art opened this issue Nov 22, 2021 · 9 comments

@glazychev-art
Collaborator

Description

We have a problem with the latest GKE node images: netlink.NeighSet() returns an error - invalid argument.

Some information:

  • The problem occurs only in IPv6 remote kernel cases (i.e. Memif2Wireguard2Kernel, Kernel2Wireguard2Memif, Kernel2Wireguard2Kernel)
  • The error comes from netlink.NeighSet (link): invalid argument
  • Local cases work fine
  • IPv4 cases work fine
  • If we add the IP neighbor manually on the endpoint or client (not from the forwarder), everything works fine
  • A Kind cluster with kernelvethpair works fine
  • It depends on the GKE --node-version. The latest version that works for us is 1.20.10-gke.301. The next one, 1.20.10-gke.1600, does not work
  • Logs (see TestRunFeatureSuite/TestKernel2Wireguard2Kernel) - https://github.com/networkservicemesh/integration-k8s-gke/actions/runs/1466709682
    logs-635.zip

uname -a for different versions:

  • Linux gke-nsm-1-test-default-pool-e6075302-2cb3 5.4.144+ #1 SMP Sat Sep 25 09:56:01 PDT 2021 x86_64 x86_64 x86_64 GNU/Linux - 1.21.5-gke.1300 (Container-Optimized OS with Containerd (cos_containerd)) - not working
  • Linux gke-nsm-1-test-default-pool-d6d117aa-dmh5 5.4.120+ #1 SMP Wed Aug 18 10:20:32 PDT 2021 x86_64 x86_64 x86_64 GNU/Linux - 1.20.10-gke.1600 (Container-Optimized OS with Containerd (cos_containerd)) - not working
  • Linux gke-nsm-1-test-default-pool-336e2956-mt4w 5.4.120+ #1 SMP Fri Jul 23 10:06:55 PDT 2021 x86_64 x86_64 x86_64 GNU/Linux - 1.20.10-gke.301 (Container-Optimized OS with Containerd (cos_containerd)) - working

Currently we are using 1.20.10-gke.301 as a workaround - #213

@edwarnicke
Member

@glazychev-art This is very useful information, thank you for marshalling it so clearly here.

@denis-tingaikin
Member

Currently, it is fixed with a workaround.
But I think we should report it to GKE.

@Mixaster995
Contributor

Reported this to the GKE issue tracker: https://issuetracker.google.com/issues/212300340

@glazychev-art
Collaborator Author

The root cause was found.
It turns out that we can't add a new IPv6 IP neighbor if the required interface doesn't have an IPv6 address. In other words, IPv6 must be enabled on the interface.
In our case, the kernel part of ipneighbor (link, link) was set before the ipaddress (link) chain element.

So I think we need to move the kernel part of ipneighbor to the right place: sdk-vpp -> sdk-kernel /connectioncontext

PRs:
networkservicemesh/sdk-kernel#409
networkservicemesh/sdk-vpp#491

@edwarnicke
Member

@glazychev-art How does moving this from sdk-vpp to sdk-kernel fix the issue with the IPv6 IP needing to be set first?

@edwarnicke
Member

@glazychev-art Also... how is the ipneighbor chain element from connectioncontext working with kernelvethpair? We should only be setting the ip neighbors for Src/Dst addresses if we are using a kernel mechanism that is instantiated as a vethpair (not if using a tun or tap for kernel interfaces). How does the change you pushed to sdk-kernel limit to that case?

@glazychev-art
Collaborator Author

@edwarnicke

> How does moving this from sdk-vpp to sdk-kernel fix the issue with the IPv6 IP needing to be set first?

Because the ipneighbor chain element runs after the ipaddress chain element on the way back of the Request.
https://github.com/networkservicemesh/sdk-kernel/blob/main/pkg/kernel/networkservice/connectioncontextkernel/server.go#L61
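
As a rough illustration of what "on the way back" means, the sketch below uses a toy chain in plain Go. It is not the sdk-kernel code: `element`, `onReturn` and `run` are hypothetical names, and the listing order of the two elements is chosen only to show the effect. It assumes each element does its work after calling the next element, in which case the work happens in the reverse of the listing order.

```go
package main

import "fmt"

// element is a toy chain element; next invokes the rest of the chain.
type element func(log *[]string, next func())

// onReturn builds an element that does its work only after next() returns,
// that is, on the way back of the request.
func onReturn(name string) element {
	return func(log *[]string, next func()) {
		next()
		*log = append(*log, name)
	}
}

// run executes the elements as a nested chain and returns the order
// in which their work was performed.
func run(elems ...element) []string {
	var log []string
	var call func(i int)
	call = func(i int) {
		if i == len(elems) {
			return
		}
		elems[i](&log, func() { call(i + 1) })
	}
	call(0)
	return log
}

func main() {
	// ipneighbor is listed first, but its work runs last on the way back.
	fmt.Println(run(onReturn("ipneighbor"), onReturn("ipaddress")))
}
```

In this model the address step completes before the neighbor step, which is the ordering the fix relies on.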

> Also... how is the ipneighbor chain element from connectioncontext working with kernelvethpair? We should only be setting the ip neighbors for Src/Dst addresses if we are using a kernel mechanism that is instantiated as a vethpair (not if using a tun or tap for kernel interfaces). How does the change you pushed to sdk-kernel limit to that case?

I also moved the peer metadata from sdk-vpp - https://github.com/networkservicemesh/sdk-kernel/pull/409/files#diff-47b2de61130a700b90fd63470c3709b7c24b822e72af97a84d3a55393b870590R20
The peer metadata stores data only for kernelvethpair. If there is no value in the peer metadata, we just skip this step (for tun/tap cases) - https://github.com/networkservicemesh/sdk-kernel/pull/409/files#diff-efb1ef53236dbb4c227dc6834f004ebb1b7647fc5f13e84719b5ae4233af82b4R64
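
The skip logic can be sketched roughly as follows. This is a simplified model and not the code from the PR: `peerStore`, `storeIfVeth`, `neighborTarget`, the connection IDs and the link names are all hypothetical. It only shows the idea that a value is stored for the vethpair case alone, so a missing value doubles as the signal to skip the neighbor step.

```go
package main

import "fmt"

// peerStore is a hypothetical stand-in for per-connection peer metadata:
// connection ID -> peer link name.
type peerStore map[string]string

// storeIfVeth records the peer link only when the mechanism is a vethpair.
func (s peerStore) storeIfVeth(connID, mechanism, peerLink string) {
	if mechanism == "kernelvethpair" {
		s[connID] = peerLink
	}
}

// neighborTarget returns the link on which IP neighbors would be set,
// or false when there is no peer metadata and the step should be skipped.
func (s peerStore) neighborTarget(connID string) (string, bool) {
	link, ok := s[connID]
	return link, ok
}

func main() {
	peers := peerStore{}
	peers.storeIfVeth("conn-veth", "kernelvethpair", "veth-peer0")
	peers.storeIfVeth("conn-tap", "tap", "tap0")

	for _, id := range []string{"conn-veth", "conn-tap"} {
		if link, ok := peers.neighborTarget(id); ok {
			fmt.Println(id, "-> set IP neighbors on", link)
		} else {
			fmt.Println(id, "-> no peer metadata, skip")
		}
	}
}
```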

@edwarnicke
Member

edwarnicke commented Jan 21, 2022

> I also moved peer metadata from sdk-vpp - https://github.com/networkservicemesh/sdk-kernel/pull/409/files#diff-47b2de61130a700b90fd63470c3709b7c24b822e72af97a84d3a55393b870590R20
> And peer stores data only for kernelvethpair. If there is no value in peer metadata, we just skip this step (for tun/tap cases) - https://github.com/networkservicemesh/sdk-kernel/pull/409/files#diff-efb1ef53236dbb4c227dc6834f004ebb1b7647fc5f13e84719b5ae4233af82b4R64

Clever :)

@denis-tingaikin
Member

@edwarnicke Nice!
