
Unable to Connect the HCA's through the link #32

solielpai opened this issue Dec 10, 2020 · 13 comments


solielpai commented Dec 10, 2020

I deployed the RDMA device plugin in HCA mode in a Kubernetes cluster. When I tried to run a connection test using "ib_read_bw", the output was as follows:

                RDMA_Write BW Test

Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
TX depth : 128
CQ Moderation : 100
Mtu : 1024[B]
Link type : Ethernet
GID index : 0
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet


local address: LID 0000 xxx
GID: 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
remote address: LID 0000 xxx
GID: 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00

The commands I used are simply 'ib_write_bw -d mlx5_0 [target_ip]' and 'ib_read_bw -d mlx5_0'. Could anyone please help with this issue? I appreciate your help.
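
(Not part of the original report; a hedged diagnostic sketch for the all-zero GID shown above. With Link type Ethernet (RoCE), the perftest tools default to GID index 0, which may not be a usable RoCE entry inside the pod. The GID table can be inspected via sysfs, and a specific index can be passed with -x, or rdma_cm can be used with -R. Device mlx5_0 and port 1 are assumed, and index 3 is purely hypothetical.)

# list the GID entries and their types (RoCE v1 / RoCE v2) for mlx5_0 port 1
grep . /sys/class/infiniband/mlx5_0/ports/1/gids/* 2>/dev/null
grep . /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/* 2>/dev/null

# then select a non-zero RoCE entry explicitly, e.g. index 3 (hypothetical)
Server: ib_write_bw -d mlx5_0 -x 3
Client: ib_write_bw -d mlx5_0 -x 3 [target_ip]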


goversion commented Dec 17, 2020

I'm hitting the same problem as you. Did you finally manage to solve it?
@solielpai


huide9 commented Jan 25, 2023

+1
I'm using a ConnectX-5. All ib_* commands work fine for host-to-host communication, but fail inside containers.

@heshengkai

@huide9 @solielpai @goversion
+1
I'm using a ConnectX-5. All ib_* commands work fine for host-to-host communication, but fail inside containers.
My k8s cluster network plugin is Calico. When the InfiniBand card works in Ethernet mode it causes problems; when it works in IB mode, everything works properly.

@noama-nv

Link type is Ethernet:
Server: ib_write_bw -d [RDMA_DEVICE] -F -R --report_gbits
Client: ib_write_bw -d [RDMA_DEVICE] [SERVER_IP] -F -R --report_gbits

Link type is IB:
Server: ib_write_bw -d [RDMA_DEVICE] -F --report_gbits
Client: ib_write_bw -d [RDMA_DEVICE] [SERVER_IP] -F --report_gbits
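
(A hedged addition, not from the original comment: the link type itself can be confirmed with ibstat from infiniband-diags, assuming it is available inside the pod.)

# the "Link layer:" field reports InfiniBand or Ethernet for each port
ibstat [RDMA_DEVICE]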

@heshengkai

Hi @krembu
Server:
[root@mofed-test-cx6-pod-1 /]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1440
inet 172.16.62.144 netmask 255.255.255.255 broadcast 0.0.0.0
ether 36:bd:a0:74:97:5b txqueuelen 0 (Ethernet)
RX packets 26 bytes 2068 (2.0 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 19 bytes 1490 (1.4 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

[root@mofed-test-cx6-pod-1 /]# ib_write_bw -d mlx5_2 -F -R --report_gbits


* Waiting for client to connect... *

kubectl exec -it mofed-test-cx6-pod-1 bash^C
[root@mofed-test-cx6-pod-1 /]# ib_write_bw -d mlx5_2 -F -R --report_gbits


* Waiting for client to connect... *

Client:
[root@mofed-test-cx6-pod-2 /]# ib_write_bw -d mlx5_2 172.16.62.144 -F -R --report_gbits
Received 10 times ADDR_ERROR
Unable to perform rdma_client function
Unable to init the socket connection
[root@mofed-test-cx6-pod-2 /]# ping 172.16.62.144
PING 172.16.62.144 (172.16.62.144) 56(84) bytes of data.
64 bytes from 172.16.62.144: icmp_seq=1 ttl=63 time=0.070 ms
64 bytes from 172.16.62.144: icmp_seq=2 ttl=63 time=0.047 ms
64 bytes from 172.16.62.144: icmp_seq=3 ttl=63 time=0.047 ms
^C
--- 172.16.62.144 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2048ms
rtt min/avg/max/mdev = 0.047/0.054/0.070/0.013 ms
[root@mofed-test-cx6-pod-2 /]# ib_write_bw -d mlx5_2 172.16.62.144 -F -R --report_gbits
Received 10 times ADDR_ERROR
Unable to perform rdma_client function
Unable to init the socket connection
[root@mofed-test-cx6-pod-2 /]# ib_write_bw -d mlx5_2 172.16.62.144 -F -R --report_gbits
Received 10 times ADDR_ERROR
Unable to perform rdma_client function
Unable to init the socket connection

@noama-nv

Can you share the pod spec and MacvlanNetwork?

@heshengkai

When the link type is IB, both the Calico and macvlan CNIs work properly.
When the link type is Ethernet, the test fails with the Calico CNI but works with the macvlan CNI.
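
(A hedged note, not from the thread itself: with -R the perftest tools resolve the target IP through rdma_cm, which only succeeds if that IP belongs to a network interface backed by a GID entry on the RDMA device. A macvlan interface created on top of the HCA's uplink gets such an entry, while a Calico veth/tunnel interface generally does not, which would be consistent with the ADDR_ERROR above. One way to check which netdev backs each GID entry, assuming device mlx5_2 and port 1:)

# show which network interface each GID entry of mlx5_2 port 1 is associated with
grep . /sys/class/infiniband/mlx5_2/ports/1/gid_attrs/ndevs/* 2>/dev/null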

@noama-nv

Sorry you're having trouble getting this working; this project is deprecated. You can use https://github.com/mellanox/k8s-rdma-shared-dev-plugin
or https://docs.nvidia.com/networking/display/COKAN10/Network+Operator instead.

@heshengkai

https://github.com/mellanox/k8s-rdma-shared-dev-plugin is what I'm using.

@heshengkai

I use the image mellanox/k8s-rdma-shared-dev-plugin.


@heshengkai

@krembu Thank you for your reply

@wwj-2017-1117

@huide9, we are hitting the same problem.
