Multi rdma device not working in rdma brpc branch #1401

Closed
tbago opened this issue May 12, 2021 · 6 comments

@tbago
Contributor

tbago commented May 12, 2021

We have multiple RDMA NICs in our server. If we use the first RDMA device via --rdma_device=mlx5_0, the brpc client and server work well. But if we use another RDMA device, such as --rdma_device=mlx5_1, the client cannot communicate with the server. It shows the following error:
[Image: WechatIMG24]

rdma_create_qp fails with errno=22 (EINVAL, invalid argument). The root cause appears to be that GetRdmaProtectionDomain() uses the global context. Perhaps the global context is not well supported in this setup, or it has some bugs? I wrote a custom client that uses rdma_cm_id->verbs as the context, and with it I can connect to the brpc server through the other RDMA device.
But in rdma_helper.cpp the PD is initialized before the rdma_cm_id exists.
So my questions are: has anyone else hit the same issue, and how should it be fixed? Do we need to create the PD after the rdma_cm_id is created?
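
To illustrate the mismatch described above, here is a minimal sketch in plain C++. It does not call libibverbs or librdmacm. FakeContext, FakePd, FakeCmId, fake_create_qp and simulate are hypothetical stand-ins, and the rule that QP creation returns EINVAL when the PD and the rdma_cm_id belong to different device contexts is an assumption drawn from the behavior reported in this issue, not code taken from brpc.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// Stand-in types for illustration only; these are NOT the real
// ibv_context, ibv_pd, or rdma_cm_id structs.
struct FakeContext { std::string device_name; };
struct FakePd      { const FakeContext* context; };  // device the PD was allocated on
struct FakeCmId    { const FakeContext* verbs; };    // device the connection resolved to

// Assumed rule: QP creation is rejected with EINVAL when the PD and the
// cm_id do not share the same device context.
int fake_create_qp(const FakeCmId& id, const FakePd& pd) {
    assert(id.verbs != nullptr && pd.context != nullptr);
    return (pd.context == id.verbs) ? 0 : EINVAL;
}

// Simulates a connection that resolved onto the second device (mlx5_1).
// pd_from_cm_id == false: reuse a PD allocated up front on mlx5_0.
// pd_from_cm_id == true:  derive the PD from the cm_id's own context.
int simulate(bool pd_from_cm_id) {
    static const FakeContext mlx5_0{"mlx5_0"};
    static const FakeContext mlx5_1{"mlx5_1"};
    const FakeCmId id{&mlx5_1};
    const FakePd global_pd{&mlx5_0};
    const FakePd per_id_pd{id.verbs};
    return fake_create_qp(id, pd_from_cm_id ? per_id_pd : global_pd);
}
```

Under these assumptions, simulate(false) models the failing case (a PD from the first device paired with a connection on the second), while simulate(true) models the custom client that takes its context from rdma_cm_id->verbs.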

@tbago
Contributor Author

tbago commented May 13, 2021

I found that the rdma_cm_id does not match when the second device is used. I added a bind call on the client side, and that fixes the client connection issue.
[Image: WechatIMG27]

@ziruiliu

ziruiliu commented Sep 2, 2021

#1183
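The problem described there looks similar to yours. The current code does not support using a different NIC or multiple NICs at the same time.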

@wwbmmm
Contributor

wwbmmm commented May 11, 2022

brpc has no plans to support multiple RDMA devices.

@changchengx

@wwbmmm
I tested a RoCE LAG device with the commands below, and it works:

[307 /brpc/example/rdma_performance/]$ ./perf_server --rdma_device mlx5_bond_1
[306 /brpc/example/rdma_performance/]$ ./perf_client --rdma_device mlx5_bond_1 --servers=192.168.30.7:8002 --attachment_size=1024 --thread_num=10

Based on this experiment, brpc doesn't support multiple RDMA devices, so the device to use has to be specified with the --rdma_device option.

So brpc supports RoCE LAG devices, right?

@Tuvie
Contributor

Tuvie commented May 11, 2022

@changchengx
Yes, brpc supports RoCE LAG devices. But it does not support multiple RDMA devices without LAG.

@changchengx

@Tuvie
Got it. Thanks

wwbmmm closed this as completed Aug 17, 2022