Failed to open memory domain mlx5_0 on Bluefield 2 #10541

Open
aedoq opened this issue Mar 7, 2025 · 0 comments

Describe the bug

On our NVIDIA BlueField-2 DPU, UCX fails to open the mlx5_0 memory domain:

$ ~/local/ucx-bisect/bin/ucx_info -T
[1741378519.572699] [<hostname>-dpu:3871684:0]         ib_mlx5.c:610  UCX  ERROR mlx5_0: both WC and NC_DEDICATED UAR allocation types  are not supported
[1741378519.591187] [<hostname>-dpu:3871684:0]         ib_mlx5.c:610  UCX  ERROR mlx5_0: both WC and NC_DEDICATED UAR allocation types  are not supported
# < failed to open memory domain mlx5_0 >
#
# System topology
#
# +--------+----------+
# |        |          |
# |   MB/s |   mlx5_0 |
# |        |          |
# +--------+----------+
# |        |          |
# | mlx5_0 |        - |
# |        |          |
# +--------+----------+
#
# NUMA memory latency
#
# +--------+----------+
# |        |          |
# | device |   mlx5_0 |
# |        |          |
# +--------+----------+
# |        |          |
# |   nsec |    100.0 |
# |        |          |
# +--------+----------+
# Memory latency is calculated according to the CPU affinity
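
The "failed to open memory domain" marker in the output above is easy to detect mechanically, which helps when bisecting. A minimal sketch, assuming a POSIX shell; the function names, the UCX_INFO variable, and its default path are hypothetical and not part of UCX:

```shell
#!/bin/sh
# Hypothetical helper: reads ucx_info output on stdin and prints the name of
# every memory domain that failed to open, one per line.
failed_mds() {
    sed -n 's/^# < failed to open memory domain \(.*\) >$/\1/p'
}

# Exit status follows the `git bisect run` convention:
#   0 = good, 1 = bad, 125 = cannot test this commit (skip).
# UCX_INFO is an assumed path to the freshly built ucx_info binary.
check_commit() {
    ucx_info_bin=${UCX_INFO:-./install/bin/ucx_info}
    [ -x "$ucx_info_bin" ] || return 125
    if "$ucx_info_bin" -T 2>&1 | failed_mds | grep -q .; then
        return 1
    fi
    return 0
}
```

A script that builds the tree and then calls check_commit could be handed to git bisect run; the build step is left out here because it depends on the local setup.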

This is a regression: the first failing commit is ce38486.
I believe this is also the underlying cause of a failure to import memory handles on the DPU:

[<hostname>-dpu:1514929:0:1514929]      ucp_mm.c:1780 Assertion `md_attr->flags & UCT_MD_FLAG_EXPORTED_MKEY' failed
BFD: DWARF error: section .debug_info is larger than its filesize! (0x84c8ff vs 0x4a4f98)

/home/<...>/ucx/src/ucp/core/ucp_mm.c: [ ucp_memh_import_attach() ]
      ...
     1777         tl_mkey_buf = tl_mkeys[tl_mkey_index].tl_mkey_buf;
     1778         ucs_for_each_bit(md_index, tl_mkeys[tl_mkey_index].local_md_map) {
     1779             md_attr = &context->tl_mds[md_index].attr;
==>  1780             ucs_assert_always(md_attr->flags & UCT_MD_FLAG_EXPORTED_MKEY);
     1781
     1782             if (memh->uct[md_index] != NULL) {
     1783                 continue;

(Reproducer) Run make cpu && ./cpu on the host CPU and make dpu && ./dpu on the DPU, then copy the mkey printed in the CPU shell into the DPU shell.
This case has been failing since 684f818, though, so I'm not sure how closely the two failures are related.

Steps to Reproduce

  • Command line
    ucx_info -T
  • UCX version used + UCX configure flags
    ce38486 or v1.18.0.
$ ucx_info -v
# Library version: 1.18.0
# Library path: /home/<...>/local/ucx-bisect/lib/libucs.so.0
# API headers version: 1.18.0
# Git branch '', revision ce38486
# Configured with: --disable-logging --disable-debug --disable-assertions --disable-params-check --prefix /home/<...>/local/ucx-bisect
  • Any UCX environment variables used
    None.

Setup and versions

  • OS version + CPU architecture
    Ubuntu 20.04.3 LTS on aarch64
    • cat /etc/issue
      Ubuntu 20.04.3 LTS \n \l
    • uname -a
      Linux <hostname>-dpu 5.4.0-1023-bluefield #26-Ubuntu SMP PREEMPT Wed Dec 1 23:59:51 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
    • For Nvidia Bluefield SmartNIC include cat /etc/mlnx-release
      DOCA_v1.2.1_BlueField_OS_Ubuntu_20.04-5.4.0-1023-bluefield-5.5-2.1.7.0-3.8.5.12027-1.signed-aarch64
  • For RDMA/IB/RoCE related issues:
    • Driver version:
      • MLNX_OFED version ofed_info -s
        MLNX_OFED_LINUX-5.5-2.1.7.0:
ibstat
CA 'mlx5_0'
	CA type: MT41686
	Number of ports: 1
	Firmware version: 24.35.3502
	Hardware version: 1
	Node GUID: 0x1070fd03002e730a
	System image GUID: 0x1070fd03002e7308
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 100
		Base lid: 60
		LMC: 0
		SM lid: 60
		Capability mask: 0xa651e84a
		Port GUID: 0x1070fd03002e730a
		Link layer: InfiniBand
ibv_devinfo -vv
hca_id:	mlx5_0
	transport:			InfiniBand (0)
	fw_ver:				24.35.3502
	node_guid:			1070:fd03:002e:730a
	sys_image_guid:			1070:fd03:002e:7308
	vendor_id:			0x02c9
	vendor_part_id:			41686
	hw_ver:				0x1
	board_id:			MT_0000000715
	phys_port_cnt:			1
	max_mr_size:			0xffffffffffffffff
	page_size_cap:			0xfffffffffffff000
	max_qp:				131072
	max_qp_wr:			32768
	device_cap_flags:		0x697e1c36
					BAD_PKEY_CNTR
					BAD_QKEY_CNTR
					AUTO_PATH_MIG
					CHANGE_PHY_PORT
					PORT_ACTIVE_EVENT
					SYS_IMAGE_GUID
					RC_RNR_NAK_GEN
					MEM_WINDOW
					UD_IP_CSUM
					XRC
					MEM_MGT_EXTENSIONS
					MEM_WINDOW_TYPE_2B
					MANAGED_FLOW_STEERING
					Unknown flags: 0x48480000
	max_sge:			30
	max_sge_rd:			30
	max_cq:				16777216
	max_cqe:			4194303
	max_mr:				16777216
	max_pd:				8388608
	max_qp_rd_atom:			16
	max_ee_rd_atom:			0
	max_res_rd_atom:		2097152
	max_qp_init_rd_atom:		16
	max_ee_init_rd_atom:		0
	atomic_cap:			ATOMIC_HCA (1)
	max_ee:				0
	max_rdd:			0
	max_mw:				16777216
	max_raw_ipv6_qp:		0
	max_raw_ethy_qp:		0
	max_mcast_grp:			2097152
	max_mcast_qp_attach:		240
	max_total_mcast_qp_attach:	503316480
	max_ah:				2147483647
	max_fmr:			0
	max_srq:			8388608
	max_srq_wr:			32767
	max_srq_sge:			31
	max_pkeys:			128
	local_ca_ack_delay:		16
	general_odp_caps:
	rc_odp_caps:
					NO SUPPORT
	uc_odp_caps:
					NO SUPPORT
	ud_odp_caps:
					NO SUPPORT
	xrc_odp_caps:
					NO SUPPORT
	completion timestamp_mask:			0x7fffffffffffffff
	hca_core_clock:			1000000kHZ
	device_cap_flags_ex:		0x30000051697E1C36
					PCI_WRITE_END_PADDING
					Unknown flags: 0x3000004100000000
	tso_caps:
		max_tso:			0
	rss_caps:
		max_rwq_indirection_tables:			0
		max_rwq_indirection_table_size:			0
		rx_hash_function:				0x0
		rx_hash_fields_mask:				0x0
	max_wq_type_rq:			0
	packet_pacing_caps:
		qp_rate_limit_min:	0kbps
		qp_rate_limit_max:	0kbps
	max_rndv_hdr_size:		64
	max_num_tags:			127
	max_ops:			32768
	max_sge:			1
	flags:
					IBV_TM_CAP_RC

	cq moderation caps:
		max_cq_count:	65535
		max_cq_period:	4095 us

	num_comp_vectors:		8
		port:	1
			state:			PORT_ACTIVE (4)
			max_mtu:		4096 (5)
			active_mtu:		4096 (5)
			sm_lid:			60
			port_lid:		60
			port_lmc:		0x00
			link_layer:		InfiniBand
			max_msg_sz:		0x40000000
			port_cap_flags:		0xa251e84a
			port_cap_flags2:	0x0032
			max_vl_num:		4 (3)
			bad_pkey_cntr:		0x0
			qkey_viol_cntr:		0x0
			sm_sl:			0
			pkey_tbl_len:		128
			gid_tbl_len:		8
			subnet_timeout:		18
			init_type_reply:	0
			active_width:		2X (16)
			active_speed:		50.0 Gbps (64)
			phys_state:		LINK_UP (5)
			GID[  0]:		fe80:0000:0000:0000:1070:fd03:002e:730a
ucx_info -d
#
# Memory domain: self
#     Component: self
#             register: unlimited, cost: 0 nsec
#           remote key: 0 bytes
#           rkey_ptr is supported
#         memory types: host (access,reg_nonblock,reg,cache)
#
#      Transport: self
#         Device: memory
#           Type: loopback
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 19360.00 MB/sec
#              latency: 0 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 8K
#             am_bcopy: <= 8K
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 0 bytes
#        iface address: 8 bytes
#       error handling: ep_check
#
#
# Memory domain: tcp
#     Component: tcp
#             register: unlimited, cost: 0 nsec
#           remote key: 0 bytes
#         memory types: host (access,reg_nonblock,reg,cache)
#
#      Transport: tcp
#         Device: ibp3s0
#           Type: network
#  System device: ibp3s0 (0)
#
#      capabilities:
#            bandwidth: 2200.00/ppn + 0.00 MB/sec
#              latency: 5206 nsec
#             overhead: 50000 nsec
#            put_zcopy: <= 18446744073709551590, up to 6 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 0
#             am_short: <= 8K
#             am_bcopy: <= 8K
#             am_zcopy: <= 64K, up to 6 iov
#   am_opt_zcopy_align: <= 1
#         am_align_mtu: <= 0
#            am header: <= 8037
#           connection: to ep, to iface
#      device priority: 1
#     device num paths: 1
#              max eps: 256
#       device address: 6 bytes
#        iface address: 2 bytes
#           ep address: 10 bytes
#       error handling: peer failure, ep_check, keepalive
#
#      Transport: tcp
#         Device: lo
#           Type: network
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 11.91/ppn + 0.00 MB/sec
#              latency: 10960 nsec
#             overhead: 50000 nsec
#            put_zcopy: <= 18446744073709551590, up to 6 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 0
#             am_short: <= 8K
#             am_bcopy: <= 8K
#             am_zcopy: <= 64K, up to 6 iov
#   am_opt_zcopy_align: <= 1
#         am_align_mtu: <= 0
#            am header: <= 8037
#           connection: to ep, to iface
#      device priority: 1
#     device num paths: 1
#              max eps: 256
#       device address: 18 bytes
#        iface address: 2 bytes
#           ep address: 10 bytes
#       error handling: peer failure, ep_check, keepalive
#
#      Transport: tcp
#         Device: tmfifo_net0
#           Type: network
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 11.32/ppn + 0.00 MB/sec
#              latency: 10960 nsec
#             overhead: 50000 nsec
#            put_zcopy: <= 18446744073709551590, up to 6 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 0
#             am_short: <= 8K
#             am_bcopy: <= 8K
#             am_zcopy: <= 64K, up to 6 iov
#   am_opt_zcopy_align: <= 1
#         am_align_mtu: <= 0
#            am header: <= 8037
#           connection: to ep, to iface
#      device priority: 0
#     device num paths: 1
#              max eps: 256
#       device address: 6 bytes
#        iface address: 2 bytes
#           ep address: 10 bytes
#       error handling: peer failure, ep_check, keepalive
#
#
# Connection manager: tcp
#      max_conn_priv: 2064 bytes
#
# Memory domain: sysv
#     Component: sysv
#             allocate: unlimited
#           remote key: 12 bytes
#           rkey_ptr is supported
#         memory types: host (access,alloc,cache)
#
#      Transport: sysv
#         Device: memory
#           Type: intra-node
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 15360.00 MB/sec
#              latency: 80 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 100
#             am_bcopy: <= 8256
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 8 bytes
#       error handling: ep_check
#
#
# Memory domain: posix
#     Component: posix
#             allocate: <= 8161176K
#           remote key: 24 bytes
#           rkey_ptr is supported
#         memory types: host (access,alloc,cache)
#
#      Transport: posix
#         Device: memory
#           Type: intra-node
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 15360.00 MB/sec
#              latency: 80 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 100
#             am_bcopy: <= 8256
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 8 bytes
#       error handling: ep_check
#
[1741379617.957943] [<hostname>-dpu:3873544:0]         ib_mlx5.c:610  UCX  ERROR mlx5_0: both WC and NC_DEDICATED UAR allocation types  are not supported
#
# Memory domain: mlx5_0
#     Component: ib
#             register: unlimited, cost: 16000 + 0.060 * N nsec
#           remote key: 8 bytes
#           local memory handle is required for zcopy
#         memory types: host (access,reg,cache)
#
#      Transport: rc_verbs
#         Device: mlx5_0:1
#           Type: network
#  System device: mlx5_0 (0)
#
#      capabilities:
#            bandwidth: 11794.23/ppn + 0.00 MB/sec
#              latency: 600 + 1.000 * N nsec
#             overhead: 75 nsec
#            put_short: <= 124
#            put_bcopy: <= 8256
#            put_zcopy: <= 1G, up to 5 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4K
#            get_bcopy: <= 8256
#            get_zcopy: 65..1G, up to 5 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4K
#             am_short: <= 123
#             am_bcopy: <= 8255
#             am_zcopy: <= 8255, up to 4 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 127
#               domain: device
#           atomic_add: 64 bit
#          atomic_fadd: 64 bit
#         atomic_cswap: 64 bit
#           connection: to ep
#      device priority: 61
#     device num paths: 1
#              max eps: 256
#       device address: 3 bytes
#           ep address: 4 bytes
#       error handling: peer failure, ep_check
#
#
#      Transport: ud_verbs
#         Device: mlx5_0:1
#           Type: network
#  System device: mlx5_0 (0)
#
#      capabilities:
#            bandwidth: 11794.23/ppn + 0.00 MB/sec
#              latency: 630 nsec
#             overhead: 105 nsec
#             am_short: <= 116
#             am_bcopy: <= 4088
#             am_zcopy: <= 4088, up to 5 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 3992
#           connection: to ep, to iface
#      device priority: 61
#     device num paths: 1
#              max eps: inf
#       device address: 3 bytes
#        iface address: 3 bytes
#           ep address: 6 bytes
#       error handling: peer failure, ep_check
#
#
#      Transport: dc_mlx5
#         Device: mlx5_0:1
#           Type: network
#  System device: mlx5_0 (0)
#
#      capabilities:
#            bandwidth: 11794.23/ppn + 0.00 MB/sec
#              latency: 660 nsec
#             overhead: 40 nsec
#            put_short: <= 172
#            put_bcopy: <= 8256
#            put_zcopy: <= 1G, up to 11 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4K
#            get_bcopy: <= 8256
#            get_zcopy: 65..1G, up to 11 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4K
#             am_short: <= 186
#             am_bcopy: <= 8254
#             am_zcopy: <= 8254, up to 3 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 138
#               domain: device
#           atomic_add: 64 bit
#          atomic_fadd: 64 bit
#         atomic_cswap: 64 bit
#           connection: to iface
#      device priority: 61
#     device num paths: 1
#              max eps: inf
#       device address: 3 bytes
#        iface address: 5 bytes
#       error handling: buffer (zcopy), remote access, peer failure
#
#
#      Transport: rc_mlx5
#         Device: mlx5_0:1
#           Type: network
#  System device: mlx5_0 (0)
#
#      capabilities:
#            bandwidth: 11794.23/ppn + 0.00 MB/sec
#              latency: 600 + 1.000 * N nsec
#             overhead: 40 nsec
#            put_short: <= 220
#            put_bcopy: <= 8256
#            put_zcopy: <= 1G, up to 14 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4K
#            get_bcopy: <= 8256
#            get_zcopy: 65..1G, up to 14 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4K
#             am_short: <= 234
#             am_bcopy: <= 8254
#             am_zcopy: <= 8254, up to 3 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 186
#               domain: device
#           atomic_add: 64 bit
#          atomic_fadd: 64 bit
#         atomic_cswap: 64 bit
#           connection: to ep
#      device priority: 61
#     device num paths: 1
#              max eps: 256
#       device address: 3 bytes
#           ep address: 7 bytes
#       error handling: buffer (zcopy), remote access, peer failure, ep_check
#
#
#      Transport: ud_mlx5
#         Device: mlx5_0:1
#           Type: network
#  System device: mlx5_0 (0)
#
#      capabilities:
#            bandwidth: 11794.23/ppn + 0.00 MB/sec
#              latency: 630 nsec
#             overhead: 80 nsec
#             am_short: <= 180
#             am_bcopy: <= 4088
#             am_zcopy: <= 4088, up to 3 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 132
#           connection: to ep, to iface
#      device priority: 61
#     device num paths: 1
#              max eps: inf
#       device address: 3 bytes
#        iface address: 3 bytes
#           ep address: 6 bytes
#       error handling: peer failure, ep_check
#
[1741379618.041264] [<hostname>-dpu:3873544:0]         ib_mlx5.c:610  UCX  ERROR mlx5_0: both WC and NC_DEDICATED UAR allocation types  are not supported
# < failed to open memory domain mlx5_0 >
#
# Connection manager: rdmacm
#      max_conn_priv: 54 bytes
#
# Memory domain: cma
#     Component: cma
#             register: unlimited, cost: 9 nsec
#         memory types: host (access,reg_nonblock,reg,cache)
#
#      Transport: cma
#         Device: memory
#           Type: intra-node
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 11145.00 MB/sec
#              latency: 80 nsec
#             overhead: 2000 nsec
#            put_zcopy: unlimited, up to 16 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 1
#            get_zcopy: unlimited, up to 16 iov
#  get_opt_zcopy_align: <= 1
#        get_align_mtu: <= 1
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 4 bytes
#       error handling: peer failure, ep_check
#

I'm happy to produce any further diagnostics you might need.
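
For anyone comparing against another BlueField, the listings above can be regenerated in one go. A minimal sketch, assuming a POSIX shell and that the tools are on PATH; the function names and output layout are illustrative only, while the commands themselves are the ones used in this report:

```shell
#!/bin/sh
# Illustrative helper (not part of UCX): run one diagnostic command and print
# its output under a header. A missing tool or a failing command is noted
# instead of aborting the whole run.
run_diag() {
    printf '== %s ==\n' "$*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || printf '(exit status %s)\n' "$?"
    else
        printf '(%s not found)\n' "$1"
    fi
}

# Runs every command whose output appears in this report.
collect_diags() {
    run_diag cat /etc/issue
    run_diag uname -a
    run_diag cat /etc/mlnx-release
    run_diag ofed_info -s
    run_diag ibstat
    run_diag ibv_devinfo -vv
    run_diag ucx_info -v
    run_diag ucx_info -d
    run_diag ucx_info -T
}
```

Calling collect_diags and redirecting its output to a file gives a single attachment that can be diffed between a working and a failing system.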

aedoq added the Bug label on Mar 7, 2025
aedoq changed the title from "Failed to open memory domain mlx5_0" to "Failed to open memory domain mlx5_0 on Bluefield 2" on Mar 9, 2025