[jenkins03:02158] [[13077,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_ADD_LOCAL_PROCS
[jenkins03:02158] [[13077,0],0] orted_cmd: received add_local_procs
MPIR_being_debugged = 0
MPIR_debug_state = 1
MPIR_partial_attach_ok = 1
MPIR_i_am_starter = 0
MPIR_forward_output = 0
MPIR_proctable_size = 2
MPIR_proctable:
(i, host, exe, pid) = (0, jenkins03, /usr/bin/taskset, 2164)
(i, host, exe, pid) = (1, jenkins03, /usr/bin/taskset, 2165)
MPIR_executable_path: NULL
MPIR_server_arguments: NULL
[jenkins03:02164] mca: base: components_register: registering framework btl components
[jenkins03:02164] mca: base: components_register: found loaded component openib
[jenkins03:02164] mca: base: components_register: component openib register function successful
[jenkins03:02164] mca: base: components_register: found loaded component self
[jenkins03:02164] mca: base: components_register: component self register function successful
[jenkins03:02164] mca: base: components_open: opening btl components
[jenkins03:02164] mca: base: components_open: found loaded component openib
[jenkins03:02164] mca: base: components_open: component openib open function successful
[jenkins03:02164] mca: base: components_open: found loaded component self
[jenkins03:02164] mca: base: components_open: component self open function successful
[jenkins03:02164] select: initializing btl component openib
[jenkins03:02164] Checking distance from this process to device=mlx5_0
[jenkins03:02164] hwloc_distances->nbobjs=2
[jenkins03:02164] hwloc_distances->latency[0]=1.000000
[jenkins03:02164] hwloc_distances->latency[1]=2.100000
[jenkins03:02164] hwloc_distances->latency[2]=2.100000
[jenkins03:02164] hwloc_distances->latency[3]=1.000000
[jenkins03:02164] ibv_obj->logical_index=0
[jenkins03:02164] my_obj->logical_index=0
[jenkins03:02164] Process is bound: distance to device is 1.000000
[jenkins03:02164] Checking distance from this process to device=mlx4_0
[jenkins03:02164] hwloc_distances->nbobjs=2
[jenkins03:02164] hwloc_distances->latency[0]=1.000000
[jenkins03:02164] hwloc_distances->latency[1]=2.100000
[jenkins03:02164] hwloc_distances->latency[2]=2.100000
[jenkins03:02164] hwloc_distances->latency[3]=1.000000
[jenkins03:02164] ibv_obj->logical_index=0
[jenkins03:02164] my_obj->logical_index=0
[jenkins03:02164] Process is bound: distance to device is 1.000000
[jenkins03:02165] mca: base: components_register: registering framework btl components
[jenkins03:02165] mca: base: components_register: found loaded component openib
[jenkins03:02165] mca: base: components_register: component openib register function successful
[jenkins03:02165] mca: base: components_register: found loaded component self
[jenkins03:02165] mca: base: components_register: component self register function successful
[jenkins03:02165] mca: base: components_open: opening btl components
[jenkins03:02165] mca: base: components_open: found loaded component openib
[jenkins03:02165] mca: base: components_open: component openib open function successful
[jenkins03:02165] mca: base: components_open: found loaded component self
[jenkins03:02165] mca: base: components_open: component self open function successful
[jenkins03:02165] select: initializing btl component openib
[jenkins03:02165] Checking distance from this process to device=mlx5_0
[jenkins03:02165] hwloc_distances->nbobjs=2
[jenkins03:02165] hwloc_distances->latency[0]=1.000000
[jenkins03:02165] hwloc_distances->latency[1]=2.100000
[jenkins03:02165] hwloc_distances->latency[2]=2.100000
[jenkins03:02165] hwloc_distances->latency[3]=1.000000
[jenkins03:02165] ibv_obj->logical_index=0
[jenkins03:02165] my_obj->logical_index=0
[jenkins03:02165] Process is bound: distance to device is 1.000000
[jenkins03:02165] Checking distance from this process to device=mlx4_0
[jenkins03:02165] hwloc_distances->nbobjs=2
[jenkins03:02165] hwloc_distances->latency[0]=1.000000
[jenkins03:02165] hwloc_distances->latency[1]=2.100000
[jenkins03:02165] hwloc_distances->latency[2]=2.100000
[jenkins03:02165] hwloc_distances->latency[3]=1.000000
[jenkins03:02165] ibv_obj->logical_index=0
[jenkins03:02165] my_obj->logical_index=0
[jenkins03:02165] Process is bound: distance to device is 1.000000
[jenkins03][[13077,1],0][btl_openib_component.c:637:init_one_port] looking for mlx5_0:1 GID index 0
[jenkins03][[13077,1],0][btl_openib_component.c:668:init_one_port] my IB subnet_id for HCA mlx5_0 port 1 is fe80000000000000
[jenkins03][[13077,1],1][btl_openib_component.c:637:init_one_port] looking for mlx5_0:1 GID index 0
[jenkins03][[13077,1],1][btl_openib_component.c:668:init_one_port] my IB subnet_id for HCA mlx5_0 port 1 is fe80000000000000
[jenkins03][[13077,1],0][btl_openib_component.c:962:device_destruct] failed to release registration cache
[jenkins03][[13077,1],0][btl_openib_component.c:995:device_destruct] Failed to destroy device resources
[jenkins03][[13077,1],1][btl_openib_component.c:962:device_destruct] failed to release registration cache
[jenkins03][[13077,1],1][btl_openib_component.c:995:device_destruct] Failed to destroy device resources
[jenkins03][[13077,1],0][btl_openib_ip.c:366:add_rdma_addr] Adding addr 165.165.165.2 (0x2a5a5a5) subnet 0xa5a5a500 as mlx5_0:1
[jenkins03][[13077,1],1][btl_openib_ip.c:366:add_rdma_addr] Adding addr 165.165.165.2 (0x2a5a5a5) subnet 0xa5a5a500 as mlx5_0:1
[jenkins03][[13077,1],0][btl_openib_component.c:1351:setup_qps] srq: rd_num is 1024 rd_low is 768 sd_max is 192 rd_max is 256 srq_limit is 48
[jenkins03][[13077,1],0][btl_openib_component.c:1351:setup_qps] srq: rd_num is 512 rd_low is 384 sd_max is 96 rd_max is 128 srq_limit is 24
[jenkins03][[13077,1],0][btl_openib_component.c:1351:setup_qps] srq: rd_num is 512 rd_low is 384 sd_max is 96 rd_max is 128 srq_limit is 24
[jenkins03][[13077,1],0][connect/btl_openib_connect_rdmacm.c:2019:rdmacm_component_query] rdmacm CPC not supported with XRC receive queues, please try xoob CPC; skipped on mlx5_0:1
[jenkins03:02164] openib BTL: rdmacm CPC unavailable for use on mlx5_0:1; skipped
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:682:udcm_module_init] created cpc module 0x790350 for btl 0x771460
[jenkins03][[13077,1],1][btl_openib_component.c:1351:setup_qps] srq: rd_num is 1024 rd_low is 768 sd_max is 192 rd_max is 256 srq_limit is 48
[jenkins03][[13077,1],1][btl_openib_component.c:1351:setup_qps] srq: rd_num is 512 rd_low is 384 sd_max is 96 rd_max is 128 srq_limit is 24
[jenkins03][[13077,1],1][btl_openib_component.c:1351:setup_qps] srq: rd_num is 512 rd_low is 384 sd_max is 96 rd_max is 128 srq_limit is 24
[jenkins03:02165] openib BTL: rdmacm CPC unavailable for use on mlx5_0:1; skipped
[jenkins03][[13077,1],1][connect/btl_openib_connect_rdmacm.c:2019:rdmacm_component_query] rdmacm CPC not supported with XRC receive queues, please try xoob CPC; skipped on mlx5_0:1
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:682:udcm_module_init] created cpc module 0x790310 for btl 0x771420
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:918:udcm_module_create_listen_qp] creating listen QP on port 1
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:918:udcm_module_create_listen_qp] creating listen QP on port 1
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:979:udcm_module_create_listen_qp] listening for connections on lid 3, qpn 129863
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:743:udcm_module_init] my modex = LID: 3, Port: 1, QPN: 129863
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:979:udcm_module_create_listen_qp] listening for connections on lid 3, qpn 129864
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:743:udcm_module_init] my modex = LID: 3, Port: 1, QPN: 129864
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:491:udcm_component_query] available for use on mlx5_0:1
[jenkins03:02164] [rank=0] openib: using port mlx5_0:1
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:491:udcm_component_query] available for use on mlx5_0:1
[jenkins03:02165] [rank=1] openib: using port mlx5_0:1
[jenkins03:02164] select: init of component openib returned success
[jenkins03:02164] select: initializing btl component self
[jenkins03:02164] select: init of component self returned success
[jenkins03:02165] select: init of component openib returned success
[jenkins03:02165] select: initializing btl component self
[jenkins03:02165] select: init of component self returned success
[1499972508.846128] [jenkins03:2164 :0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3499.98
[1499972508.846131] [jenkins03:2165 :0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3499.98
[jenkins03:02164] mca: bml: Using self btl for send to [[13077,1],0] on node jenkins03
[jenkins03:02165] mca: bml: Using self btl for send to [[13077,1],1] on node jenkins03
[jenkins03][[13077,1],1][btl_openib_proc.c:218:mca_btl_openib_proc_get_locked] unpack: 1 btls
[jenkins03][[13077,1],1][btl_openib_proc.c:238:mca_btl_openib_proc_get_locked] unpacked btl 0: modex message, offset now 26
[jenkins03][[13077,1],1][btl_openib_proc.c:244:mca_btl_openib_proc_get_locked] unpacked btl 0: number of cpcs to follow 1 (offset now 27)
[jenkins03][[13077,1],1][btl_openib_proc.c:259:mca_btl_openib_proc_get_locked] unpacked btl 0: cpc 0: index 3 (offset now 28)
[jenkins03][[13077,1],1][btl_openib_proc.c:263:mca_btl_openib_proc_get_locked] unpacked btl 0: cpc 0: component udcm
[jenkins03][[13077,1],1][btl_openib_proc.c:270:mca_btl_openib_proc_get_locked] unpacked btl 0: cpc 0: priority 63, msg len 8 (offset now 30)
[jenkins03][[13077,1],1][btl_openib_proc.c:284:mca_btl_openib_proc_get_locked] unpacked btl 0: cpc 0: blob unpacked 129863 10003 (offset now 38)
[jenkins03][[13077,1],1][btl_openib_proc.c:300:mca_btl_openib_proc_get_locked] unpacking done!
[jenkins03][[13077,1],0][btl_openib_proc.c:218:mca_btl_openib_proc_get_locked] unpack: 1 btls
[jenkins03][[13077,1],0][btl_openib_proc.c:238:mca_btl_openib_proc_get_locked] unpacked btl 0: modex message, offset now 26
[jenkins03][[13077,1],0][btl_openib_proc.c:244:mca_btl_openib_proc_get_locked] unpacked btl 0: number of cpcs to follow 1 (offset now 27)
[jenkins03][[13077,1],0][btl_openib_proc.c:259:mca_btl_openib_proc_get_locked] unpacked btl 0: cpc 0: index 3 (offset now 28)
[jenkins03][[13077,1],0][btl_openib_proc.c:263:mca_btl_openib_proc_get_locked] unpacked btl 0: cpc 0: component udcm
[jenkins03][[13077,1],0][btl_openib_proc.c:270:mca_btl_openib_proc_get_locked] unpacked btl 0: cpc 0: priority 63, msg len 8 (offset now 30)
[jenkins03][[13077,1],0][btl_openib_proc.c:284:mca_btl_openib_proc_get_locked] unpacked btl 0: cpc 0: blob unpacked 129863 10003 (offset now 38)
[jenkins03][[13077,1],0][btl_openib_proc.c:300:mca_btl_openib_proc_get_locked] unpacking done!
[jenkins03][[13077,1],1][btl_openib_proc.c:218:mca_btl_openib_proc_get_locked] unpack: 1 btls
[jenkins03][[13077,1],1][btl_openib_proc.c:238:mca_btl_openib_proc_get_locked] unpacked btl 0: modex message, offset now 26
[jenkins03][[13077,1],1][btl_openib_proc.c:244:mca_btl_openib_proc_get_locked] unpacked btl 0: number of cpcs to follow 1 (offset now 27)
[jenkins03][[13077,1],1][btl_openib_proc.c:259:mca_btl_openib_proc_get_locked] unpacked btl 0: cpc 0: index 3 (offset now 28)
[jenkins03][[13077,1],1][btl_openib_proc.c:263:mca_btl_openib_proc_get_locked] unpacked btl 0: cpc 0: component udcm
[jenkins03][[13077,1],1][btl_openib_proc.c:270:mca_btl_openib_proc_get_locked] unpacked btl 0: cpc 0: priority 63, msg len 8 (offset now 30)
[jenkins03][[13077,1],1][btl_openib_proc.c:284:mca_btl_openib_proc_get_locked] unpacked btl 0: cpc 0: blob unpacked 129864 10003 (offset now 38)
[jenkins03][[13077,1],1][btl_openib_proc.c:300:mca_btl_openib_proc_get_locked] unpacking done!
[jenkins03][[13077,1],0][btl_openib_proc.c:218:mca_btl_openib_proc_get_locked] unpack: 1 btls
[jenkins03][[13077,1],0][btl_openib_proc.c:238:mca_btl_openib_proc_get_locked] unpacked btl 0: modex message, offset now 26
[jenkins03][[13077,1],0][btl_openib_proc.c:244:mca_btl_openib_proc_get_locked] unpacked btl 0: number of cpcs to follow 1 (offset now 27)
[jenkins03][[13077,1],0][btl_openib_proc.c:259:mca_btl_openib_proc_get_locked] unpacked btl 0: cpc 0: index 3 (offset now 28)
[jenkins03][[13077,1],0][btl_openib_proc.c:263:mca_btl_openib_proc_get_locked] unpacked btl 0: cpc 0: component udcm
[jenkins03][[13077,1],0][btl_openib_proc.c:270:mca_btl_openib_proc_get_locked] unpacked btl 0: cpc 0: priority 63, msg len 8 (offset now 30)
[jenkins03][[13077,1],0][btl_openib_proc.c:284:mca_btl_openib_proc_get_locked] unpacked btl 0: cpc 0: blob unpacked 129864 10003 (offset now 38)
[jenkins03][[13077,1],0][btl_openib_proc.c:300:mca_btl_openib_proc_get_locked] unpacking done!
[jenkins03][[13077,1],1][btl_openib.c:849:init_ib_proc_nolock] got 1 port_infos
[jenkins03][[13077,1],1][btl_openib.c:852:init_ib_proc_nolock] got a subnet fe80000000000000
[jenkins03][[13077,1],1][btl_openib.c:855:init_ib_proc_nolock] Got a matching subnet!
[jenkins03][[13077,1],1][btl_openib.c:849:init_ib_proc_nolock] got 1 port_infos
[jenkins03][[13077,1],1][btl_openib.c:852:init_ib_proc_nolock] got a subnet fe80000000000000
[jenkins03][[13077,1],1][btl_openib.c:855:init_ib_proc_nolock] Got a matching subnet!
[jenkins03][[13077,1],0][btl_openib.c:849:init_ib_proc_nolock] got 1 port_infos
[jenkins03][[13077,1],0][btl_openib.c:852:init_ib_proc_nolock] got a subnet fe80000000000000
[jenkins03][[13077,1],0][btl_openib.c:855:init_ib_proc_nolock] Got a matching subnet!
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:2540:udcm_xrc_send_qp_create] creating xrc send qp
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:2540:udcm_xrc_send_qp_create] creating xrc send qp
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:2677:udcm_xrc_recv_qp_create] creating xrc receive qp
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:2677:udcm_xrc_recv_qp_create] creating xrc receive qp
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:2440:udcm_xrc_send_qp_connect] Connecting send qp: 0x85a688, remote qp: 129875
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:2440:udcm_xrc_send_qp_connect] Connecting send qp: 0x8656c8, remote qp: 129876
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:581:udcm_endpoint_init_self_xrc] successfully created loopback queue pair
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:1926:udcm_finish_connection] finishing connection for endpoint 0x83f650.
[jenkins03][[13077,1],0][btl_openib.c:849:init_ib_proc_nolock] got 1 port_infos
[jenkins03][[13077,1],0][btl_openib.c:852:init_ib_proc_nolock] got a subnet fe80000000000000
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:581:udcm_endpoint_init_self_xrc] successfully created loopback queue pair
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:1926:udcm_finish_connection] finishing connection for endpoint 0x85a680.
[jenkins03][[13077,1],1][btl_openib.c:1309:mca_btl_openib_del_procs] in del_procs 1, setting another endpoint to null
[jenkins03:02165] mca: bml: Using openib btl for send to [[13077,1],0] on node jenkins03
[jenkins03][[13077,1],0][btl_openib.c:855:init_ib_proc_nolock] Got a matching subnet!
[jenkins03][[13077,1],0][btl_openib.c:1309:mca_btl_openib_del_procs] in del_procs 0, setting another endpoint to null
[jenkins03][[13077,1],1][btl_openib_endpoint.c:406:mca_btl_openib_endpoint_destruct] Unregistered XRC Recv QP:129876
[jenkins03][[13077,1],0][btl_openib_endpoint.c:406:mca_btl_openib_endpoint_destruct] Unregistered XRC Recv QP:129875
[jenkins03:02164] mca: bml: Using openib btl for send to [[13077,1],1] on node jenkins03
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:796:udcm_module_start_connect] endpoint 0x83f620 (lid 3, ep index 0)
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:2360:udcm_xrc_start_connect] The IB addr: sid fe80000000000000 lid 101 with status 3, subscribing to this address
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:2826:udcm_xrc_send_request] sending xrc request for endpoint 0x83f620
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:2305:udcm_set_message_timeout] activating timeout for message 0x9831d0
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:1670:udcm_new_message] created message 0x9831d0 with type 105
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:2843:udcm_xrc_send_request] Sending XConnect2 with qp: 129876
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:1975:udcm_process_messages] WC: wr_id: 0x0000000536870912, status: 0, opcode: 0x80, byte_len: 60, imm_data: 0x00000000, qp_num: 0x0001fb47, src_qp: 0x0001fb48, wc_flags: 0x0, slid: 0x0003
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:2020:udcm_process_messages] received message. type: 105, lcl_ep = 0x8af020, rem_ep = 0x83f620, src qpn = 129864, length = 96, local buffer # = 0
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:1771:udcm_send_ack] sending ack for message 0x9831d0 on ep 0x8af020
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:1975:udcm_process_messages] WC: wr_id: 0x0000000536870912, status: 0, opcode: 0x80, byte_len: 60, imm_data: 0x00000000, qp_num: 0x0001fb48, src_qp: 0x0001fb47, wc_flags: 0x0, slid: 0x0003
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:1788:udcm_handle_ack] got ack for message 0x9831d0 from slid 0x0003 qp 0x0001fb47
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:1796:udcm_handle_ack] found matching message
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:2174:udcm_message_callback] running message thread
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:2219:udcm_message_callback] exiting message thread
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:796:udcm_module_start_connect] endpoint 0x8af020 (lid 3, ep index 1)
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:2360:udcm_xrc_start_connect] The IB addr: sid fe80000000000000 lid 101 with status 3, subscribing to this address
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:2826:udcm_xrc_send_request] sending xrc request for endpoint 0x8af020
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:2305:udcm_set_message_timeout] activating timeout for message 0x973310
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:1670:udcm_new_message] created message 0x973310 with type 105
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:2843:udcm_xrc_send_request] Sending XConnect2 with qp: 129875
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:2174:udcm_message_callback] running message thread
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:2635:udcm_xrc_recv_qp_connect] Connecting Recv QP
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:2640:udcm_xrc_recv_qp_connect] Failed to register qp_num: 129876, get error: Invalid argument (22) . Replying with RNR
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:2980:udcm_xrc_handle_xconnect] rejecting request for reason -3
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:2305:udcm_set_message_timeout] activating timeout for message 0x973500
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:1670:udcm_new_message] created message 0x973500 with type 102
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:1975:udcm_process_messages] WC: wr_id: 0x0000000536870913, status: 0, opcode: 0x80, byte_len: 60, imm_data: 0x00000000, qp_num: 0x0001fb48, src_qp: 0x0001fb47, wc_flags: 0x0, slid: 0x0003
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:2020:udcm_process_messages] received message. type: 105, lcl_ep = 0x83f620, rem_ep = 0x8af020, src qpn = 129863, length = 96, local buffer # = 1
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:1771:udcm_send_ack] sending ack for message 0x973310 on ep 0x83f620
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:1975:udcm_process_messages] WC: wr_id: 0x0000000536870914, status: 0, opcode: 0x80, byte_len: 60, imm_data: 0x00000000, qp_num: 0x0001fb48, src_qp: 0x0001fb47, wc_flags: 0x0, slid: 0x0003
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:2020:udcm_process_messages] received message. type: 102, lcl_ep = 0x83f620, rem_ep = 0x8af020, src qpn = 129863, length = 96, local buffer # = 2
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:1771:udcm_send_ack] sending ack for message 0x973500 on ep 0x83f620
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:1975:udcm_process_messages] WC: wr_id: 0x0000000536870913, status: 0, opcode: 0x80, byte_len: 60, imm_data: 0x00000000, qp_num: 0x0001fb47, src_qp: 0x0001fb48, wc_flags: 0x0, slid: 0x0003
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:1788:udcm_handle_ack] got ack for message 0x973310 from slid 0x0003 qp 0x0001fb48
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:1796:udcm_handle_ack] found matching message
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:1975:udcm_process_messages] WC: wr_id: 0x0000000536870914, status: 0, opcode: 0x80, byte_len: 60, imm_data: 0x00000000, qp_num: 0x0001fb47, src_qp: 0x0001fb48, wc_flags: 0x0, slid: 0x0003
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:1788:udcm_handle_ack] got ack for message 0x973500 from slid 0x0003 qp 0x0001fb48
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:1796:udcm_handle_ack] found matching message
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:2174:udcm_message_callback] running message thread
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:2635:udcm_xrc_recv_qp_connect] Connecting Recv QP
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:2640:udcm_xrc_recv_qp_connect] Failed to register qp_num: 129875, get error: Invalid argument (22) . Replying with RNR
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:2980:udcm_xrc_handle_xconnect] rejecting request for reason -3
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:2305:udcm_set_message_timeout] activating timeout for message 0x983380
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:1670:udcm_new_message] created message 0x983380 with type 102
[jenkins03:02158] [[13077,0],0]:errmgr_default_hnp.c(549) updating exit status to -1
[jenkins03:02158] [[13077,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_EXIT_CMD
[jenkins03:02158] [[13077,0],0] orted_cmd: received exit cmd
[jenkins03:02158] [[13077,0],0] orted_cmd: exit cmd, but proc [[13077,1],0] is alive
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:1975:udcm_process_messages] WC: wr_id: 0x0000000536870915, status: 0, opcode: 0x80, byte_len: 60, imm_data: 0x00000000, qp_num: 0x0001fb47, src_qp: 0x0001fb48, wc_flags: 0x0, slid: 0x0003
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:2020:udcm_process_messages] received message. type: 102, lcl_ep = 0x8af020, rem_ep = 0x83f620, src qpn = 129864, length = 96, local buffer # = 3
[jenkins03][[13077,1],0][connect/btl_openib_connect_udcm.c:1771:udcm_send_ack] sending ack for message 0x983380 on ep 0x8af020
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:1975:udcm_process_messages] WC: wr_id: 0x0000000536870915, status: 0, opcode: 0x80, byte_len: 60, imm_data: 0x00000000, qp_num: 0x0001fb48, src_qp: 0x0001fb47, wc_flags: 0x0, slid: 0x0003
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:1788:udcm_handle_ack] got ack for message 0x983380 from slid 0x0003 qp 0x0001fb47
[jenkins03][[13077,1],1][connect/btl_openib_connect_udcm.c:1796:udcm_handle_ack] found matching message