This version brings bug fixes and updates to our v2.0.0 release.
- [NA OFI]
- Choose addr format dynamically based on user preferences
- Add support for IPv6
- Add support for `FI_SOCKADDR_IB`
- Add support for `FI_ADDR_STR` and shm provider
- Add support for `FI_ADDR_OPX` and opx provider
- Add support for HPE `cxi` provider, init info format for `cxi` is: `NIC:PID` (both or only one may be passed), NIC is `cxi[0-9]`, PID is `[0-510]`
- Use `hwloc` to select interface to use if NIC information is available (only supported by `cxi` at the moment)
- Support device memory types and `FI_HMEM` for `verbs` and `cxi` providers
- Update min required version to libfabric 1.9
- Improve debug output to print verbose FI info of selected provider
- [NA UCX]
- Use active messaging `UCP_FEATURE_AM` for unexpected messages (only), this allows for removal of address resolution and retry on first message to exchange connection IDs
- Turn on mempool by default
- Support device memory types
- Bump min required version to 1.10
- [NA PSM]
- Add mercury NA plugin for the qlogic/intel PSM interface
- Also support PSM2 (Intel OmniPath) through the PSM NA plugin
- [NA]
- Add `na_addr_format` init info
- Add `request_mem_device` init info when GPU support is requested
- Update `NA_Mem_register()` API call to support memory types (e.g., CUDA, ROCm, ZE) and device IDs
- Add `na_loc` module for `hwloc` detection
- Remove `na_uint`, `na_int`, `na_bool_t` and `na_size_t` types
- Use separate versioning for library and update to v3.0.0
- [NA IP]
- Refactor `na_ip_check_interface()` to only use `getaddrinfo()` and `getifaddrs()`
- Add family argument to force detection of IPv4/IPv6 addresses
- Add ip debug log
- [HG util]
- Add `mercury_byteswap.h` for `bswap` macros
- Add `mercury_inet.h` for `htonll` and `ntohll` routines
- Add `mercury_param.h` to use `sys/param.h` or `MIN/MAX` macros etc
- Use separate versioning for library and update to v3.0.0
- [HG bulk]
- Add support for memory attributes through a new `HG_Bulk_create_attr()` routine (support CUDA, ROCm, ZE)
- [HG]
- Remove `MERCURY_ENABLE_STATS` CMake option and use `'diag'` log subsys instead
- Modify behavior of `stats` field to turn on diagnostics
- Refactor existing counters (used only if debug is on)
- Add checksum levels that can be manually controlled at runtime (disabled by default, `HG_CHECKSUM_NONE` level)
- Update to mchecksum v2.0
- Add `HG_Set_log_func()` and `HG_Set_log_stream()` to control log output
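The `cxi` init info format described in the [NA OFI] notes above (`NIC:PID`, where NIC is `cxi[0-9]` and PID is in `[0-510]`) can be illustrated with a small parser. This is only a sketch and not the plugin's actual code: the `cxi_init_info` struct and `parse_cxi_init_info()` function are hypothetical names, and the sketch assumes that a PID-only string is written with a leading colon (`:5`), which the notes do not specify.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the parsed fields; -1 means "not given". */
struct cxi_init_info {
    int nic; /* digit taken from "cxi[0-9]" */
    int pid; /* value in [0, 510] */
};

/* Parse "NIC:PID", "NIC" or ":PID" (assumed PID-only form).
 * Returns 0 on success, -1 if the string does not match the format. */
static int
parse_cxi_init_info(const char *str, struct cxi_init_info *out)
{
    const char *sep;
    size_t nic_len;

    if (str == NULL || out == NULL || *str == '\0')
        return -1;

    out->nic = -1;
    out->pid = -1;

    sep = strchr(str, ':');
    nic_len = (sep != NULL) ? (size_t) (sep - str) : strlen(str);

    if (nic_len != 0) {
        /* NIC must be exactly "cxi" followed by a single digit. */
        if (nic_len != 4 || strncmp(str, "cxi", 3) != 0)
            return -1;
        if (str[3] < '0' || str[3] > '9')
            return -1;
        out->nic = str[3] - '0';
    }

    if (sep != NULL) {
        const char *p = sep + 1;
        char *end;
        long pid;

        /* Require a leading digit: rejects empty PIDs, signs and spaces. */
        if (*p < '0' || *p > '9')
            return -1;
        pid = strtol(p, &end, 10);
        if (*end != '\0' || pid > 510)
            return -1;
        out->pid = (int) pid;
    }

    return 0;
}

/* Small usage example covering the three accepted forms. */
static void
cxi_init_info_example(void)
{
    struct cxi_init_info info;

    assert(parse_cxi_init_info("cxi0:5", &info) == 0);
    assert(info.nic == 0 && info.pid == 5);

    assert(parse_cxi_init_info("cxi1", &info) == 0);
    assert(info.nic == 1 && info.pid == -1);

    assert(parse_cxi_init_info(":510", &info) == 0);
    assert(info.nic == -1 && info.pid == 510);
}
```

Out-of-range values such as `cxi0:511` or an unknown device name such as `eth0` are rejected by this sketch; how the real plugin reports such errors is not covered by these notes.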
- [NA OFI]
- Switch `tcp` provider to `FI_PROGRESS_MANUAL`
- Prevent empty authorization keys from being passed
- Check max MR key used when `FI_MR_PROV_KEY` is not set
- New implementation of address management
- Fix duplicate addresses on multithreaded lookups
- Redefine address keys and raw addresses to prevent allocations
- Use FI addr map to lookup by FI addr
- Improve serialization and deserialization of addresses
- Fix provider table and use EP proto
- Refactor and clean up plugin initialization
- Clean up ip and domain checking
- Ensure interface name is not used as domain name for verbs etc
- Use NA IP module and add missing `NA_OFI_VERIFY_PROV_DOM` for `tcp` provider
- Rework handling of `fi_info` to open fabric/domain/endpoint
- Separate fabric from domain and keep single domain per NA class
- Refactor handling of scalable vs standard endpoints
- Improve handling of retries after `FI_EAGAIN` return code
- Abort retried ops after default 90s timeout
- Abort ops to a target being retried after first `NA_HOSTUNREACH` error in CQ
- [NA UCX]
- Fix potential error not returned correctly on `conn_insert()`
- Fix potential double free of worker_addr
- Remove use of unified mode
- Ensure address key is correctly reset
- Fix hostname / net device parsing to allow for multiple net devices
- [HG util]
- Make sure we round up ms time conversion, so that small timeouts do not result in a busy spin.
- Use sched_yield() instead of deprecated pthread_yield()
- Fix `'none'` log level not recognized
- Fix external logging facility
- Let mercury log print counters on exit when debug outlet is on
- [HG proc]
- Prevent call to `save_ptr()/restore_ptr()` during `HG_FREE`
- [HG Bulk]
- Remove some `NA_CANCELED` event warnings.
- [HG]
- Properly handle error when overflow bulk transfer is interrupted. Previously the RPC callback was triggered regardless, potentially causing issues.
- [CMake]
- Correctly set INSTALL_RPATH for target libraries
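The [HG util] fix above about rounding up the ms time conversion can be illustrated as follows. This is a minimal sketch rather than the library's code: the helper names are hypothetical, and the source unit (microseconds) is an assumption. The point is that a truncating conversion turns any timeout below 1 ms into 0, and a zero timeout passed to a call such as `poll()` returns immediately, so a caller looping on it spins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: truncating conversion, shown for comparison only.
 * Any value below 1000 us becomes 0 ms. */
static uint64_t
usec_to_ms_truncate(uint64_t usec)
{
    return usec / 1000;
}

/* Hypothetical helper: convert microseconds to milliseconds, rounding up.
 * Written as quotient plus "remainder is non-zero" so that it cannot
 * overflow, unlike (usec + 999) / 1000. */
static uint64_t
usec_to_ms_round_up(uint64_t usec)
{
    return usec / 1000 + (usec % 1000 != 0);
}

/* Small usage example. */
static void
ms_round_up_example(void)
{
    /* A 500 us timeout truncates to 0 ms (no wait at all)... */
    assert(usec_to_ms_truncate(500) == 0);
    /* ...but rounds up to 1 ms, so the caller actually blocks. */
    assert(usec_to_ms_round_up(500) == 1);
    /* Exact multiples are unchanged. */
    assert(usec_to_ms_round_up(2000) == 2);
    /* A zero timeout stays zero (non-blocking on purpose). */
    assert(usec_to_ms_round_up(0) == 0);
}
```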
- [NA OFI]
- [tcp/verbs;ofi_rxm] Using more than 256 peers requires `FI_UNIVERSE_SIZE` to be set.
- [NA UCX]
- `NA_Addr_to_string()` cannot be used on non-listening processes to convert a self-address to a string.
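For the [NA OFI] known issue above, `FI_UNIVERSE_SIZE` is a libfabric environment variable, so it is usually exported in the job script. Where that is not convenient, it can also be set from the application itself. The sketch below assumes a POSIX system and that the call happens before the NA/Mercury classes are initialized, so that the provider sees the value; `set_universe_size()` is a hypothetical helper, not part of Mercury.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: set FI_UNIVERSE_SIZE to the expected number of peers.
 * A value already present in the environment is kept (overwrite = 0), so a
 * setting made by the user in the job script takes precedence.
 * Returns 0 on success, -1 on error (as setenv() does). */
static int
set_universe_size(unsigned int expected_peers)
{
    char value[16];

    snprintf(value, sizeof(value), "%u", expected_peers);

    return setenv("FI_UNIVERSE_SIZE", value, 0);
}

/* Small usage example: request room for 1024 peers before initialization. */
static void
universe_size_example(void)
{
    int rc = set_universe_size(1024);

    assert(rc == 0);
    assert(getenv("FI_UNIVERSE_SIZE") != NULL);
}
```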