Releases: Azure/azurehpc-health-checks
AZ Health Checks v0.4.3
- CodeQL security scan
- Trigger Guest Health Reporting (GHR) automatically from NHC
- Status for HCv4, NCv4, NDv2, ND96amsr v4, and ND96isr H100 v5 from the validation pipeline
- Support AMD MI200/300 SKUs for conf files
- Add RCCL all reduce tests
- Log pkeys and kernel version
- Add NCCL all reduce tests
- Bug fixes
AZ Health Checks v0.4.2
What's Changed
- Docker image now uses MCR
- HC44 now supported
- Bug fixes
AZ Health Checks v0.4.1
What's Changed
- Introduction of Docker NHC
- Increased visibility into test results
- Increased logging
- Bug fixes
AZ Health Checks v0.2.9
- Additional logging and Kusto functionality
- Bug fixes
AZ Health Checks v0.4.0
This is a prerelease of the Docker version of AzNHC.
The functionality remains the same. The major differences are the following:
- No installation needed
- Docker is now a prerequisite
- The only setup step is pulling the Docker image
The run script and checks behave the same way.
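As a rough illustration of the setup described above, the following sketch wraps the single pull step. The image reference `registry.example/aznhc:placeholder`, the `AZNHC_IMAGE` variable, and the helper names `pull_cmd` and `pull_image` are all hypothetical; the real image name should be taken from the project's README.

```shell
# Sketch only. The image reference is a placeholder, not the project's real image name.
AZNHC_IMAGE="${AZNHC_IMAGE:-registry.example/aznhc:placeholder}"

# Print the pull command without running it, so it can be reviewed first.
pull_cmd() {
  printf 'docker pull %s\n' "$1"
}

# Check the one prerequisite (Docker), then pull the image.
pull_image() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found; install Docker first" >&2
    return 1
  fi
  docker pull "$1"
}
```

Calling `pull_cmd "$AZNHC_IMAGE"` previews the command, and `pull_image "$AZNHC_IMAGE"` runs it once the placeholder has been replaced with the actual image reference.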
AZ Health Checks v0.2.8
What's Changed
- Adding AMD GPU SKU support
- Add feature to extend tests by adding a secondary conf file
- Bug fixes
AZ Health Checks v0.2.7
What's Changed
- NCv3, NCv4, NCv5, NDv2 support
- Support for smaller HBv3 sizes
- NvBandwidth tool from NVIDIA added to measure NVLink and GPU bandwidth
- Refreshed install script
AZ Health Checks v0.2.6
What's Changed
- Bug fixes
- Refactoring of IB write tests
- NDv5 SKU IB tests no longer communicate between devices. Each HCA device loops back to itself so that IB traffic does not leave the node
- Renaming tests for clarity
- Documentation update
Az Health Checks v0.2.5
What's Changed
- Distributed NHC functionality
  - Adds ability to launch NHC in a distributed fashion
  - Adds ability to launch NHC with Slurm
- Changes to support CycleCloud usage
- Addition of accelerated networking checks
Full Changelog: v0.2.4...v.0.2.5
Az Health Checks v0.2.4
What's Changed
- Bug fixes