
Releases: Azure/azurehpc-health-checks

AZ Health Checks v0.4.3

24 Oct 20:06
bc06466
  • Add CodeQL security scanning
  • Trigger Guest Health Reporting (GHR) automatically from NHC
  • Status for HCv4, NCv4, NDv2, ND96amsr v4, and ND96isr H100 v5 from the validation pipeline
  • Add conf file support for AMD MI200/MI300 SKUs
  • Add RCCL all reduce tests
  • Log pkeys and kernel version
  • Add NCCL all reduce tests
  • Bug fixes

AZ Health Checks v0.4.2

25 Apr 18:33
1d85b57

What's Changed

  • Docker image now uses MCR
  • HC44 now supported
  • Bug Fixes

AZ Health Checks v0.4.1

18 Apr 15:32
0078044

What's Changed

  • Introduction of Docker NHC
  • Increased visibility into test results
  • Increased logging
  • Bug fixes

AZ Health Checks v0.2.9

14 Mar 21:25
fb7a85d
  • Add more logging and Kusto functionality
  • Bug fixes

AZ Health Checks v0.4.0

26 Feb 17:00
48aba67
Pre-release

This is a pre-release of the Docker version of AzNHC.

Functionality is unchanged. The main differences are:

  • No installation is needed
  • Docker is now a prerequisite
  • The only setup step is pulling the Docker image

The run script and checks behave the same way as in earlier releases.

AZ Health Checks v0.2.8

06 Feb 17:35
7ad4218

What's Changed

  • Add AMD GPU SKU support
  • Add the ability to extend tests with a secondary conf file
  • Bug Fixes

AZ Health Checks v0.2.7

10 Jan 20:51
3d7a6bd

What's Changed

  • NCv3, NCv4, NCv5, NDv2 support
  • Support for smaller HBv3 sizes
  • Add NVIDIA's nvbandwidth tool to measure NVLink and GPU bandwidth
  • Refresh the install script

AZ Health Checks v0.2.6

07 Nov 19:15
3602c5a

What's Changed

  • Bug fixes
  • Refactor the IB write tests
    • NDv5 IB tests no longer communicate between devices; each HCA now loops back to itself so that IB traffic does not leave the node
    • Rename tests for clarity
  • Documentation update

Az Health Checks v0.2.5

19 Sep 21:16
e269c0b

What's Changed

  • Distributed NHC functionality
    • Adds the ability to launch NHC in a distributed fashion
    • Adds the ability to launch NHC with Slurm
  • Changes to support CycleCloud usage
  • Add accelerated networking checks

Full Changelog: v0.2.4...v.0.2.5

Az Health Checks v0.2.4

28 Jul 15:07
fd90e17

What's Changed

  • Bug fixes