forked from easybuilders/easybuild-easyconfigs
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request easybuilders#19231 from Flamefire/20231114131104_new_pr_NCCL2103
fix possible error/crash in NCCL on x86 due to cpuid
Showing 10 changed files with 93 additions and 14 deletions.
@@ -0,0 +1,43 @@
The second CPUID asm statement overwrites EBX, ECX and EDX without declaring them as outputs or listing them in the "clobbers" list.
The compiler may therefore keep other variables in those registers, and the CPUID instruction silently corrupts them.
Depending on compiler optimization and register allocation, this leads to segfaults or wrong results.

Fix by using the __cpuid macro from GCC's <cpuid.h>, which declares EAX, EBX, ECX and EDX as outputs.
See https://github.com/NVIDIA/nccl/pull/1070

Author: Alexander Grund (TU Dresden)

diff --git a/src/graph/xml.cc b/src/graph/xml.cc
index 316d20f..d0d1272 100644
--- a/src/graph/xml.cc
+++ b/src/graph/xml.cc
@@ -12,6 +12,9 @@
#include "core.h"
#include "nvmlwrap.h"
#include "xml.h"
+#if defined(__x86_64__)
+#include <cpuid.h>
+#endif

/*******************/
/* XML File Parser */
@@ -408,7 +411,8 @@ ncclResult_t ncclTopoGetXmlFromCpu(struct ncclXmlNode* cpuNode, struct ncclXml*
char vendor[12];
} cpuid0;

- asm volatile("cpuid" : "=b" (cpuid0.ebx), "=c" (cpuid0.ecx), "=d" (cpuid0.edx) : "a" (0) : "memory");
+ unsigned unused;
+ __cpuid(0, unused, cpuid0.ebx, cpuid0.ecx, cpuid0.edx);
char vendor[13];
strncpy(vendor, cpuid0.vendor, 12);
vendor[12] = '\0';
@@ -430,7 +434,8 @@ ncclResult_t ncclTopoGetXmlFromCpu(struct ncclXmlNode* cpuNode, struct ncclXml*
};
uint32_t val;
} cpuid1;
- asm volatile("cpuid" : "=a" (cpuid1.val) : "a" (1) : "memory");
+ unsigned unused;
+ __cpuid(1, cpuid1.val, unused, unused, unused);
int familyId = cpuid1.familyId + (cpuid1.extFamilyId << 4);
int modelId = cpuid1.modelId + (cpuid1.extModelId << 4);
NCCLCHECK(xmlSetAttrInt(cpuNode, "familyid", familyId));
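
The two patched call sites can be exercised outside NCCL. The following is a minimal standalone sketch, assuming GCC or Clang on x86-64 (where `<cpuid.h>` is available); the helpers `decode_vendor`, `decode_family_model` and `print_host_cpu` are hypothetical and not part of NCCL. It reads leaves 0 and 1 through `__cpuid`, rebuilds the vendor string in the same EBX, EDX, ECX order as the `cpuid0` union, and combines the family/model fields with the formula shown in the diff.

```c
#include <stdint.h>
#include <stdio.h>

#if defined(__x86_64__)
#include <cpuid.h> /* GCC/Clang header providing the __cpuid macro */
#endif

/* Hypothetical helper: rebuild the 12-character vendor string of CPUID leaf 0.
 * The bytes are laid out in EBX, EDX, ECX order (e.g. "Genu" "ineI" "ntel").
 * Extracting each byte with shifts keeps this independent of host endianness. */
void decode_vendor(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13]) {
  const uint32_t regs[3] = {ebx, edx, ecx};
  for (int r = 0; r < 3; r++)
    for (int i = 0; i < 4; i++)
      out[r * 4 + i] = (char)((regs[r] >> (8 * i)) & 0xFFu);
  out[12] = '\0';
}

/* Hypothetical helper: split EAX of CPUID leaf 1 into its fields and combine
 * them with the formula used in the patched ncclTopoGetXmlFromCpu:
 *   family = familyId + (extFamilyId << 4)
 *   model  = modelId  + (extModelId  << 4) */
void decode_family_model(uint32_t eax, int *family, int *model) {
  int modelId     = (int)((eax >> 4) & 0xFu);   /* bits 7:4   */
  int familyId    = (int)((eax >> 8) & 0xFu);   /* bits 11:8  */
  int extModelId  = (int)((eax >> 16) & 0xFu);  /* bits 19:16 */
  int extFamilyId = (int)((eax >> 20) & 0xFFu); /* bits 27:20 */
  *family = familyId + (extFamilyId << 4);
  *model  = modelId + (extModelId << 4);
}

#if defined(__x86_64__)
/* Query the running CPU. __cpuid declares EAX, EBX, ECX and EDX as outputs,
 * so the compiler knows all four registers are overwritten. */
void print_host_cpu(void) {
  unsigned int eax, ebx, ecx, edx;
  char vendor[13];
  int family, model;

  __cpuid(0, eax, ebx, ecx, edx);
  decode_vendor(ebx, ecx, edx, vendor);

  __cpuid(1, eax, ebx, ecx, edx);
  decode_family_model(eax, &family, &model);

  printf("vendor=%s familyid=%d modelid=%d\n", vendor, family, model);
}
#endif
```

On an x86-64 host, calling `print_host_cpu()` from `main` prints the values the patched code computes. The sketch uses shifts instead of NCCL's bit-field union so that the result does not depend on the compiler's bit-field layout, and the decoding takes plain register values so it can be checked on any host with known inputs.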
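
For comparison, a hand-written `asm` statement is only safe when every register CPUID writes is declared. The sketch below, assuming GCC/Clang extended asm on x86-64, declares EAX, EBX, ECX and EDX as outputs and ties the leaf and sub-leaf inputs to EAX and ECX with matching constraints (`"0"`, `"2"`), which mirrors the `__cpuid_count` macro from `<cpuid.h>`. The names `cpuid_regs` and `cpuid_checked` are hypothetical. Note that the first original statement also passed the leaf as an input-only `"a" (0)` operand although CPUID overwrites EAX; GCC's documentation forbids modifying input-only operands, so replacing that statement as well is warranted.

```c
#include <stdint.h>

/* Hypothetical container for the four registers CPUID writes. */
typedef struct {
  uint32_t eax, ebx, ecx, edx;
} cpuid_regs;

/* Hypothetical wrapper: returns 1 and fills *r on x86-64, 0 elsewhere. */
int cpuid_checked(uint32_t leaf, uint32_t subleaf, cpuid_regs *r) {
#if defined(__x86_64__)
  /* All four registers are outputs, so the compiler will not keep other
   * values in them across the instruction. The inputs are tied to the EAX
   * ("0") and ECX ("2") outputs instead of being input-only operands. */
  __asm__ __volatile__("cpuid"
                       : "=a"(r->eax), "=b"(r->ebx), "=c"(r->ecx), "=d"(r->edx)
                       : "0"(leaf), "2"(subleaf));
  return 1;
#else
  /* CPUID only exists on x86; report failure on other architectures. */
  (void)leaf;
  (void)subleaf;
  r->eax = r->ebx = r->ecx = r->edx = 0;
  return 0;
#endif
}
```

Passing a sub-leaf costs nothing for leaves 0 and 1, which ignore ECX, and keeps the wrapper usable for leaves that do read it. In practice the `<cpuid.h>` macros used by the patch avoid getting these constraints wrong by hand.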