Made fixes to AVX3 detection on macOS #2083
Merged
Updated DetectTargets() to fix a bug that can occur on macOS 12.1 or earlier on x86_64 CPUs that have AVX3 support.
On macOS 12.1 or earlier, x86_64 CPUs with AVX3 support are affected by a bug where ZMM16-ZMM31, the upper 256 bits of ZMM0-ZMM15, and K0-K7 are not always properly preserved across context switches. This bug is described at https://community.intel.com/t5/Software-Tuning-Performance/MacOS-Darwin-kernel-bug-clobbers-AVX-512-opmask-register-state/m-p/1327259, golang/go#49233, and simdutf/simdutf#236.
On macOS, bits 5, 6, and 7 of XCR0 can also be zero on x86_64 CPUs that support AVX3 until the current thread executes its first AVX512 instruction. This is because macOS only preserves ZMM16-ZMM31, the upper 256 bits of ZMM0-ZMM15, and K0-K7 across context switches on threads that have executed an AVX512 instruction.
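To illustrate the XCR0 condition discussed above, here is a minimal sketch of the mask test on bits 5, 6, and 7. The names `kXcr0Avx512Mask` and `Xcr0HasAvx512State` are hypothetical and are not taken from the Highway sources. The function takes an XCR0 value as a parameter rather than reading the register itself.

```cpp
#include <cstdint>

// Hypothetical names for illustration only, not Highway's actual code.
// XCR0 state-component bits relevant to AVX-512:
//   bit 5: opmask registers K0-K7
//   bit 6: upper 256 bits of ZMM0-ZMM15
//   bit 7: ZMM16-ZMM31
constexpr std::uint64_t kXcr0Avx512Mask =
    (1ULL << 5) | (1ULL << 6) | (1ULL << 7);

// Returns true only if all three AVX-512 state bits are set in `xcr0`.
constexpr bool Xcr0HasAvx512State(std::uint64_t xcr0) {
  return (xcr0 & kXcr0Avx512Mask) == kXcr0Avx512Mask;
}
```

A thread on macOS that has not yet executed an AVX512 instruction may observe these bits as zero, so a test of this form can return false even though the CPU supports AVX3.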
DetectTargets() has been updated to check for AVX3 support on macOS by doing
IsMacOs12_2OrLater() && HasCpuFeature("hw.optional.avx512f")
instead of checking that bits 5, 6, and 7 of XCR0 are set. This avoids false negative results and inconsistent behavior on x86_64 CPUs that support AVX3 on macOS, and it disables AVX3 targets on macOS versions earlier than 12.2.
HasCpuFeature is a wrapper around the macOS sysctlbyname API function that returns true if a particular CPU feature is supported and false otherwise.
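The check described above could look roughly like the following sketch. It is not the code from this pull request. The helper `IsDarwinReleaseAtLeast21_3` is hypothetical, and the sketch assumes that macOS 12.2 corresponds to Darwin kernel release 21.3 and that the version is obtained via `uname`. The actual implementation may determine the OS version differently.

```cpp
#include <cstddef>
#include <cstdio>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#include <sys/utsname.h>
#endif

// Hypothetical helper: parses a Darwin kernel release string such as "21.3.0".
// Assumption: Darwin 21.3 is the kernel release shipped with macOS 12.2.
bool IsDarwinReleaseAtLeast21_3(const char* release) {
  int major = 0;
  int minor = 0;
  if (std::sscanf(release, "%d.%d", &major, &minor) != 2) return false;
  return major > 21 || (major == 21 && minor >= 3);
}

#if defined(__APPLE__)
// Sketch of the version check; returns false if uname() fails.
bool IsMacOs12_2OrLater() {
  struct utsname buf;
  if (uname(&buf) != 0) return false;
  return IsDarwinReleaseAtLeast21_3(buf.release);
}

// Sketch of a sysctlbyname wrapper; treats a failed query as "not supported".
bool HasCpuFeature(const char* feature_name) {
  int value = 0;
  size_t size = sizeof(value);
  if (sysctlbyname(feature_name, &value, &size, nullptr, 0) != 0) return false;
  return value != 0;
}

// Combined check, as described in the text.
bool MacOsSupportsAvx3() {
  return IsMacOs12_2OrLater() && HasCpuFeature("hw.optional.avx512f");
}
#endif
```

Keeping the version comparison in a separate function that takes a string makes it testable on any platform. Only the `uname` and `sysctlbyname` calls need to be guarded by `__APPLE__`.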