Made fixes to AVX3 detection on macOS #2083

Merged


johnplatts
Contributor

Updated DetectTargets() to fix a bug that can occur on macOS 12.1 or earlier on x86_64 CPUs that have AVX3 support.

On macOS 12.1 or earlier, x86_64 CPUs with AVX3 support are affected by a bug in which ZMM16-ZMM31, the upper 256 bits of ZMM0-ZMM15, and K0-K7 are not always properly preserved across context switches. The bug is described at https://community.intel.com/t5/Software-Tuning-Performance/MacOS-Darwin-kernel-bug-clobbers-AVX-512-opmask-register-state/m-p/1327259, golang/go#49233, and simdutf/simdutf#236.

On macOS, bits 5, 6, and 7 of XCR0 can also be zero on x86_64 CPUs that support AVX3 until the current thread executes its first AVX512 instruction, because macOS only preserves ZMM16-ZMM31, the upper 256 bits of ZMM0-ZMM15, and K0-K7 across context switches on threads that have executed an AVX512 instruction.
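To make the XCR0 check concrete, here is a minimal sketch of the bit test described above. The names `kXcr0Avx512Mask` and `Xcr0EnablesAvx512State` are hypothetical and are not taken from the Highway sources; the sketch only shows the mask arithmetic and takes the XCR0 value as a parameter rather than reading it with XGETBV.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; not Highway's actual code.
// XCR0 state-component bits relevant to AVX-512:
constexpr std::uint64_t kXcr0Opmask = 1ULL << 5;    // K0-K7
constexpr std::uint64_t kXcr0ZmmHi256 = 1ULL << 6;  // upper 256 bits of ZMM0-ZMM15
constexpr std::uint64_t kXcr0Hi16Zmm = 1ULL << 7;   // ZMM16-ZMM31
constexpr std::uint64_t kXcr0Avx512Mask =
    kXcr0Opmask | kXcr0ZmmHi256 | kXcr0Hi16Zmm;

// Returns true only if all three AVX-512 state components are enabled in the
// given XCR0 value. On macOS this can be false on a thread that has not yet
// executed an AVX512 instruction, even though the CPU supports AVX3.
constexpr bool Xcr0EnablesAvx512State(std::uint64_t xcr0) {
  return (xcr0 & kXcr0Avx512Mask) == kXcr0Avx512Mask;
}
```

A check of this kind depends on which thread runs it and on what that thread has executed so far, which is why it can produce false negatives on macOS.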

On macOS, DetectTargets() now checks for AVX3 support with IsMacOs12_2OrLater() && HasCpuFeature("hw.optional.avx512f") instead of checking that bits 5, 6, and 7 of XCR0 are set. This avoids false negative results and inconsistent behavior on x86_64 CPUs that support AVX3, and it disables AVX3 targets on macOS versions earlier than 12.2.
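The decision logic can be sketched as a pure function, shown below. This is an illustration only: `MacOsAllowsAvx3` is a hypothetical name, and passing the macOS product version and the `hw.optional.avx512f` result as parameters is an assumption made so the sketch stays self-contained. The actual change calls IsMacOs12_2OrLater() and HasCpuFeature() directly.

```cpp
// Hypothetical helper for illustration; not part of Highway.
// `major` and `minor` are the macOS product version (for example 12 and 2);
// `has_avx512f` stands in for HasCpuFeature("hw.optional.avx512f").
constexpr bool MacOsAllowsAvx3(int major, int minor, bool has_avx512f) {
  // macOS 12.2 or later is required because earlier versions may not preserve
  // AVX-512 register state across context switches.
  const bool is_12_2_or_later = (major > 12) || (major == 12 && minor >= 2);
  return is_12_2_or_later && has_avx512f;
}
```

Unlike the XCR0 test, neither input depends on what the current thread has executed, so the result is the same on every thread.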

HasCpuFeature is a wrapper around the macOS sysctlbyname API function that returns true if a particular CPU feature is supported and false otherwise.
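A wrapper of this kind might look roughly like the following sketch. `HasCpuFeatureSketch` is a hypothetical name, and reading the value as an `int` as well as returning false on non-Apple platforms are assumptions made for this example; the function merged in this pull request may differ in detail.

```cpp
#include <cstddef>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Hypothetical sketch, not Highway's implementation.
// Returns true if the named sysctl key exists and holds a nonzero value.
bool HasCpuFeatureSketch(const char* feature_name) {
#if defined(__APPLE__)
  int value = 0;
  std::size_t len = sizeof(value);
  // sysctlbyname returns 0 on success; an unknown key yields an error, which
  // is treated here as "feature not supported".
  if (sysctlbyname(feature_name, &value, &len, nullptr, 0) != 0) return false;
  return value != 0;
#else
  // Assumption for this sketch: no sysctl-based query outside of macOS.
  (void)feature_name;
  return false;
#endif
}
```

For example, `HasCpuFeatureSketch("hw.optional.avx512f")` would query the key named in the description above.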

Member

@jan-wassenberg jan-wassenberg left a comment

Thanks @johnplatts for finding and fixing this issue :)

@copybara-service copybara-service bot merged commit 2dfbb40 into google:master Apr 12, 2024
32 of 33 checks passed
@johnplatts johnplatts deleted the hwy_macos_avx3_detect_fix_041124 branch May 1, 2024 11:51