<bit>: consider leaving only static CPU dispatch for lzcnt/bsr and tzcnt/bsr #2133
Actually, not much better. The MSVC optimizer is far from perfect for this case.
@barcharcraz notes that these functions are typically called in perf-critical loops, so micro-optimizations can be important. Additionally, I note that as long as we don't regress MSVC codegen quality, changes could be worthwhile if they improve Clang/LLVM codegen quality.
I tried to benchmark which approach is better and got mixed results.
The code at lines 149 to 176 in 3c2fd04 is branchy, and avoiding it seems worthwhile.
The code at lines 1092 to 1124 in 3c2fd04 is branchless (cmov is used), and avoiding it does not seem really useful.
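For readers who want the two strategies side by side, here is a minimal portable sketch. It is not the STL's code: every name in it (`countl_zero_fast`, `countl_zero_fallback`, `countl_zero_dynamic`, `countl_zero_static`, and the macro `SKETCH_ASSUME_LZCNT`) is hypothetical, and plain C++ loops stand in for the `lzcnt` and `bsr` instructions so that it compiles anywhere.

```cpp
#include <cstdint>

// Stand-in for the lzcnt path (hypothetical name). Like lzcnt, it is
// well defined for zero input and returns the operand width (32).
inline int countl_zero_fast(std::uint32_t x) {
    int n = 32;
    for (int shift = 16; shift > 0; shift >>= 1) {
        std::uint32_t y = x >> shift;
        if (y != 0) {  // the highest set bit is in the upper half
            n -= shift;
            x = y;
        }
    }
    return n - static_cast<int>(x);  // x is 0 or 1 at this point
}

// Stand-in for the bsr path (hypothetical name). bsr leaves its result
// undefined for zero input, so zero needs its own check.
inline int countl_zero_fallback(std::uint32_t x) {
    if (x == 0) {
        return 32;
    }
    int highest = 31;
    while ((x >> highest) == 0) {
        --highest;
    }
    return 31 - highest;
}

// Dynamic dispatch: one extra branch per call on a feature flag.
// Here the flag is a parameter; a real library would cache a CPUID check.
inline int countl_zero_dynamic(std::uint32_t x, bool has_lzcnt) {
    return has_lzcnt ? countl_zero_fast(x) : countl_zero_fallback(x);
}

// Static-only dispatch: the path is fixed at compile time, no runtime check.
// SKETCH_ASSUME_LZCNT is an assumed macro used only for illustration.
inline int countl_zero_static(std::uint32_t x) {
#ifdef SKETCH_ASSUME_LZCNT
    return countl_zero_fast(x);
#else
    return countl_zero_fallback(x);
#endif
}
```

Both paths return the same values; the question in this issue is only whether the per-call feature check in the dynamic variant costs more than it saves.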
We'd like to see the specific perf data before making a decision here.
@StephanTLavavej, I figured it out! Overall, the advantage of lzcnt/tzcnt is worth a branch with a fallback, so the implementation is optimal as it is. All the anomalies I see are explained by the Intel JCC erratum. Compiling with I've created DevCom-1603517 to make Otherwise, the data suggests that we should do dynamic dispatch; the occurrences where static-only dispatch wins are purely random. I'm closing this.
Great work, @AlexGuteniev - thanks for digging into this.
Inspired by #2097.
Try this code:
STL/stl/inc/format
Lines 2496 to 2504 in 737ce4a
On Godbolt's compiler explorer: https://godbolt.org/z/z99vjeEhh