make bitset use popcount with ISA detection out of loop #2201
Nits - this mostly LGTM.
I really don't like the structure of this code, but, after much headscratching I can't think of a better way.
So, approved
I'm mirroring this to an MSVC-internal PR. Please notify me if any further changes are pushed.
Thanks for investigating this and finding a way to significantly improve
Resolve #667
The dispatch is now moved out of the loop explicitly, so it no longer relies on compiler optimization.
This is a rewrite of PR #2126. The original PR relied on the compiler to hoist the check out of the loop, which did not always happen.
To keep the machinery in `<limits>`, a callback is used. To make sure the functions are always inlined into the callback, they are wrapped in lambdas.

Additionally, as @statementreply suggested, the 32-bit fallback now uses 32-bit types, and 64-bit values are processed in parts.
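The shape of this approach can be illustrated with a minimal sketch. This is not the code from the PR: `hypothetical_has_popcnt`, `select_popcount`, and `count_bits` are names invented for illustration, the flag stands in for the real ISA check, and the "fast" path uses a compiler builtin as a stand-in for a hardware popcount. The point is only that the branch runs once, and each branch hands the loop a distinct lambda type so the loop body can be instantiated with an inlinable callable.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a runtime ISA check (not the real __isa_available).
static bool hypothetical_has_popcnt = false;

// Portable fallback working purely on a 32-bit operand.
static int popcount32_fallback(std::uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return static_cast<int>((x * 0x01010101u) >> 24);
}

// 64-bit values are handled as two 32-bit halves.
static int popcount64_by_parts(std::uint64_t x) {
    return popcount32_fallback(static_cast<std::uint32_t>(x))
         + popcount32_fallback(static_cast<std::uint32_t>(x >> 32));
}

// Stand-in for a hardware popcount; an assumption for this sketch only.
static int popcount64_fast(std::uint64_t x) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll(x);
#else
    return popcount64_by_parts(x);
#endif
}

// The check happens here, once. Each branch passes a different lambda type
// to the callback, so the callback body is instantiated per implementation.
template <class Callback>
auto select_popcount(Callback&& callback) {
    if (hypothetical_has_popcnt) {
        return callback([](std::uint64_t x) { return popcount64_fast(x); });
    } else {
        return callback([](std::uint64_t x) { return popcount64_by_parts(x); });
    }
}

// The loop lives inside the callback, so no per-iteration dispatch remains.
static std::size_t count_bits(const std::uint64_t* words, std::size_t n) {
    return select_popcount([=](auto popcount_fn) {
        std::size_t total = 0;
        for (std::size_t i = 0; i < n; ++i) {
            total += static_cast<std::size_t>(popcount_fn(words[i]));
        }
        return total;
    });
}
```

Because the loop is written inside the generic lambda, hoisting the check does not depend on the optimizer: the source itself only branches before the loop starts.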
Benchmark
Results (fb stands for the fallback implementation, used when `__isa_available` is set to zero):
x64
x86