Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<bitset>: count() should use the same approach as std::popcount #667

Closed
AlexGuteniev opened this issue Apr 1, 2020 · 7 comments · Fixed by #2201
Closed

<bitset>: count() should use the same approach as std::popcount #667

AlexGuteniev opened this issue Apr 1, 2020 · 7 comments · Fixed by #2201
Labels
fixed Something works now, yay! performance Must go faster

Comments

@AlexGuteniev
Copy link
Contributor

AlexGuteniev commented Apr 1, 2020

bitset<N>::count()

STL/stl/inc/bitset

Lines 341 to 366 in 8e8453c

_NODISCARD size_t count() const noexcept { // count number of set bits
const char* const _Bitsperbyte = "\0\1\1\2\1\2\2\3\1\2\2\3\2\3\3\4"
"\1\2\2\3\2\3\3\4\2\3\3\4\3\4\4\5"
"\1\2\2\3\2\3\3\4\2\3\3\4\3\4\4\5"
"\2\3\3\4\3\4\4\5\3\4\4\5\4\5\5\6"
"\1\2\2\3\2\3\3\4\2\3\3\4\3\4\4\5"
"\2\3\3\4\3\4\4\5\3\4\4\5\4\5\5\6"
"\2\3\3\4\3\4\4\5\3\4\4\5\4\5\5\6"
"\3\4\4\5\4\5\5\6\4\5\5\6\5\6\6\7"
"\1\2\2\3\2\3\3\4\2\3\3\4\3\4\4\5"
"\2\3\3\4\3\4\4\5\3\4\4\5\4\5\5\6"
"\2\3\3\4\3\4\4\5\3\4\4\5\4\5\5\6"
"\3\4\4\5\4\5\5\6\4\5\5\6\5\6\6\7"
"\2\3\3\4\3\4\4\5\3\4\4\5\4\5\5\6"
"\3\4\4\5\4\5\5\6\4\5\5\6\5\6\6\7"
"\3\4\4\5\4\5\5\6\4\5\5\6\5\6\6\7"
"\4\5\5\6\5\6\6\7\5\6\6\7\6\7\7\x8";
const unsigned char* _Ptr = &reinterpret_cast<const unsigned char&>(_Array);
const unsigned char* const _End = _Ptr + sizeof(_Array);
size_t _Val = 0;
for (; _Ptr != _End; ++_Ptr) {
_Val += _Bitsperbyte[*_Ptr];
}
return _Val;
}

should use the same compiler magic: as <bit> popcount(x) when it is available:

STL/stl/inc/bit

Lines 143 to 150 in 8e8453c

template <class _Ty, enable_if_t<_Is_standard_unsigned_integer<_Ty>, int> _Enabled = 0>
_NODISCARD constexpr int popcount(const _Ty _Val) noexcept {
if constexpr (sizeof(_Ty) <= sizeof(unsigned int)) {
return __builtin_popcount(_Val);
} else {
return __builtin_popcountll(_Val);
}
}

Blocked by the same as #313 .

Disabled here:

STL/stl/inc/yvals_core.h

Lines 1091 to 1096 in 8e8453c

#if defined(__clang__) || defined(__EDG__)
#define __cpp_lib_bitops 201907L
#else // ^^^ Clang and EDG / MSVC vvv
// a future MSVC update will embed CPU feature detection into <bit> intrinsics
// TRANSITION, VSO-1020212
#endif // defined(__clang__) || defined(__EDG__)

Also tracked by DevCom-967643 and Microsoft-internal VSO-1090007 / AB#1090007.

@miscco
Copy link
Contributor

miscco commented Apr 1, 2020

You can directly reference the relevant code here in the issue. That helps a lot as one does not have to search for the code first

@AlexGuteniev
Copy link
Contributor Author

updated

@AlexGuteniev
Copy link
Contributor Author

AlexGuteniev commented Apr 1, 2020

DevCom-967643 - independently reported.
Well, I can imagine that compiler can optimize std::bitset<N>::count() without builtin, by just figuring out what this loop does :-) (After all, swapping bytes of a dword by shifts gets replaced by bswap)

@CaseyCarter CaseyCarter added the performance Must go faster label Apr 1, 2020
@CaseyCarter CaseyCarter changed the title <bitset>: count() performance - should use the same approach as <bit> popcount <bitset>: count() should use the same approach as std::popcount Apr 1, 2020
@barcharcraz
Copy link
Member

This is dependent on a rework of std::popcount :D.

Now that we have is_constant_evaluated I'm working on using that instead of the builtins (which are hard to implement correctly in MSVC). Once that happens then maybe bitset can use it. Although there could be problems since the constexpr versions of std::popcount will only work for uint32 and uint64 (we could probably figure out how to make them work for other numbers of bits, but it would be gnarly meta-programming)

@AlexGuteniev
Copy link
Contributor Author

which are hard to implement correctly in MSVC

I think builtin would be better here. I imagine that by seeing __builtin_popcount in a loop could emit vectored popcount for /arch:AVX512.
And if runtime CPU detection is problematic, then can __builtin_popcount depend on /arch option?

@StephanTLavavej
Copy link
Member

Although there could be problems since the constexpr versions of std::popcount will only work for uint32 and uint64 (we could probably figure out how to make them work for other numbers of bits, but it would be gnarly meta-programming)

Fortunately, uint32_t and uint64_t are all you need:

using _Ty = conditional_t<_Bits <= sizeof(unsigned long) * CHAR_BIT, unsigned long, unsigned long long>;

@barcharcraz
Copy link
Member

which are hard to implement correctly in MSVC

I think builtin would be better here. I imagine that by seeing __builtin_popcount in a loop could emit vectored popcount for /arch:AVX512.
And if runtime CPU detection is problematic, then can __builtin_popcount depend on /arch option?

I suppose one could implement vectored instructions, but the folks working on the compiler have better things to do (optimizations that would help more people).

__builtin_popcount and friends have behavior that is not implemented by any other intrinsic my msvc_bit_wip branch has my progress and I hope to make that into a pull request this week.

AlexGuteniev added a commit to AlexGuteniev/STL that referenced this issue Aug 15, 2021
Similar to `all()` method
AlexGuteniev added a commit to AlexGuteniev/STL that referenced this issue Sep 12, 2021
@StephanTLavavej StephanTLavavej added the fixed Something works now, yay! label Sep 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fixed Something works now, yay! performance Must go faster
Projects
None yet
5 participants