Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<xcharconv_ryu.h>: Fix __ryu_shiftright128 for ARM64 #4304

Merged
merged 1 commit into from
Jan 11, 2024

Conversation

StephanTLavavej
Copy link
Member

When I ported @ulfjack's amazing Ryu and Ryu Printf to MSVC <charconv> (making it compatible with MSVC and C++, adding buffer bounds checking, etc.), I extensively validated correctness and tried to improve performance for both x64 and x86, on both MSVC and Clang/LLVM. x64 had convenient, fast intrinsics, while x86 relied on plain old (bitwise) arithmetic. Life was simple.

Over time, our codebase was modified to support ARM, ARM64, and more exotic flavors like ARM64EC, but we've long lacked comprehensive runtime test coverage. Now that's beginning to change, and Andrew Dean from the MSVC compiler back-end team discovered runtime failures, indicated by an _STL_INTERNAL_CHECK in __ryu_shiftright128. If built without internal checks, the code would fail at runtime due to incorrect results, so it wasn't just an over-eager assertion.

I haven't totally tracked down the codebase archaeology, but it appears that as the code was altered to support ARM64, the original simplicity of "we have x64 intrinsics" vs. "we don't" was broken. In particular, _HAS_CHARCONV_INTRINSICS is enabled for both x64 and ARM64, despite ARM64 not having the full set of convenient intrinsics:

#if defined(_M_X64) || defined(_M_ARM64) || defined(_M_ARM64EC) || defined(_M_HYBRID_X86_ARM64)
#define _HAS_CHARCONV_INTRINSICS 1
#else // ^^^ intrinsics available / intrinsics unavailable vvv

Then we have a codepath that's guarded by _HAS_CHARCONV_INTRINSICS and (as the comment says) can pass __dist in the range [0, 52] to __ryu_shiftright128. It's sometimes exactly 0 here:

STL/stl/inc/xcharconv_ryu.h

Lines 390 to 393 in 87580ca

#if _HAS_CHARCONV_INTRINSICS
const uint32_t __dist = static_cast<uint32_t>(__j - 128); // __dist: [0, 52]
const uint64_t __shiftedhigh = __s1high >> __dist;
const uint64_t __shiftedlow = __ryu_shiftright128(__s1low, __s1high, __dist);

For x64 proper, __ryu_shiftright128 is powered by the __shiftright128 intrinsic which can handle anything in [0, 63] for sure:

STL/stl/inc/xcharconv_ryu.h

Lines 195 to 218 in 87580ca

_NODISCARD inline uint64_t __ryu_shiftright128(const uint64_t __lo, const uint64_t __hi, const uint32_t __dist) {
#if defined(_M_X64) && !defined(_M_ARM64EC)
// For the __shiftright128 intrinsic, the shift value is always
// modulo 64.
// In the current implementation of the double-precision version
// of Ryu, the shift value is always < 64.
// (The shift value is in the range [49, 58].)
// Check this here in case a future change requires larger shift
// values. In this case this function needs to be adjusted.
_STL_INTERNAL_CHECK(__dist < 64);
return __shiftright128(__lo, __hi, static_cast<unsigned char>(__dist));
#else // ^^^ defined(_M_X64) && !defined(_M_ARM64EC) / !defined(_M_X64) || defined(_M_ARM64EC) vvv
// We don't need to handle the case __dist >= 64 here (see above).
_STL_INTERNAL_CHECK(__dist < 64);
#if defined(_WIN64) || defined(_M_HYBRID_X86_ARM64)
_STL_INTERNAL_CHECK(__dist > 0);
return (__hi << (64 - __dist)) | (__lo >> __dist);
#else // ^^^ 64-bit or _M_HYBRID_X86_ARM64 / 32-bit vvv
// Avoid a 64-bit shift by taking advantage of the range of shift values.
_STL_INTERNAL_CHECK(__dist >= 32);
return (__hi << (64 - __dist)) | (static_cast<uint32_t>(__lo >> 32) >> (__dist - 32));
#endif // ^^^ 32-bit ^^^
#endif // defined(_M_X64) && !defined(_M_ARM64EC)
}

The comment in the x64 codepath, "(The shift value is in the range [49, 58].)" appears to be bogus given the callsite. (I think the comment may have been correct for Ryu in isolation, but Ryu Printf was added later.) We were just getting away with it because the x64 intrinsic handles the full range.

Then, the ARM64 codepath below has an assumption checked by _STL_INTERNAL_CHECK(__dist > 0), which was being triggered. When __dist == 0, the following code performed a "full shift" (i.e. trying to shift all 64 bits out at once), which famously/notoriously has undefined behavior (and ARM64's physical behavior makes this non-innocuous).

The "32-bit" codepath at the bottom has an even stronger assumption in the name of micro-optimization (__dist >= 32) that I had contributed in ulfjack/ryu#77. I think we were getting away with this one because _HAS_CHARCONV_INTRINSICS prevents small __dist values from being sent to it, but I am no longer confident of this given the exotic flavors we've accumulated (hybrid x86/ARM64 etc.).

My fix is to blast away the complexity we accumulated here. When the x64 intrinsic is unavailable, we fall back to plain code with a special case for zero-shifting. This is unquestionably correct for everything in the range [0, 63]. Note that this may slightly degrade performance for x86, but I simply no longer care about that anymore (it is less relevant every year, if anyone cares about perf they should be using x64, and in absolute terms Ryu and Ryu Printf are still incredibly fast, we're talking mere nanoseconds either way).

With this fix, I've verified that the following test directories (originally reported by Andrew Dean) are now passing at runtime on ARM64 (by temporarily removing "TESTS_BUILD_ONLY": true in CMakePresets.json):

  • tests/std/tests/P0067R5_charconv
  • tests/std/tests/P0645R10_text_formatting_formatting
  • llvm-project/libcxx/test/std/utilities/charconv/charconv.msvc

@StephanTLavavej StephanTLavavej added bug Something isn't working ARM64 Related to the ARM64 architecture labels Jan 10, 2024
@StephanTLavavej StephanTLavavej requested a review from a team as a code owner January 10, 2024 02:51
@CaseyCarter

This comment was marked as off-topic.

This comment was marked as resolved.

@microsoft microsoft deleted a comment from azure-pipelines bot Jan 10, 2024
@StephanTLavavej StephanTLavavej self-assigned this Jan 11, 2024
@StephanTLavavej
Copy link
Member Author

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 3eac329 into microsoft:main Jan 11, 2024
37 checks passed
@StephanTLavavej StephanTLavavej deleted the fix-arm64-charconv branch January 11, 2024 23:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ARM64 Related to the ARM64 architecture bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants