<format>: Assume UTF-8 format strings when execution charset is UTF-8 #1824
    (void) format("{:\x9f\x8f\x88<10}"sv, 42); // Bad fill character encoding: missing lead byte before \x9f
    assert(false);
} catch (const format_error&) {
}
We could check the error message here.
Eh, the value of checking the error messages is very low.
stl/inc/format
#pragma warning(pop)
}();

_NODISCARD inline int _Utf8_code_units_in_next_character(const char* const _First, const char* const _Last) noexcept {
From what I see, nothing in this function prevents it from being constexpr.
However, I believe it only makes sense at runtime, so should we add a comment that this is intentionally not constexpr?
Made it constexpr
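
For readers following this thread, here is a minimal sketch of what a constexpr "code units in the next UTF-8 character" helper could look like. It is not the code from this PR: the name `utf8_code_units_in_next_character`, the `-1` error convention, and the exact validation rules are assumptions made for illustration. It reads the lead byte, derives the expected sequence length, and checks the trail bytes against the well-formed ranges in the Unicode Standard.

```cpp
#include <cassert>

// Hypothetical helper (NOT the STL's implementation): returns the number of
// code units (1-4) in the UTF-8 character starting at first, or -1 when
// [first, last) does not begin with a well-formed sequence.
constexpr int utf8_code_units_in_next_character(const char* first, const char* last) noexcept {
    if (first == last) {
        return -1; // nothing to decode
    }

    const auto lead = static_cast<unsigned char>(*first);
    if (lead < 0x80) {
        return 1; // ASCII
    }

    int count = 0;
    unsigned char lo = 0x80; // allowed range for the second byte
    unsigned char hi = 0xBF;
    if (lead >= 0xC2 && lead <= 0xDF) {
        count = 2;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
        count = 3;
        if (lead == 0xE0) lo = 0xA0; // reject overlong encodings
        if (lead == 0xED) hi = 0x9F; // reject UTF-16 surrogates
    } else if (lead >= 0xF0 && lead <= 0xF4) {
        count = 4;
        if (lead == 0xF0) lo = 0x90; // reject overlong encodings
        if (lead == 0xF4) hi = 0x8F; // reject values above U+10FFFF
    } else {
        return -1; // stray trail byte (0x80-0xBF) or invalid lead byte
    }

    if (last - first < count) {
        return -1; // sequence is truncated
    }

    const auto second = static_cast<unsigned char>(first[1]);
    if (second < lo || second > hi) {
        return -1;
    }
    for (int i = 2; i < count; ++i) {
        const auto trail = static_cast<unsigned char>(first[i]);
        if (trail < 0x80 || trail > 0xBF) {
            return -1;
        }
    }
    return count;
}

// Because the function is constexpr, it can be exercised at compile time.
constexpr char football[] = "\xF0\x9F\x8F\x88"; // U+1F3C8
static_assert(utf8_code_units_in_next_character(football, football + 4) == 4, "complete 4-byte sequence");
static_assert(utf8_code_units_in_next_character(football + 1, football + 4) == -1, "missing lead byte");
```

The second `static_assert` mirrors the test case quoted earlier in this conversation: dropping the lead byte leaves `\x9f\x8f\x88`, which starts with a trail byte and is therefore rejected.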
From my inexperienced point of view this looks good. I would like to clean up _Code_units_in_next_character
so that it only defers to subfunctions, but this is purely a style thing which can (and should) be disregarded.
Looks good - I'll push changes for my minor comments here, and a couple more to mitigate merge conflicts with the commits I recently pushed to merging_format.
When execution charset is UTF-8, assume that format strings are encoded in UTF-8, not in the active code page.
This PR only attempts to detect UTF-8.
Fixes #1820.
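
One common way to detect a UTF-8 execution charset at compile time is to inspect how the compiler encodes a non-ASCII character in a narrow string literal. The sketch below illustrates that idea only; the function name `is_execution_charset_utf8` is hypothetical, and the actual check in this PR may differ.

```cpp
#include <cassert>

// Hypothetical sketch: U+00B5 (MICRO SIGN) is encoded as the two bytes
// 0xC2 0xB5 in UTF-8, so a narrow literal holding it has size 3 (two code
// units plus the null terminator) when the execution charset is UTF-8.
// Most single-byte code pages encode it as one byte instead.
constexpr bool is_execution_charset_utf8() noexcept {
    constexpr char micro[] = "\u00B5";
    return sizeof(micro) == 3 && micro[0] == '\xC2' && micro[1] == '\xB5';
}
```

With MSVC the result depends on whether `/utf-8` (or `/execution-charset:utf-8`) is passed; GCC and Clang use UTF-8 as the execution charset unless configured otherwise.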