Update normative references to ISO/IEC 10646 #1

Closed
tahonermann opened this issue Apr 23, 2018 · 18 comments

@tahonermann
Member

The C++ standard currently references ISO/IEC 10646-1:1993. That is, um, old.

There is, shall we say, an opportunity to improve this.

@tahonermann added the enhancement (New feature or request) label Apr 23, 2018
@cubbimew

cubbimew commented Apr 25, 2018

That's p0417r1 and it was rejected in Kona 2017 (US 64/CA 9/GB 4 issue proceeding)

@steve-downey
Collaborator

[locale.stdcvt] is now deprecated, so not changing it is probably OK.

@steve-downey
Collaborator

Also, looking into history, it seems that updating the reference got mixed together with deprecating UCS. It's really an independent thing.

@rmartinho
Collaborator

rmartinho commented Apr 25, 2018

It's not really an independent thing, is it? The standard cannot use the term "UCS-2" without a reference to an edition of 10646 from before 2011, because the term has had no normative definition since then.

Updating the edition of 10646 while keeping the UCS-2 references in the standard would just introduce the same problem we're trying to solve (namely, that "UTF-8", "UTF-16", and "UTF-32" are used without a normative definition).

@tahonermann
Member Author

tahonermann commented Apr 25, 2018

> That's p0417r1 and it was rejected in Kona 2017 (US 64/CA 9/GB 4 issue proceeding)

Thanks for that link. I did a little archaeology...

The GB 4, US 64, and CA 9 comments are from P0488:

Relevant meeting notes from the wiki:

I can't find mention of P0417 in any of the straw polls for Issaquah, Kona, or Toronto. It looks like CWG approved P0417R1, but it never got moved at plenary.

@steve-downey
Collaborator

http://wiki.edg.com/bin/view/Wg21kona2017/LWGTuesdayNight has the brief discussion where p0417 was not adopted.

@cubbimew

cubbimew commented Apr 25, 2018

@tahonermann
Member Author

Ah, thanks. I managed to forget that TWiki actually has search capabilities...

Also http://wiki.edg.com/bin/view/Wg21kona2017/LWGCommentStatus

@rmartinho
Collaborator

Damn, no wiki access.

Regarding the comments...

> The present references to UCS2 in the Committee Draft are appropriate in the interests of preventing silent breakage of software written to older versions of C++.

I wonder how much software that will use C++20 relies on UCS-2 in a way that makes it not replaceable with UTF-16. Regardless, the reference to UCS-2 cannot stay, because it's simply not a thing. We could keep the behaviour by rewording those to use UTF-16 but failing when converting non-BMP characters to it.
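A minimal sketch of that last option, for illustration only. The helper name `encode_bmp_only` is hypothetical and does not come from the standard or from P0417; it only shows what "UTF-16, but fail on non-BMP characters" could mean for a single code point.

```cpp
#include <optional>

// Hypothetical helper (an assumption for illustration, not standard wording):
// encode one code point as exactly one UTF-16 code unit, and report failure
// for anything the old UCS-2 wording could not have represented.
constexpr std::optional<char16_t> encode_bmp_only(char32_t cp) {
    // Surrogate code points are not Unicode scalar values, so reject them.
    if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt;
    // Anything above U+FFFF would need a surrogate pair; fail instead.
    if (cp > 0xFFFF) return std::nullopt;
    return static_cast<char16_t>(cp);
}
```

Under this sketch every successful result is also valid UTF-16, so BMP-only behaviour is preserved without the standard having to name UCS-2.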

@tahonermann
Member Author

> Damn, no wiki access.

You will be granted access at Rapperswil 😄

I think we should propose removing http://eel.is/c++draft/depr.locale.stdcvt#req in C++20.

@cubbimew

Debian Code Search finds quite a few users of those facets: codecvt_utf8 codecvt_utf16 codecvt_utf8_utf16 - they will be caught without migration path if there is no replacement.

The way I see it, the issue is that two of those facets are specified to convert N:1, and on a platform that has a 16-bit wchar_t, they have no other option. Making all three N:M, as p0417r1 proposed, would've solved the issue (they are not used with fstream, so they have no need to be N:1 in the first place).
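To make the N:1 versus N:M distinction concrete, here is a small sketch. The helper `to_utf16` is hypothetical (it is not one of the facets), and it assumes its input is a valid Unicode scalar value. A BMP code point yields one 16-bit unit, while a code point above U+FFFF (four bytes in UTF-8) needs two, which a conversion limited to one output unit per character cannot produce.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (an assumption for illustration): encode one Unicode
// scalar value as UTF-16, producing either one or two 16-bit code units.
std::u16string to_utf16(char32_t cp) {
    // Precondition: cp is a scalar value (in range and not a surrogate).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp <= 0xFFFF) {
        // BMP: a single code unit is enough.
        return std::u16string(1, static_cast<char16_t>(cp));
    }
    // Non-BMP: split the 20-bit offset into a high and a low surrogate.
    cp -= 0x10000;
    std::u16string out;
    out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
    out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    return out;
}
```

For example, `to_utf16(0x1F600)` gives the two units 0xD83D 0xDE00, whereas `to_utf16(0x20AC)` gives the single unit 0x20AC.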

@cubbimew

As mentioned at the telecon, p0417r1 could probably be made more agreeable by changing "UCS2" to "implementation-defined", to keep the existing implementation divergence while unblocking the ISO 10646 version bump.

@tahonermann
Member Author

> Debian Code Search finds quite a few users of those facets: codecvt_utf8 codecvt_utf16 codecvt_utf8_utf16 - they will be caught without migration path if there is no replacement.

It looks like almost all of those hits are in the gcc or libstdc++ implementation and testsuite for those facets. I only found two actual uses in the few minutes I spent browsing the hits.

@cubbimew

cubbimew commented Apr 28, 2018

> It looks like almost all of those hits are in the gcc or libstdc++ implementation

grouping by package helps:
the codecvt_utf8 search (which includes codecvt_utf8_utf16 because I didn't write a regex carefully), skipping gcc/libc++/libstdc++, one usage from each package:
https://sources.debian.org/src/mysql-workbench/6.3.10+dfsg-2/library/base/string_utilities.cpp/?hl=282#L282
https://sources.debian.org/src/mapnik/3.0.11+ds-1/benchmark/test_utf_encoding.cpp/?hl=29#L29
https://sources.debian.org/src/libsass/3.4.8-1/src/file.cpp/?hl=33#L33
https://sources.debian.org/src/hunspell/1.6.2-1/src/hunspell2/string_utils.hxx/?hl=37#L37
https://sources.debian.org/src/libopenmpt/0.3.8-1/common/mptString.cpp/?hl=827#L827
https://sources.debian.org/src/llvm-toolchain-5.0/1:5.0.1-4/lldb/source/Host/common/Editline.cpp/?hl=1371#L1371
https://sources.debian.org/src/asio/1:1.10.8-1/include/asio/detail/winrt_utils.hpp/?hl=55#L55
https://sources.debian.org/src/android-framework-23/6.0.1+r72-4/frameworks/base/libs/androidfw/tests/ResTable_test.cpp/?hl=213#L213
https://sources.debian.org/src/extractpdfmark/1.0.2-2/src/utf8.cc/?hl=37#L37
https://sources.debian.org/src/hexchat/2.14.1-2/src/fe-gtk/notifications/notification-winrt.cpp/?hl=45#L45
https://sources.debian.org/src/dcmtk/3.6.2-3/oflog/libsrc/property.cc/?hl=140#L140
https://sources.debian.org/src/vowpal-wabbit/8.5.0.dfsg1-2/vowpalwabbit/vwdll.cpp/?hl=36#L36
https://sources.debian.org/src/aegisub/3.2.2+dfsg-3/build/freetype2/ftsystem.cpp/?hl=42#L42
https://sources.debian.org/src/openjfx/8u161-b12-1/modules/fxpackager/src/main/native/library/common/GenericPlatform.cpp/?hl=142#L142
https://sources.debian.org/src/sleuthkit/4.6.0-1/rejistry++/src/Rejistry.cpp/?hl=244#L244 <- wow, they imbue wcout with codecvt_utf8_utf16
https://sources.debian.org/src/qbs/1.11.0+dfsg-1/src/lib/corelib/tools/iosutils.h/?hl=85#L85
https://sources.debian.org/src/innoextract/1.6-1/src/util/windows.cpp/?hl=48#L48
https://sources.debian.org/src/bibledit/5.0.482-2/filter/string.cpp/?hl=1375#L1375
https://sources.debian.org/src/gjs/1.52.2-1/gjs/jsapi-util.cpp/?hl=857#L857
https://sources.debian.org/src/qtcreator/4.6.0-3/src/shared/qbs/src/lib/corelib/tools/iosutils.h/?hl=85#L85
https://sources.debian.org/src/libreoffice/1:6.0.4~rc1-4/desktop/source/app/sofficemain.cxx/?hl=95#L95
https://sources.debian.org/src/pbdagcon/0.3+20161121+ds-1/src/cpp/third-party/easylogging++.h/?hl=1093#L1093
https://sources.debian.org/src/simgear/1:2018.1.1+dfsg-1/simgear/misc/strutils.cxx/?hl=659#L659
https://sources.debian.org/src/nss/2:3.36.1-1/nss/gtests/softoken_gtest/softoken_gtest.cc/?hl=76#L76

I like how some of them are #ifdef'd to be Windows-only because MSVC had <codecvt> back in 2010

@tahonermann
Member Author

> grouping by package helps:

Indeed it does, thanks! We do, of course, need to provide replacement functionality for these facets at some point, but I don't think we'll be ready to do that in C++20. So the question is, is the existing usage significant enough to require a replacement prior to removal? These results suggest that may be the case. The proposal to remove these facets should probably include this data.

@rmartinho
Collaborator

So we voted P1025R1 into the draft at Rapperswil. Should we close this? I think so, and then perhaps open a new issue for "Smuggle Unicode Standard normative reference".

@tahonermann
Member Author

Let's wait to close this until we see the updates appear in the WP that appears in the post-meeting mailing. We can then ensure the expected updates were correctly applied.

As for adding a normative Unicode standard reference, https://github.com/rmartinho/sg16/blob/master/papers/d1097r0.md already does that, so can we rely on #15 to track doing so? The CWG feedback seems to be "don't add it until we need it", so tracking it as a separate item doesn't seem so useful. I'm not too worried about forgetting to add it (when adding features that will need it).

@tahonermann
Member Author

Closing this. I verified that the wording updates from P1025R1 appear in the current standard draft here:


4 participants