-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes Unicode paths on Windows #1363
Fixes Unicode paths on Windows #1363
Conversation
|
4f81806
to
d36f072
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @itsmattkc for this pull request to tackle the challenging file path character encoding difference between Windows and Linux platforms.
Even If I made several comments/suggestions, I think that you are on the right track.
Any changes in the library must now provide a unit test (a new one or updating an existing one) to demonstrate that the change works on all the supported platforms. Even if it could end-up to be a no-op for Linux platform, a unit test must validate it also.
Note: if the method DefaultComputeHash
can be moved in platform
then the methods Utf16ToUtf8()
and Utf8ToUtf16
could then be completely private i.e. no more define in an header file.
433ac99
to
1658bbf
Compare
Thanks @itsmattkc for all the efforts to fix the file path on the Windows platform. I do not understand the rational for the latest commit. Could you explain it? |
Hey, yes I've made a few more changes the PR overall and am still in the process of finalizing it. I implemented all of your suggestions but then thought it might be good to support Unicode environment variables too (such as the As for the latest commit, I assume you're referring to enforcing UTF-16LE over UTF-16BE? That's just because Windows specifically uses UTF-16LE as its Unicode encoding. When I implemented a test of the UTF-8/UTF-16 conversion functions (using pre-existing literals as "control variables"), I found it would fail in some environments because some platforms (notably the CentOS 7 CI set up here) defaulted to UTF-16BE which would lead to the strings failing to compare. Since these functions are specifically intended for Windows anyway, it seemed appropriate to just enforce UTF-16LE in the conversion. |
0da7c40
to
0d480fb
Compare
Okay, sorry for the delay, this PR is now ready to be re-reviewed. I've squashed my commits for neatness and updated the OP to better reflect the changes made. |
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work.
I only made comments to improve the readability of the code.
f91e774
to
cfbc71f
Compare
Unlike most other platforms, Windows' Unicode is standardized around UTF-16, an encoding not compatible with "char *" arrays common in C/C++. As such, to support Unicode correctly when using Win32 APIs, strings must be converted to and from UTF-16 and the Unicode versions of the APIs must be used over the ANSI versions. This commit introduces the following: - Utility funcions for converting between UTF-8 and UTF-16LE on all platforms: - Platform::Utf8ToUtf16 - Platform::Utf16ToUtf8 - Adds test for these conversion functions to ensure the conversion to and from UTF-8 and UTF-16LE is correct. - Utility wrappers for "std::ifstream" that automatically convert to and from UTF-16 so that filenames requiring Unicode encoding function correctly: - Platform::CreateInputFileStream - Platform::OpenInputFileStream - Moves the file default compute hash function to the Platform class (Platform::CreateFileContentHash) and switches to the Win32 UTF-16 variant on Windows. - Adds the "UNICODE" macro before including "Windows.h" which ensures all functions called are the Unicode variants instead of the default ANSI variants. - Implicitly changes functions such as GetEnvironmentVariable and SetEnvironmentVariable to their Unicode variants. - Changes the following environment variable related functions to their Win32 Unicode variants on Windows: - environ -> _wenviron - _putenv_s -> _wputenv_s - Updates tests using SetEnvironmentVariable to use wide string literals since that function has been switched to the Unicode variant. Signed-off-by: itsmattkc <itsmattkc@gmail.com>
cfbc71f
to
8c72711
Compare
Thanks for the review, I believe I have resolved all of your suggestions. Let me know if anything else should be changed. |
Windows projects usually define UNICODE in the build process, usually a Visual Studio project. By defining it in a header like we have here, we may force UNICODE on a Windows project that doesn't want it. As a library, I think we should respond to UNICODE, not set it. If we have something that explicitly needs the Unicode version, just call it directly, i.e. use SetWindowTextW() instead of SetWindowText(). UNICODE might need to be added to the CMake stuff somewhere. For Windows it might be on by default, but there should be a way to switch it off and everything should still work. |
|
I still think defining UNICODE in our own header is a mistake. It should be defined by the build system. It would be like defining |
EDIT: Never mind, I understand what you're saying. So it would make sense to define it in CMake rather than a header? |
I googled on that |
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
@itsmattkc There is a missing
|
Concerning the Windows |
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
Looks like I made a mistake in the CMake files, it should compile now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last questions I think by approving.
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great Job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for these improvements! I made one minor suggestion, but approving now.
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work.
@itsmattkc please, update your pull request so I can merge it. |
* Fixes Unicode paths on Windows Unlike most other platforms, Windows' Unicode is standardized around UTF-16, an encoding not compatible with "char *" arrays common in C/C++. As such, to support Unicode correctly when using Win32 APIs, strings must be converted to and from UTF-16 and the Unicode versions of the APIs must be used over the ANSI versions. This commit introduces the following: - Utility funcions for converting between UTF-8 and UTF-16LE on all platforms: - Platform::Utf8ToUtf16 - Platform::Utf16ToUtf8 - Adds test for these conversion functions to ensure the conversion to and from UTF-8 and UTF-16LE is correct. - Utility wrappers for "std::ifstream" that automatically convert to and from UTF-16 so that filenames requiring Unicode encoding function correctly: - Platform::CreateInputFileStream - Platform::OpenInputFileStream - Moves the file default compute hash function to the Platform class (Platform::CreateFileContentHash) and switches to the Win32 UTF-16 variant on Windows. - Adds the "UNICODE" macro before including "Windows.h" which ensures all functions called are the Unicode variants instead of the default ANSI variants. - Implicitly changes functions such as GetEnvironmentVariable and SetEnvironmentVariable to their Unicode variants. - Changes the following environment variable related functions to their Win32 Unicode variants on Windows: - environ -> _wenviron - _putenv_s -> _wputenv_s - Updates tests using SetEnvironmentVariable to use wide string literals since that function has been switched to the Unicode variant. Signed-off-by: itsmattkc <itsmattkc@gmail.com> * Moved UNICODE and _UNICODE definitions to CMake Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com> * handle empty strings and use const references Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com> * only use wenviron when unicode is enabled Signed-off-by: itsmattkc <itsmattkc@gmail.com> * add ANSI Win32 version of Getenv Signed-off-by: itsmattkc <itsmattkc@gmail.com> * Add CMake option for compiling with Win32 Unicode support Signed-off-by: itsmattkc <itsmattkc@gmail.com> * Throw exception if UTF functions are called on non-Windows platforms Signed-off-by: itsmattkc <itsmattkc@gmail.com> * Throw OCIO Exception Signed-off-by: itsmattkc <itsmattkc@gmail.com> * Don't run UTF-8/16 conversion test on non-Windows Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com> * Fix CMake error Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com> * minor CMake adjustment Signed-off-by: itsmattkc <itsmattkc@gmail.com> * Clarified CMake option for Win32 unicode Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com> * minor improvements Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com> * fix macro in cpu test cmake Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
* Fixes Unicode paths on Windows Unlike most other platforms, Windows' Unicode is standardized around UTF-16, an encoding not compatible with "char *" arrays common in C/C++. As such, to support Unicode correctly when using Win32 APIs, strings must be converted to and from UTF-16 and the Unicode versions of the APIs must be used over the ANSI versions. This commit introduces the following: - Utility funcions for converting between UTF-8 and UTF-16LE on all platforms: - Platform::Utf8ToUtf16 - Platform::Utf16ToUtf8 - Adds test for these conversion functions to ensure the conversion to and from UTF-8 and UTF-16LE is correct. - Utility wrappers for "std::ifstream" that automatically convert to and from UTF-16 so that filenames requiring Unicode encoding function correctly: - Platform::CreateInputFileStream - Platform::OpenInputFileStream - Moves the file default compute hash function to the Platform class (Platform::CreateFileContentHash) and switches to the Win32 UTF-16 variant on Windows. - Adds the "UNICODE" macro before including "Windows.h" which ensures all functions called are the Unicode variants instead of the default ANSI variants. - Implicitly changes functions such as GetEnvironmentVariable and SetEnvironmentVariable to their Unicode variants. - Changes the following environment variable related functions to their Win32 Unicode variants on Windows: - environ -> _wenviron - _putenv_s -> _wputenv_s - Updates tests using SetEnvironmentVariable to use wide string literals since that function has been switched to the Unicode variant. Signed-off-by: itsmattkc <itsmattkc@gmail.com> * Moved UNICODE and _UNICODE definitions to CMake Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com> * handle empty strings and use const references Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com> * only use wenviron when unicode is enabled Signed-off-by: itsmattkc <itsmattkc@gmail.com> * add ANSI Win32 version of Getenv Signed-off-by: itsmattkc <itsmattkc@gmail.com> * Add CMake option for compiling with Win32 Unicode support Signed-off-by: itsmattkc <itsmattkc@gmail.com> * Throw exception if UTF functions are called on non-Windows platforms Signed-off-by: itsmattkc <itsmattkc@gmail.com> * Throw OCIO Exception Signed-off-by: itsmattkc <itsmattkc@gmail.com> * Don't run UTF-8/16 conversion test on non-Windows Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com> * Fix CMake error Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com> * minor CMake adjustment Signed-off-by: itsmattkc <itsmattkc@gmail.com> * Clarified CMake option for Win32 unicode Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com> * minor improvements Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com> * fix macro in cpu test cmake Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com> Co-authored-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
Fixes #406.
Unlike most other platforms, Windows' Unicode is standardized around UTF-16, an encoding not compatible with
char *
arrays common in C/C++. As such, to support Unicode correctly when using Win32 APIs, strings must be converted to and from UTF-16 and the Unicode versions of the APIs must be used over the ANSI versions.This commit introduces the following:
Utility functions for converting between UTF-8 and UTF-16LE on all platforms:
Platform::Utf8ToUtf16
Platform::Utf16ToUtf8
std::codecvt_utf8_utf16
on macOS and Linux butMultiByteToWideChar
andWideCharToMultiByte
on Windows because MSVC gave deprecation warnings recommending the Win32 functions with the former.Test for these conversion functions to ensure the conversion to and from UTF-8 and UTF-16LE is correct.
Utility wrappers for "std::ifstream" that automatically convert to and from UTF-16 on Windows so that filenames requiring Unicode encoding function correctly:
Platform::CreateInputFileStream
Platform::OpenInputFileStream
Moves the file default compute hash function to the Platform class (
Platform::CreateFileContentHash
) and switches to the Win32 UTF-16 variant on Windows.Adds the
UNICODE
macro before including "Windows.h" on Windows which ensures all functions called are the Unicode variants instead of the default ANSI variants.GetEnvironmentVariable
andSetEnvironmentVariable
to their Unicode variants.Changes the following environment variable related functions to their Win32 Unicode variants on Windows:
environ
->_wenviron
_putenv_s
->_wputenv_s
Updates tests using
SetEnvironmentVariable
to use wide string literals since that function has been switched to the Unicode variant.In my testing, this appears to function correctly, loading configs, LUTs, and looks with non-ASCII paths, however I have not tested exhaustively. There may be more file-related functions I haven't found/wrapped.
Happy to take any suggestions or make any changes to the implementation upon request.