Fixes Unicode paths on Windows
Unlike most other platforms, Windows' Unicode is standardized around
UTF-16, an encoding not compatible with "char *" arrays common in
C/C++. As such, to support Unicode correctly when using Win32 APIs,
strings must be converted to and from UTF-16 and the Unicode
versions of the APIs must be used over the ANSI versions.
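The conversion this commit adds can be illustrated with a small, dependency-free sketch. It is not the commit's implementation, which calls MultiByteToWideChar/WideCharToMultiByte on Windows and std::wstring_convert with std::codecvt_utf8_utf16 elsewhere. The sketch assumes C++11 char16_t/std::u16string (wchar_t is 16 bits on Windows but usually 32 bits on Linux and macOS) and avoids <codecvt>, which is deprecated since C++17. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes and re-encode them as UTF-16 code units.
// Code points above U+FFFF become surrogate pairs. Throws on malformed input.
std::u16string Utf8ToUtf16Sketch(const std::string & in)
{
    static const char32_t kMinCodePoint[4] = {0x0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes after the lead byte
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong encodings, surrogate code points and values past U+10FFFF.
        if (cp < kMinCodePoint[extra] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::invalid_argument("invalid code point");

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            cp -= 0x10000;  // split the remaining 20 bits into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += extra + 1;
    }
    return out;
}

// Hypothetical helper: the inverse direction, UTF-16 code units back to UTF-8 bytes.
std::string Utf16ToUtf8Sketch(const std::u16string & in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF)  // high surrogate: must be followed by a low one
        {
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
        {
            throw std::invalid_argument("unpaired low surrogate");
        }

        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

A round trip through both helpers should reproduce the original UTF-8 bytes. An emoji such as U+1F600 is the case that exercises the surrogate-pair path.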

This commit introduces the following:

- Utility functions for converting between UTF-8 and UTF-16LE on all
  platforms:
  - Platform::Utf8ToUtf16
  - Platform::Utf16ToUtf8

- Adds tests for these conversion functions to ensure that conversion
  between UTF-8 and UTF-16LE is correct in both directions.

- Utility wrappers for "std::ifstream" that convert UTF-8 filenames
  to UTF-16 on Windows so that filenames containing non-ASCII
  characters open correctly:
  - Platform::CreateInputFileStream
  - Platform::OpenInputFileStream

- Moves the default file hash function (formerly DefaultComputeHash
  in PathUtils.cpp) into the Platform namespace as
  Platform::CreateFileContentHash and switches it to the Win32 UTF-16
  variant (_wstat) on Windows.

- Defines the "UNICODE" macro before including "windows.h" so that
  Win32 API calls resolve to their Unicode (W) variants instead of
  the default ANSI (A) variants.
  - Implicitly changes functions such as GetEnvironmentVariable and
    SetEnvironmentVariable to their Unicode variants.

- Changes the following environment variable related functions to
  their Win32 Unicode variants on Windows:
  - environ -> _wenviron
  - _putenv_s -> _wputenv_s

- Updates tests using SetEnvironmentVariable to use wide string
  literals since that function has been switched to the Unicode
  variant.
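The POSIX branch of the file hash described above (st_dev:st_ino as a proxy for a file's contents) can be sketched on its own as follows. The function name is hypothetical. On POSIX systems a UTF-8 path can be handed straight to stat() because file names are byte strings; on Windows the commit instead converts the name with Utf8ToUtf16, calls _wstat, and hashes the file name because st_ino carries no meaning there.

```cpp
#include <sstream>
#include <string>
#include <sys/stat.h>

// Hypothetical helper mirroring the POSIX branch of the commit's file hash:
// the (device, inode) pair identifies a file, so it stands in for its contents.
// Returns an empty string when stat() fails, e.g. if the path does not exist.
std::string FileIdentityKeySketch(const std::string & utf8Path)
{
    struct stat info;
    // POSIX file names are byte strings, so UTF-8 can be passed through as-is.
    if (::stat(utf8Path.c_str(), &info) != 0)
    {
        return "";
    }
    std::ostringstream key;
    key << info.st_dev << ":" << info.st_ino;
    return key.str();
}
```

Because the key is based on file identity rather than spelling, "." and "./." produce the same key, and so would two hard links to the same file. That is the case the Windows branch's TODO says it cannot handle.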

Signed-off-by: itsmattkc <itsmattkc@gmail.com>
itsmattkc committed Apr 22, 2021
1 parent df1bf14 commit cfbc71f
Showing 10 changed files with 254 additions and 73 deletions.
2 changes: 1 addition & 1 deletion src/OpenColorIO/Config.cpp
@@ -1180,7 +1180,7 @@ ConstConfigRcPtr Config::CreateFromEnv()

ConstConfigRcPtr Config::CreateFromFile(const char * filename)
{
-std::ifstream istream(filename);
+std::ifstream istream = Platform::CreateInputFileStream(filename, std::ios_base::in);
if (istream.fail())
{
std::ostringstream os;
25 changes: 25 additions & 0 deletions src/OpenColorIO/ContextVariableUtils.cpp
@@ -6,6 +6,7 @@

#include "ContextVariableUtils.h"
#include "utils/StringUtils.h"
+#include "Platform.h"


#if defined(__APPLE__) && !defined(__IPHONE__)
@@ -19,6 +20,12 @@ extern char ** environ;
namespace
{

+#ifdef _WIN32
+inline wchar_t ** GetEnviron()
+{
+return _wenviron;
+}
+#else
inline char ** GetEnviron()
{
#if __IPHONE__
@@ -30,6 +37,7 @@ inline char ** GetEnviron()
return environ;
#endif
}
+#endif

} // anon.

@@ -71,11 +79,28 @@ void LoadEnvironment(EnvMap & map, bool update)
{
// First, add or update the context variables with existing env. variables.

+#ifdef _WIN32
+if (GetEnviron() == NULL) {
+// If the program starts with "main" instead of "wmain", then wenviron returns NULL until
+// the first call to either wgetenv or wputenv. Calling wgetenv, even with an empty
+// variable name, will populate wenviron correctly. We also use wgetenv_s (which requires
+// a valid size pointer) to suppress safety warnings about wgetenv during the compile.
+size_t sz;
+_wgetenv_s(&sz, NULL, 0, L"");
+}
+
+for (wchar_t **env = GetEnviron(); *env != NULL; ++env)
+{
+// Split environment up into std::map[name] = value.
+
+const std::string env_str = Platform::Utf16ToUtf8((wchar_t*)*env);
+#else
for (char **env = GetEnviron(); *env != NULL; ++env)
{
// Split environment up into std::map[name] = value.

const std::string env_str = (char*)*env;
+#endif
const int pos = static_cast<int>(env_str.find_first_of('='));

const std::string name = env_str.substr(0, pos);
36 changes: 3 additions & 33 deletions src/OpenColorIO/PathUtils.cpp
@@ -4,12 +4,12 @@

#include <iostream>
#include <map>
-#include <sys/stat.h>

#include <OpenColorIO/OpenColorIO.h>

#include "Mutex.h"
#include "PathUtils.h"
+#include "Platform.h"
#include "pystring/pystring.h"
#include "utils/StringUtils.h"

@@ -26,39 +26,9 @@ namespace OCIO_NAMESPACE
{
namespace
{
-// Here is the explanation of the stat() method:
-// https://pubs.opengroup.org/onlinepubs/009695299/basedefs/sys/stat.h.html
-// "The st_ino and st_dev fields taken together uniquely identify the file within the system."
-//
-// However there are limitations to the stat() support on some Windows file systems:
-// https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/stat-functions?redirectedfrom=MSDN&view=vs-2019
-// "The inode, and therefore st_ino, has no meaning in the FAT, HPFS, or NTFS file systems."
-
-// That's the default hash method implementation to compute a hash key based on a file content.
-std::string DefaultComputeHash(const std::string &filename)
-{
-struct stat fileInfo;
-if (stat(filename.c_str(), &fileInfo) == 0)
-{
-// Treat the st_dev (i.e. device) + st_ino (i.e. inode) as a proxy for the contents.
-
-std::ostringstream fasthash;
-fasthash << fileInfo.st_dev << ":";
-#ifdef _WIN32
-// TODO: The hard-linked files are then not correctly supported on Windows platforms.
-fasthash << std::hash<std::string>{}(filename);
-#else
-fasthash << fileInfo.st_ino;
-#endif
-return fasthash.str();
-}
-
-return "";
-}
-
// The global variable holds the hash function to use.
// It could be changed using SetComputeHashFunction() to customize the implementation.
-ComputeHashFunction g_hashFunction = DefaultComputeHash;
+ComputeHashFunction g_hashFunction = Platform::CreateFileContentHash;

// We mutex both the main map and each item individually, so that
// the potentially slow stat calls dont block other lookups to already
@@ -86,7 +56,7 @@ void SetComputeHashFunction(ComputeHashFunction hashFunction)

void ResetComputeHashFunction()
{
-g_hashFunction = DefaultComputeHash;
+g_hashFunction = Platform::CreateFileContentHash;
}

std::string GetFastFileHash(const std::string & filename)
102 changes: 96 additions & 6 deletions src/OpenColorIO/Platform.cpp
@@ -1,8 +1,11 @@
// SPDX-License-Identifier: BSD-3-Clause
// Copyright Contributors to the OpenColorIO Project.

+#include <codecvt>
+#include <locale>
#include <random>
#include <sstream>
+#include <sys/stat.h>
#include <vector>

#include <OpenColorIO/OpenColorIO.h>
@@ -50,11 +53,21 @@ bool Getenv(const char * name, std::string & value)
}

#ifdef _WIN32
-if(uint32_t size = GetEnvironmentVariable(name, nullptr, 0))
+std::wstring name_u16 = Utf8ToUtf16(name);
+if(uint32_t size = GetEnvironmentVariable(name_u16.c_str(), nullptr, 0))
{
-std::vector<char> buffer(size);
-GetEnvironmentVariable(name, buffer.data(), size);
-value = std::string(buffer.data());
+std::wstring value_u16(size, 0);
+GetEnvironmentVariable(name_u16.c_str(), &value_u16[0], size);
+
+// GetEnvironmentVariable is designed for raw pointer strings and therefore requires that
+// the destination buffer be long enough to place a null terminator at the end of it. Since
+// we're using std::wstrings here, the null terminator is unnecessary (and causes false
+// negatives in unit tests since the extra character makes it "non-equal" to normally
+// defined std::wstrings). Therefore, we pop the last character off (the null terminator)
+// to ensure that the string conforms to expectations.
+value_u16.pop_back();
+
+value = Utf16ToUtf8(value_u16);
return true;
}
else
@@ -81,7 +94,7 @@ void Setenv(const char * name, const std::string & value)
// exists. To avoid the ambiguity, use Unsetenv() when the env. variable removal if needed.

#ifdef _WIN32
-_putenv_s(name, value.c_str());
+_wputenv_s(Utf8ToUtf16(name).c_str(), Utf8ToUtf16(value).c_str());
#else
::setenv(name, value.c_str(), 1);
#endif
@@ -96,7 +109,7 @@ void Unsetenv(const char * name)

#ifdef _WIN32
// Note that the Windows _putenv_s() removes the env. variable if the value is empty.
-_putenv_s(name, "");
+_wputenv_s(Utf8ToUtf16(name).c_str(), L"");
#else
::unsetenv(name);
#endif
@@ -197,6 +210,83 @@ std::string CreateTempFilename(const std::string & filenameExt)
return filename;
}

+std::ifstream CreateInputFileStream(const char * filename, std::ios_base::openmode mode)
+{
+#ifdef _WIN32
+return std::ifstream(Utf8ToUtf16(filename).c_str(), mode);
+#else
+return std::ifstream(filename, mode);
+#endif
+}
+
+void OpenInputFileStream(std::ifstream & stream, const char * filename, std::ios_base::openmode mode)
+{
+#ifdef _WIN32
+stream.open(Utf8ToUtf16(filename).c_str(), mode);
+#else
+stream.open(filename, mode);
+#endif
+}
+
+std::wstring Utf8ToUtf16(std::string str)
+{
+#ifdef _WIN32
+int sz = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), NULL, 0);
+std::wstring wstr(sz, 0);
+MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &wstr[0], sz);
+return wstr;
+#else
+return std::wstring_convert<std::codecvt_utf8_utf16<wchar_t, 0x10ffff, std::codecvt_mode::little_endian>, wchar_t>{}.from_bytes(str);
+#endif
+}
+
+std::string Utf16ToUtf8(std::wstring wstr)
+{
+#ifdef _WIN32
+int sz = WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), NULL, 0, NULL, NULL);
+std::string str(sz, 0);
+WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), &str[0], sz, NULL, NULL);
+return str;
+#else
+return std::wstring_convert<std::codecvt_utf8_utf16<wchar_t, 0x10ffff, std::codecvt_mode::little_endian>, wchar_t>{}.to_bytes(wstr);
+#endif
+}
+
+// Here is the explanation of the stat() method:
+// https://pubs.opengroup.org/onlinepubs/009695299/basedefs/sys/stat.h.html
+// "The st_ino and st_dev fields taken together uniquely identify the file within the system."
+//
+// However there are limitations to the stat() support on some Windows file systems:
+// https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/stat-functions?redirectedfrom=MSDN&view=vs-2019
+// "The inode, and therefore st_ino, has no meaning in the FAT, HPFS, or NTFS file systems."
+
+// That's the default hash method implementation to compute a hash key based on a file content.
+std::string CreateFileContentHash(const std::string &filename)
+{
+#ifdef _WIN32
+struct _stat fileInfo;
+if (_wstat(Platform::Utf8ToUtf16(filename).c_str(), &fileInfo) == 0)
+#else
+struct stat fileInfo;
+if (stat(filename.c_str(), &fileInfo) == 0)
+#endif
+{
+// Treat the st_dev (i.e. device) + st_ino (i.e. inode) as a proxy for the contents.
+
+std::ostringstream fasthash;
+fasthash << fileInfo.st_dev << ":";
+#ifdef _WIN32
+// TODO: The hard-linked files are then not correctly supported on Windows platforms.
+fasthash << std::hash<std::string>{}(filename);
+#else
+fasthash << fileInfo.st_ino;
+#endif
+return fasthash.str();
+}
+
+return "";
+}
+


} // Platform
30 changes: 30 additions & 0 deletions src/OpenColorIO/Platform.h
@@ -15,11 +15,26 @@
#define WIN32_LEAN_AND_MEAN
#endif

+// Many Win32 API functions are split into two versions: ANSI-only and Unicode, marked with an A or
+// W suffix respectively. Calling a documented function will actually resolve to one of these
+// variants depending on whether the calling code has declared that it's Unicode-compatible or not
+// (by defining UNICODE). Example excerpt from Windows header:
+//
+// #ifdef UNICODE
+// #define SetWindowText SetWindowTextW
+// #else
+// #define SetWindowText SetWindowTextA
+// #endif
+//
+// By defining UNICODE, we ensure that any documented function will resolve to its Unicode version.
+#define UNICODE
+
#include <windows.h>

#endif // _WIN32


+#include <fstream>
#include <string>


@@ -73,6 +88,21 @@ void AlignedFree(void * memBlock);
// the file if created.
std::string CreateTempFilename(const std::string & filenameExt);

+// Create an input file stream (std::ifstream) using a UTF-8 filename on any platform.
+std::ifstream CreateInputFileStream(const char * filename, std::ios_base::openmode mode);
+
+// Open an input file stream (std::ifstream) using a UTF-8 filename on any platform.
+void OpenInputFileStream(std::ifstream & stream, const char * filename, std::ios_base::openmode mode);
+
+// Create a unique hash of a file provided as a UTF-8 filename on any platform.
+std::string CreateFileContentHash(const std::string &filename);
+
+// Convert UTF-8 string to UTF-16LE.
+std::wstring Utf8ToUtf16(std::string str);
+
+// Convert UTF-16LE string to UTF-8.
+std::string Utf16ToUtf8(std::wstring str);
+
}

} // namespace OCIO_NAMESPACE
2 changes: 1 addition & 1 deletion src/OpenColorIO/fileformats/FileFormatICC.cpp
@@ -541,7 +541,7 @@ FileFormat * CreateFileFormatICC()

std::string GetProfileDescriptionFromICCProfile(const char * ICCProfileFilepath)
{
-std::ifstream filestream(ICCProfileFilepath, std::ios_base::binary);
+std::ifstream filestream = Platform::CreateInputFileStream(ICCProfileFilepath, std::ios_base::binary);
if (!filestream.good())
{
std::ostringstream os;
6 changes: 4 additions & 2 deletions src/OpenColorIO/transforms/FileTransform.cpp
@@ -546,7 +546,8 @@ void LoadFileUncached(FileFormat * & returnFormat,
try
{
// Open the filePath
-filestream.open(
+Platform::OpenInputFileStream(
+filestream,
filepath.c_str(),
tryFormat->isBinary()
? std::ios_base::binary : std::ios_base::in);
@@ -618,7 +619,8 @@ void LoadFileUncached(FileFormat * & returnFormat,
std::ifstream filestream;
try
{
-filestream.open(filepath.c_str(), altFormat->isBinary()
+Platform::OpenInputFileStream(
+filestream, filepath.c_str(), altFormat->isBinary()
? std::ios_base::binary : std::ios_base::in);
if (!filestream.good())
{
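The call sites above (Config.cpp, FileFormatICC.cpp, FileTransform.cpp) now go through Platform::CreateInputFileStream and Platform::OpenInputFileStream. As a hedged alternative, and assuming a C++17 toolchain, the #ifdef inside those wrappers could be replaced by std::filesystem. std::filesystem::u8path interprets its argument as UTF-8, and std::ifstream accepts a std::filesystem::path, which on Windows stores the name natively as UTF-16. This is not what the commit does. The function name is hypothetical, and u8path is deprecated (though still available) in C++20.

```cpp
#include <filesystem>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical portable variant of Platform::CreateInputFileStream (assumes C++17).
// u8path treats the bytes as UTF-8; on Windows the resulting path holds UTF-16,
// so the stream opens the intended file without a platform-specific branch.
std::ifstream CreateInputFileStreamSketch(const char * utf8Filename,
                                          std::ios_base::openmode mode)
{
    return std::ifstream(std::filesystem::u8path(utf8Filename), mode);
}
```

A caller would use it the same way Config.cpp uses the Platform wrapper, for example `std::ifstream istream = CreateInputFileStreamSketch(filename, std::ios_base::in);`, and then check `istream.fail()` as before.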