-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix error loading LUT files with Turkish locale #984
Fix error loading LUT files with Turkish locale #984
Conversation
In file parsing there are tests like Lower(x) == "spilut". However this was using the locale dependent std::tolower, which for Turkish does not change I to i. Instead this implements a custom Lower() function which is not affected by locale and so can be used for file parsing. Signed-off-by: Brecht Van Lommel <brecht@blender.org>
|
Thanks @brechtvl for contributing to the project.
Because of its locale support the method Not being an expert I would suggest to first check for the locale and only customize the methods if the Turkish locale is enabled. But I still do not see how to correctly support the two use cases in OCIO. That's an interesting subject that OCIO does not address so any locales (letters, numerics, etc) are badly supported. It would be interesting to have your feedback on the topic. |
The existing implementation can not correctly lowercase For parsing file headers/tokens I think using byte strings is the correct solution. Even if you did support Unicode (e.g. for colorspace names) you don't want the interpretation of an OCIO config file or LUT to ever depend on the locale. User interface display of text might be affected by locale, but the color space transforms should not be. I did the initial implementation of Unicode string support in OSL, OIIO and Arnold. In my experience it's best to use all UTF-8 strings in the API and internally since that's what Linux and macOS will already give you. Then for Windows you need to add code for every Windows API function that deals with filepaths, program arguments to convert between the native encoding and UTF-8. I think it would be good to improve OpenColorIO to do the same. But in my opinion that's actually solving a different problem and this change by itself is safe. |
Thanks for the information. Let me read a little bit... |
OCIO supports UTF-8 encoding. Some existing unit tests validate the point:
|
Yes, I think OCIO supports UTF-8 encoding to the extent that is automatically provided by UTF-8 backwards compatibility with ASCII. Which is enough when using ASCII keywords and delimiters and passing along or comparing any other strings unmodified. But for operations like lowercasing |
Hey Brecht, thanks for your work on this. There are a number of locale related issues logged here on GitHub, so it's great to have someone with your expertise participating. The main concern I have with your proposal is that Lower is used in a lot of places and your change may have unanticipated consequences for OCIO behavior. For example it is used by parseColorSpaceFromString, so it will affect how ColorSpace names are located within file paths. It will also impact the new FileRules. It is also used in various Role and ColorSpace name comparisons. For situations where we are comparing to known tokens in OCIO (such as supported file extensions), these are all ASCII so your change will work. For cases where we are working with ColorSpace names, I think the behavior needs to be validated a bit wider within the community that may not be actively tracking these issues. I would summarize your proposal as follows: when comparing strings such as ColorSpace names and paths, comparisons on ASCII characters will be non-case-sensitive but other characters will be case-sensitive. Is the community OK with that? My own take is that I agree with you that we do not want locale differences to change the way a config works, so I'm happy with your fix. However, I'm also not in the practice of using non-ASCII characters, hence I think the proposal needs to be validated. I will post on Slack to try to get some more people to take a look. If others could weigh in here, I'm interested more broadly in opinions on whether we even want to offer the courtesy of non-case-sensitive behavior. It seems like the potential bugs it introduces may outweigh the convenience of not requiring people to be precise with case. (And I'm referring here specifically to things like ColorSpace names rather than things like file extensions which are already ASCII and which will remain non-case-sensitive.) There are a number of v2 features that currently use Lower() that we may want to change based on the community opinion. |
It's not so much that I'm proposing this, it's what OpenColorIO is already doing. What I'm proposing is a conservative fix for the Turkish locale case, that as far as I can tell changes nothing for C or English locales. Non-ASCII characters are already and would continue to be unchanged by Removing case-insensitive behavior from file parsing in OpenColorIO seems like a good solution too. I guess that would be done selectively in some places and |
Yes, agreed your fix is no change if the locale is already English but the use-case I had in mind was people in non-English countries that would have a non-English locale and might be using non-ASCII characters in their configs. It might be a change in behavior for them. Again, I'm not opposed to your fix, I just wanted to encourage others to take a look at it since it is potentially a change in behavior beyond the immediate issue of file extensions. |
@brechtvl I tried hard to reproduce the problem on my macos machine by changing the locale (to Note: To test I've added On the other hand, looking at the file extension detection code in the library I wanted to narrow the code down to the file extension check only (to avoid changing a code impacting too many features) and I found that only one method could be updated (i.e. Below is the code change:
With a way to reproduce the problem, I should be able to test the change or you can test it in your environment. |
We discussed this PR during our TSC meeting today. (Also, I had previously posted on Slack to try and get others to take a look at the PR.) There was support in the TSC for the notion that we want to reduce the possibilities for locale differences to cause unexpected changes of behavior in OCIO, even if that means effectively imposing an English locale. Based on that, I'm approving the PR. @brechtvl , if you get any more spare time, there is also a localization issue around number parsing (e.g. #297, and #379) that could use some help! ;) |
In file parsing there are tests like
Lower(x) == "spilut"
. However this wasusing the locale dependent
std::tolower
, which for Turkish does not changeI to i.
Instead this implements a custom
Lower()
function which is not affected bylocale and so can be used for file parsing.
For more background on this problem see for example:
https://docs.microsoft.com/en-us/previous-versions/dotnet/articles/ms973919(v=msdn.10)?redirectedfrom=MSDN#the-motivation-the-turkish-i-problem
Signed-off-by: Brecht Van Lommel brecht@blender.org