Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Numeric read/write locale independent #1322

Closed
hodoulp opened this issue Feb 22, 2021 · 16 comments · Fixed by #1496
Closed

Make Numeric read/write locale independent #1322

hodoulp opened this issue Feb 22, 2021 · 16 comments · Fixed by #1496

Comments

@hodoulp
Copy link
Member

hodoulp commented Feb 22, 2021

For the last few years, the locale introduced some challenges to any OpenColorIO integration as the library does not recommend or enforce any locale requirement(s). So, the float number read could fail if the floats are saved like 10,0 instead of 10.0 (i.e. French versus English notations). For example using a French localized DCC to save a LUT file and/or a config file, and later use an English localized DCC to read one of them is the use case to correctly handle.

Following the discussion at a TSC meeting, the proposal is then to at least enforce an English float notation independently from the host locale, to read/write all numerics.

Over the years, several pull requests were created around that problem: #1089, #297, etc.

@enkore
Copy link

enkore commented Feb 22, 2021

A slightly "heretic" idea, but as I understand things SPI1D and SPI3D are Sony inventions which more or less are only used by OCIO, correct? Developing or using a binary format would solve both performance (loading a bunch of LUTs is still kinda slow, even with the sscanf code; initializing OCIO takes a second or so in my app, even though it's effectively only handling a few MB) and interoperability issues, as well as approximately halving file size. This of course doesn't solve the problem for existing deployments and files.

I briefly looked into using one of the "faster-than-libc" float parsing libraries that are out there (and generally completely ignore locale for performance) for #1089 but in part due to the CLA stuff concluded using such a thing would be infeasible in OCIO. That's pretty much were I got off the train.

@Dithermaster
Copy link

Dithermaster commented Feb 22, 2021

We had also seen some slow startup times (yes, we should load on a background thread, etc.) and I solved it two ways:

  1. First way was to replace the decimal string parsing in the .spi3d reader with something faster, but then later and now instead
  2. I extended the .spi1d and .spi3d to read a hexadecimal format that represents in-memory float format, so no conversion is necessary. This helped a lot. I hadn't designed the format to be something we present back, but would be happy to do so (it's really simple, here is a sample:)
From 0.0 1.0
Length 4096
Components 1
{
a7218fbe6d138fbe33058fbef9f68ebebfe88ebe85da8ebe4bcc8ebe11be8ebed7af8ebe9da18ebe63938ebe29858ebeef768ebeb4688ebe7a5a8ebe404c8ebe
063e8ebecc2f8ebe92218ebe58138ebe1e058ebee4f68dbeaae88dbe70da8dbe36cc8dbefcbd8dbec1af8dbe87a18dbe4d938dbe13858dbed9768dbe9f688dbe
655a8dbe2b4c8dbef13d8dbeb72f8dbe7d218dbe43138dbe09058dbecef68cbe94e88cbe5ada8cbe20cc8cbee6bd8cbeacaf8cbe72a18cbe38938cbefe848cbe
...
fd0c164384061743aa01184371fe1843ddfc1943f0fc1a43adfe1b4316021d432f071e43fa0d1f437a162043b1202143a32c2243533a2343c4492443f75a2543
f16d2643b582274344992843a3b12943d3cb2a43d9e72b43b8052d4372252e430a472f43836a3043e28f314328b7324359e03343790b3543893836438f673743
8c98384384cb39437b003b4374373c4372703d4378ab3e438ae83f43ab274143df68424329ac43438df144430d394643af82474374ce4843611c4a43796c4b43
}

@Dithermaster
Copy link

FWIW, I had investigated using hex float representation (e.g., "0x1.921FB4D12D84Ap+1"), but oddly they parsed slower than decimal.

@enkore
Copy link

enkore commented Feb 22, 2021

Hexfloats aren't used a lot, so I wouldn't be surprised if they're not well optimized in standard libraries. I mean... you can't even read them using C++ streams.

@KevinJW
Copy link
Contributor

KevinJW commented Feb 23, 2021

The most obvious problem with outputting binary/binary equivalents is that you are adding a requirement for equivalence of representation across platforms - now this may not be a problem as most platforms probably use IEEE 754 but there is at least an endian related problem. At which point I'd certainly look for a library solution for handling this serialisation e.g protobuf, thrift, or maybe just look for some other lightweight bson library rather than rolling our own routines.

Kevin

@hodoulp
Copy link
Member Author

hodoulp commented Feb 23, 2021

Thanks for all the great ideas but the focus for short-term is to support existing LUT file formats independently from the locale and without any performance hit.

On the other hand developing a fast binary LUT file format is a great idea. But I think that it should not be an OCIO specific format. I would suggest to bring that discussion to the ACES CLF working group. If accepted, that approach would then provide standardisation (i.e. under the ACES CLF umbrella with OCIO as a reference implementation) and performance.

@hodoulp hodoulp self-assigned this Feb 25, 2021
@hodoulp hodoulp removed their assignment Mar 23, 2021
@hodoulp
Copy link
Member Author

hodoulp commented Mar 26, 2021

The pull request #1351 contains a discussion about the locale agnostic feature plus some hints from OpenImageIO on how to make it happen.

@amyspark
Copy link
Contributor

amyspark commented Sep 6, 2021

Hi @hodoulp,

I just closed bug 407921 in Krita which is a direct repro of #297. The only way I could get rid of it (short of employing a sstream-based parser) is by NatronGitHub/openfx-io@27a13e0.

Is there any news regarding this?

@hodoulp
Copy link
Member Author

hodoulp commented Sep 7, 2021

@amyspark no significant progress on that front. It would be great if you can add make a pull request with the Natron improvement so we can resume the discussion.

@amyspark
Copy link
Contributor

amyspark commented Sep 9, 2021

if you can add make a pull request with the Natron improvement

I can't do it against OCIO, as it's BSD3 and Natron is GPL2.

@lgritz
Copy link
Collaborator

lgritz commented Sep 9, 2021

Ugh, this is why I can't stand GPL. But I digress.

There are two ways to skin this cat:

First, unless the technique is patented, you (or somebody else) ought to be able to rewrite the code from scratch, an independent implementation of the idea of saving/restoring the locale in any OCIO function that does I/O. It's just not that complicated.

Second, whoever wrote this code originally still owns it. They may have submitted it to Natron to be redistributed under GPL2, but the original author still owns the copyright and so I believe they are free to submit it to a different project under a different license. (Unless Natron requires copyright assignment of all submissions, not merely licensing. VERY few projects require copyright assignment. But if that is the case, then forget what I said about relicensing, you would need to re-implement.)

Though personally, although I do appreciate this clever approach and like how unobtrusive it is to the surrounding code, I think it's a little silly to constantly get, save, change, and restore the locale EVERY time you enter and exit certain functions. It seems like a better approach is to just use locale-independent I/O, and then it just doesn't matter what locale the surrounding app is using. Furthermore, locale-independent I/O will tend to be much faster.

My favourite choice these days is to base all text I/O on the fmt library. It's locale-independent, uses the new C++20 std::format syntax (but also has printf-like versions), has hands-down the fastest int and float <-> string implementations, and a bunch of other goodies.

@Dithermaster
Copy link

I wrote a C++ RAII wrapper that saves and restores locale on Windows, Mac, and Linux. I'd be happy to contribute it. We use it to wrap our calls into OCIO v1, so it's OCIO-proven already.

@amyspark
Copy link
Contributor

amyspark commented Sep 9, 2021

My favourite choice these days is to base all text I/O on the fmt library. It's locale-independent, uses the new C++20 std::format syntax (but also has printf-like versions), has hands-down the fastest int and float <-> string implementations, and a bunch of other goodies.

I'd refrained from suggesting charconv's from_chars because OCIO requires C++11, but if fmt can be used, 🎉 !

@enkore
Copy link

enkore commented Sep 9, 2021

fmtlib seems to only require C++11: "The library is highly portable and relies only on a small set of C++11 features:" Which would be great news! By far the best solution to this problem. Maybe that can even boost performance noticeably over scanf (unlike C++ iostream cough cough)

@amyspark
Copy link
Contributor

Upon further review, fmt has *-to-string formatting, but not the way around @lgritz?

If from_chars is an option, I found Abseil's charconv support, which only requires C++11; in the same spirit, fastfloat/fast_float and lemire/fast_double_parser both look like sane alternatives. They'd only require a developer to make sscanf extract the numbers as strings, and then parse these separately.

@lgritz
Copy link
Collaborator

lgritz commented Sep 12, 2021

Um, you may be right, it may be that fmt has great float->string, but maybe not the other way around. That would make sense now, since fmt is about output, not input.

OIIO (in its strutil.cpp) has its own locale-independent stoi/stof, but I don't know off the top of my head how their performance is, because in the context of OIIO it's just not important.

Wait, that's not true, I have a unit test that does benchmarks. This is all on Mac with clang, YMMV on Linux or gcc.

The "std" is C++ std. The "Strutil" versions are all using fmt or our own rolled underneath.

  std atoi        :   17.6 ns (+/- 0.2ns),   56.9 M/s
  Strutil::stoi(string) :   11.7 ns (+/- 0.2ns),   85.7 M/s

  std atof        :   21.3 ns (+/- 0.5ns),   46.9 M/s
  Strutil::stof(string) - locale-independent:   22.8 ns (+/- 0.2ns),   43.9 M/s

  get default locale:   12.9 ns (+/- 0.0ns),   77.7 M/s
  ref classic locale:    1.2 ns (+/- 0.1ns),  862.3 M/s
  locale switch (to classic):  263.3 ns (+/-10.0ns),    3.8 M/s

So the stoi I'm using is significantly faster than atoi, but the stof is the same speed or maybe just a tad slower than std strtod or atof (but for my needs here, it didn't need to be faster, it just needed to be reliably locale-independent). I'm sure there are faster implementations you can find if you look.

Note also that I time how long it takes to get a reference to the C locale (only 1.2ns) and the cost of switching locales (263ns -- an order of magnitude more expensive than a string->float conversion).

And also, below, I have output timing comparisons of std::sprintf vs the Strutil::format (based on fmt, and thus much faster than std C++):

  std::sprintf("%g"):  127.2 ns (+/- 4.6ns),    7.9 M/s
  Strutil::fmt::format("{:g}"):   75.5 ns (+/- 1.9ns),   13.2 M/s
  Strutil::to_string(float):   33.5 ns (+/- 0.8ns),   29.9 M/s

  std::sprintf("%d"):   48.5 ns (+/- 1.3ns),   20.6 M/s
  Strutil::fmt::format("{}"):    8.4 ns (+/- 0.4ns),  118.8 M/s

  std::sprintf("%g %d %s %d %s %g"):  366.3 ns (+/- 4.8ns),    2.7 M/s
  Strutil::fmt::format("{} {} {} {} {} {}"):  186.5 ns (+/- 4.7ns),    5.4 M/s

If you have a current local build of OIIO, you can just run build/bin/strutil_test for all this and more on your favourite platform.

amyspark added a commit to amyspark/OpenColorIO that referenced this issue Sep 20, 2021
This commit brings in the amalgamated header of the fast_float library,
and applies it to our uses of sscanf with %f.

This library is locale-free, and does not incur the performance penalty of
using a thread locale RAII class.

Fixes AcademySoftwareFoundation#297
Fixes AcademySoftwareFoundation#379
Fixes AcademySoftwareFoundation#1322

Signed-off-by: L. E. Segovia <13498015+amyspark@users.noreply.github.com>
amyspark added a commit to amyspark/OpenColorIO that referenced this issue Oct 9, 2021
This commit adds Daniel Lemire's fast_float library as an external dependency,
and applies it to our uses of sscanf with %f as well as istringstream.

This library is locale-free, and does not incur the performance penalty of
using a thread locale RAII class or imbuing a locale into a
stringstream.

See:
    Daniel Lemire, Number Parsing at a Gigabyte per Second, Software:
    Pratice and Experience 51 (8), 2021.
    <https://arxiv.org/abs/2101.11408>

Fixes AcademySoftwareFoundation#297
Fixes AcademySoftwareFoundation#379
Fixes AcademySoftwareFoundation#1322

Signed-off-by: L. E. Segovia <13498015+amyspark@users.noreply.github.com>
amyspark added a commit to amyspark/OpenColorIO that referenced this issue Oct 9, 2021
This commit adds Daniel Lemire's fast_float library as an external dependency,
and applies it to our uses of sscanf with %f as well as istringstream.

This library is locale-free, and does not incur the performance penalty of
using a thread locale RAII class or imbuing a locale into a
stringstream.

See:
    Daniel Lemire, Number Parsing at a Gigabyte per Second, Software:
    Pratice and Experience 51 (8), 2021.
    <https://arxiv.org/abs/2101.11408>

Fixes AcademySoftwareFoundation#297
Fixes AcademySoftwareFoundation#379
Fixes AcademySoftwareFoundation#1322

Signed-off-by: L. E. Segovia <13498015+amyspark@users.noreply.github.com>
amyspark added a commit to amyspark/OpenColorIO that referenced this issue Nov 10, 2021
This commit adds Daniel Lemire's fast_float library as an external
dependency, and applies it to our uses of sscanf with %f, strtod,
and istringstream.

Additionally, for parsing integer numbers, we implement a from_chars
shim that forwards the call to strtol_l along with a statically
initialized locale constant.

The usage of fast_float is warded by a new OCIO_USE_FAST_FLOAT
configuration variable; if disabled, an identical approach to integers
is followed.

See:
    Daniel Lemire, Number Parsing at a Gigabyte per Second, Software:
    Pratice and Experience 51 (8), 2021.
    <https://arxiv.org/abs/2101.11408>

Fixes AcademySoftwareFoundation#297
Fixes AcademySoftwareFoundation#379
Fixes AcademySoftwareFoundation#1322

Co-Authored-By: Patrick Hodoul <Patrick.Hodoul@autodesk.com>

Signed-off-by: L. E. Segovia <13498015+amyspark@users.noreply.github.com>
amyspark added a commit to amyspark/OpenColorIO that referenced this issue Nov 10, 2021
This commit adds Daniel Lemire's fast_float library as an external
dependency, and applies it to our uses of sscanf with %f, strtod,
and istringstream.

Additionally, for parsing integer numbers, we implement a from_chars
shim that forwards the call to strtol_l along with a statically
initialized locale constant.

The usage of fast_float is warded by a new OCIO_USE_FAST_FLOAT
configuration variable; if disabled, an identical approach to integers
is followed.

See:
    Daniel Lemire, Number Parsing at a Gigabyte per Second, Software:
    Pratice and Experience 51 (8), 2021.
    <https://arxiv.org/abs/2101.11408>

Fixes AcademySoftwareFoundation#297
Fixes AcademySoftwareFoundation#379
Fixes AcademySoftwareFoundation#1322

Co-Authored-By: Patrick Hodoul <Patrick.Hodoul@autodesk.com>

Signed-off-by: L. E. Segovia <13498015+amyspark@users.noreply.github.com>
amyspark added a commit to amyspark/OpenColorIO that referenced this issue Nov 10, 2021
This commit adds Daniel Lemire's fast_float library as an external
dependency, and applies it to our uses of sscanf with %f, strtod,
and istringstream.

Additionally, for parsing integer numbers, we implement a from_chars
shim that forwards the call to strtol_l along with a statically
initialized locale constant.

The usage of fast_float is warded by a new OCIO_USE_FAST_FLOAT
configuration variable; if disabled, an identical approach to integers
is followed.

See:
    Daniel Lemire, Number Parsing at a Gigabyte per Second, Software:
    Pratice and Experience 51 (8), 2021.
    <https://arxiv.org/abs/2101.11408>

Fixes AcademySoftwareFoundation#297
Fixes AcademySoftwareFoundation#379
Fixes AcademySoftwareFoundation#1322

Co-Authored-By: Patrick Hodoul <Patrick.Hodoul@autodesk.com>

Signed-off-by: L. E. Segovia <13498015+amyspark@users.noreply.github.com>
amyspark added a commit to amyspark/OpenColorIO that referenced this issue Nov 10, 2021
This commit adds Daniel Lemire's fast_float library as an external
dependency, and applies it to our uses of sscanf with %f, strtod,
and istringstream.

Additionally, for parsing integer numbers, we implement a from_chars
shim that forwards the call to strtol_l along with a statically
initialized locale constant.

The usage of fast_float is warded by a new OCIO_USE_FAST_FLOAT
configuration variable; if disabled, an identical approach to integers
is followed.

See:
    Daniel Lemire, Number Parsing at a Gigabyte per Second, Software:
    Pratice and Experience 51 (8), 2021.
    <https://arxiv.org/abs/2101.11408>

Fixes AcademySoftwareFoundation#297
Fixes AcademySoftwareFoundation#379
Fixes AcademySoftwareFoundation#1322

Co-Authored-By: Patrick Hodoul <Patrick.Hodoul@autodesk.com>

Signed-off-by: L. E. Segovia <13498015+amyspark@users.noreply.github.com>
amyspark added a commit to amyspark/OpenColorIO that referenced this issue Nov 10, 2021
This commit adds Daniel Lemire's fast_float library as an external dependency,
and applies it to our uses of sscanf with %f as well as istringstream.

This library is locale-free, and does not incur the performance penalty of
using a thread locale RAII class or imbuing a locale into a
stringstream.

See:
    Daniel Lemire, Number Parsing at a Gigabyte per Second, Software:
    Pratice and Experience 51 (8), 2021.
    <https://arxiv.org/abs/2101.11408>

Fixes AcademySoftwareFoundation#297
Fixes AcademySoftwareFoundation#379
Fixes AcademySoftwareFoundation#1322

Signed-off-by: L. E. Segovia <13498015+amyspark@users.noreply.github.com>
amyspark added a commit to amyspark/OpenColorIO that referenced this issue Nov 11, 2021
This commit adds Daniel Lemire's fast_float library as an external
dependency, and applies it to our uses of sscanf with %f, strtod,
and istringstream.

Additionally, for parsing integer numbers, we implement a from_chars
shim that forwards the call to strtol_l along with a statically
initialized locale constant. Unfortunately, to preserve consistency with
from_chars the character range must be copied to a temporary
std::string. The test suite has been adapted to account for this new
behaviour.

The usage of fast_float is warded by a new OCIO_USE_FAST_FLOAT
configuration variable; if disabled, an identical approach to integers
is followed.

See:
    Daniel Lemire, Number Parsing at a Gigabyte per Second, Software:
    Pratice and Experience 51 (8), 2021.
    <https://arxiv.org/abs/2101.11408>

Fixes AcademySoftwareFoundation#297
Fixes AcademySoftwareFoundation#379
Fixes AcademySoftwareFoundation#1322

Co-Authored-By: Patrick Hodoul <Patrick.Hodoul@autodesk.com>

Signed-off-by: L. E. Segovia <13498015+amyspark@users.noreply.github.com>
amyspark added a commit to amyspark/OpenColorIO that referenced this issue Nov 24, 2021
This commit adds Daniel Lemire's fast_float library as an external
dependency, and applies it to our uses of sscanf with %f, strtod,
and istringstream.

Additionally, for parsing integer numbers, we implement a from_chars
shim that forwards the call to strtol_l along with a statically
initialized locale constant. Unfortunately, to preserve consistency with
from_chars the character range must be copied to a temporary
std::string. The test suite has been adapted to account for this new
behaviour.

The usage of fast_float is warded by a new OCIO_USE_FAST_FLOAT
configuration variable; if disabled, an identical approach to integers
is followed.

See:
    Daniel Lemire, Number Parsing at a Gigabyte per Second, Software:
    Pratice and Experience 51 (8), 2021.
    <https://arxiv.org/abs/2101.11408>

Fixes AcademySoftwareFoundation#297
Fixes AcademySoftwareFoundation#379
Fixes AcademySoftwareFoundation#1322

Co-Authored-By: Patrick Hodoul <Patrick.Hodoul@autodesk.com>

Signed-off-by: L. E. Segovia <13498015+amyspark@users.noreply.github.com>
amyspark added a commit to amyspark/OpenColorIO that referenced this issue Dec 7, 2021
This commit adds support for parsing numbers without being influenced by
the current system locale. We implement a from_chars
shim that forwards the call to strto*_l along with a statically
initialized locale constant.

Fixes AcademySoftwareFoundation#297
Fixes AcademySoftwareFoundation#379
Fixes AcademySoftwareFoundation#1322

Co-Authored-By: Patrick Hodoul <Patrick.Hodoul@autodesk.com>

Signed-off-by: L. E. Segovia <13498015+amyspark@users.noreply.github.com>
amyspark added a commit to amyspark/OpenColorIO that referenced this issue Dec 8, 2021
This commit adds support for parsing numbers without being influenced by
the current system locale. We implement a from_chars
shim that forwards the call to strto*_l along with a statically
initialized locale constant.

Fixes AcademySoftwareFoundation#297
Fixes AcademySoftwareFoundation#379
Fixes AcademySoftwareFoundation#1322

Co-Authored-By: Patrick Hodoul <Patrick.Hodoul@autodesk.com>

Signed-off-by: L. E. Segovia <13498015+amyspark@users.noreply.github.com>
hodoulp pushed a commit that referenced this issue Dec 8, 2021
This commit adds support for parsing numbers without being influenced by
the current system locale. We implement a from_chars
shim that forwards the call to strto*_l along with a statically
initialized locale constant.

Fixes #297
Fixes #379
Fixes #1322

Co-Authored-By: Patrick Hodoul <Patrick.Hodoul@autodesk.com>

Signed-off-by: L. E. Segovia <13498015+amyspark@users.noreply.github.com>
hodoulp pushed a commit that referenced this issue Dec 8, 2021
This commit adds support for parsing numbers without being influenced by
the current system locale. We implement a from_chars
shim that forwards the call to strto*_l along with a statically
initialized locale constant.

Fixes #297
Fixes #379
Fixes #1322

Co-Authored-By: Patrick Hodoul <Patrick.Hodoul@autodesk.com>

Signed-off-by: L. E. Segovia <13498015+amyspark@users.noreply.github.com>
hodoulp pushed a commit that referenced this issue Dec 9, 2021
This commit adds support for parsing numbers without being influenced by
the current system locale. We implement a from_chars
shim that forwards the call to strto*_l along with a statically
initialized locale constant.

Fixes #297
Fixes #379
Fixes #1322

Co-Authored-By: Patrick Hodoul <Patrick.Hodoul@autodesk.com>

Signed-off-by: amyspark <13498015+amyspark@users.noreply.github.com>
hodoulp added a commit that referenced this issue Dec 9, 2021
This commit adds support for parsing numbers without being influenced by
the current system locale. We implement a from_chars
shim that forwards the call to strto*_l along with a statically
initialized locale constant.

Fixes #297
Fixes #379
Fixes #1322

Co-Authored-By: Patrick Hodoul <Patrick.Hodoul@autodesk.com>

Signed-off-by: amyspark <13498015+amyspark@users.noreply.github.com>

Co-authored-by: amyspark <13498015+amyspark@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants