
std::format with std::chrono poor performance #3575

Open
nuuSolutions opened this issue Mar 15, 2023 · 7 comments
Labels: chrono (C++20 chrono), format (C++20/23 format), performance (Must go faster)

Comments


nuuSolutions commented Mar 15, 2023

Is this a known issue or am I doing something sub-optimal?

I'm simply calling

std::format_to( std::back_inserter( buffer ), "[{:%H:%M:%S}] ", time );

where buffer is a std::string and time is of type std::chrono::system_clock::duration,
and this is about 4-5 times slower than 'rolling my own', e.g.:

static constexpr auto h = 3600'000'000'000LL;  // ns per hour (assumes nanosecond ticks)
static constexpr auto min = 60'000'000'000LL;  // ns per minute
static constexpr double sec = 1'000'000'000;   // ns per second
auto elapsed_ns = time.count();
const auto hours = elapsed_ns / h; elapsed_ns -= hours * h;
const auto minutes = elapsed_ns / min; elapsed_ns -= minutes * min;
const double seconds = elapsed_ns / sec;

std::format_to( std::back_inserter( buffer ), "[{}:{:02}:{:010.7f}] ", hours, minutes, seconds );

Not sure where this huge difference can come from. Allocations?
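
For what it's worth, a self-contained version of the comparison might look like the sketch below (an assumed harness, not my exact benchmark; the duration is cast to nanoseconds so the hand-rolled constants line up with count()'s units, since on MSVC system_clock ticks are 100 ns). Numbers will vary by machine and STL version, but this is the shape of the measurement:

#include <chrono>
#include <format>
#include <iostream>
#include <iterator>
#include <string>

int main() {
    using namespace std::chrono;
    constexpr int N = 100'000;
    // Cast so count() is in nanoseconds, matching the constants below.
    const auto time = duration_cast<nanoseconds>(system_clock::now().time_since_epoch());

    std::string buffer;
    buffer.reserve(N * 20);

    const auto t0 = steady_clock::now();
    for (int i = 0; i < N; ++i)
        std::format_to(std::back_inserter(buffer), "[{:%H:%M:%S}] ", time);
    const auto t1 = steady_clock::now();

    buffer.clear();
    static constexpr auto h = 3600'000'000'000LL;  // ns per hour
    static constexpr auto min = 60'000'000'000LL;  // ns per minute
    static constexpr double sec = 1'000'000'000;   // ns per second

    const auto t2 = steady_clock::now();
    for (int i = 0; i < N; ++i) {
        auto elapsed_ns = time.count();
        const auto hours = elapsed_ns / h;     elapsed_ns -= hours * h;
        const auto minutes = elapsed_ns / min; elapsed_ns -= minutes * min;
        const double seconds = elapsed_ns / sec;
        std::format_to(std::back_inserter(buffer), "[{}:{:02}:{:010.7f}] ", hours, minutes, seconds);
    }
    const auto t3 = steady_clock::now();

    std::cout << "chrono formatter: " << duration<double, std::milli>(t1 - t0).count() << " ms\n"
              << "hand-rolled:      " << duration<double, std::milli>(t3 - t2).count() << " ms\n";
}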

nuuSolutions added the question (Further information is requested) label Mar 15, 2023
StephanTLavavej added the performance (Must go faster), format (C++20/23 format), and chrono (C++20 chrono) labels and removed the question (Further information is requested) label Mar 15, 2023
@StephanTLavavej (Member)

Thanks for the report - this is not a known issue so we'll need to investigate.

@timsong-cpp (Contributor)

The chrono formatter parses the entire spec into a vector and then routes all output through an ostringstream and in many cases put_time. That seems...unlikely to be performant.
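
Roughly, the shape of that code path is (an illustrative sketch, not the actual STL source):

#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Illustrative sketch: each field is rendered by streaming
// std::put_time into a scratch ostringstream, which allocates and goes
// through the time_put facet (and, on MSVC, strftime) for every call.
std::string via_put_time(const std::tm& tm, const char* spec) {
    std::ostringstream os; // heap-allocating scratch stream per call
    os << std::put_time(&tm, spec);
    return os.str();
}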

@TheStormN (Contributor)

@StephanTLavavej As promised in my previous PRs, I've started work on seeing how time formatting can be optimized in the Microsoft STL.
I'll explain the main bottleneck of the routine again to provide more context here.

Basically, this is slow because under the hood the UCRT strftime() function can currently only work with wchar_t strings, which results in several heavy operations. The steps strftime() performs are as follows:

  • Allocate a wchar_t buffer with the same size (in characters, not bytes) as the input format string.
  • Convert the input format string to wchar_t using the buffer allocated in the previous step.
  • Allocate a wchar_t buffer for the result, based on the passed maxsize of the original char* output buffer.
  • Process all specifiers and write the output to the wchar_t buffer allocated in the step above.
  • Convert the wchar_t result buffer back to char*, storing it in the original output buffer.
  • Free both wchar_t buffers, which were used for handling the format and result strings.

Of course, if the output buffer is not large enough, the function stops and returns zero, but all of the allocation and conversion steps will still have been performed. The sketch below models this round trip.
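
Conceptually, the UCRT is doing something like this (an assumed model of the steps listed above, not the actual UCRT source; error handling omitted):

#include <cstdlib>
#include <cstring>
#include <ctime>
#include <cwchar>

// Assumed model of a char strftime() that round-trips through the
// wide wcsftime(), matching the six steps above.
size_t strftime_via_wide(char* out, size_t maxsize,
                         const char* fmt, const struct tm* t) {
    const size_t fmt_len = std::strlen(fmt) + 1;
    // Step 1: allocate a wchar_t buffer the size of the format string.
    wchar_t* wfmt = static_cast<wchar_t*>(std::malloc(fmt_len * sizeof(wchar_t)));
    // Step 2: convert the format string to wchar_t.
    std::mbstowcs(wfmt, fmt, fmt_len);
    // Step 3: allocate a wchar_t result buffer sized from maxsize.
    wchar_t* wout = static_cast<wchar_t*>(std::malloc(maxsize * sizeof(wchar_t)));
    // Step 4: process all specifiers into the wide result buffer.
    const size_t n = std::wcsftime(wout, maxsize, wfmt, t);
    // Step 5: convert the wide result back to char.
    if (n != 0) std::wcstombs(out, wout, maxsize);
    // Step 6: free both temporary wide buffers.
    std::free(wfmt);
    std::free(wout);
    return n;
}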

Now, if all these steps ran once for the whole format string, perhaps it wouldn't be that bad in terms of performance. However, the C++ specification of std::time_put<CharT,OutputIt>::put and std::time_put<CharT,OutputIt>::do_put requires each valid format specifier to be processed separately, so a simple format string such as "%H:%M:%S" causes strftime() to be called 3 times in a row, repeating all the heap allocations and string conversions each time. For more complex format strings the performance hit is N times worse and really noticeable.
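
To illustrate the requirement, here is a sketch of what per-specifier processing forces (illustrative only, not our actual implementation); each put() call below bottoms out in a separate strftime() round trip on MSVC:

#include <ctime>
#include <iostream>
#include <iterator>
#include <locale>
#include <sstream>

int main() {
    std::time_t now = std::time(nullptr);
    std::tm tm = *std::localtime(&now); // sketch only; not thread-safe

    std::ostringstream os;
    const auto& tp = std::use_facet<std::time_put<char>>(os.getloc());

    for (const char* p = "%H:%M:%S"; *p != '\0'; ++p) {
        if (*p == '%' && p[1] != '\0') {
            // One facet call (and thus one strftime() round trip) per specifier.
            tp.put(std::ostreambuf_iterator<char>(os), os, ' ', &tm, p[1]);
            ++p;
        } else {
            os.put(*p); // literal characters are copied directly
        }
    }
    std::cout << os.str() << '\n'; // e.g. "13:05:42"
}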

My original idea was to sanitize the format buffer to escape characters that would crash the UCRT and pass the whole thing directly to strftime(), performing the heavy operations just once. However, this is not possible, as it would violate the C++ standard requirements I've already explained.

At this point I'm open to suggestions.

P.S. My best bet would still be to convince the UCRT team to fix up the implementation of strftime() to:

  • Not do any conversion work.
  • On invalid specifiers, do the same thing as other major implementations, so we can avoid a lot of additional processing and complexity on the STL side.

@frederick-vs-ja (Contributor)

BTW, it might be an optimization to simplify the operator<< overloads by not using format. #2003 is related.

@TheStormN (Contributor) commented Sep 26, 2024

Even if we optimize operator<<, it still ends up calling strftime() the way I've explained. Also, this issue is about runtime performance, not compilation times.

@frederick-vs-ja (Contributor)

> Even if we optimize operator<<, it still ends up calling strftime() the way I've explained. Also, this issue is about runtime performance, not compilation times.

Did you really mean that we need to call strftime() (or its friends) even if the format string is known to be "%H:%M:%S"?

I think we can avoid the strftime() call in such cases, because only the decimal point needs to be localized during output, and we only need to call decimal_point() once. It would also save the runtime parsing of a statically determined format string by avoiding format.

For operator<<, IIUC we only need to call strftime() or its friends when outputting month and weekday names (due to "{:L%b}"/"{:L%a}").
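
A sketch of the fast path I have in mind (names are illustrative, not the STL's internals); everything in "%H:%M:%S" is plain digits, so the locale is consulted exactly once, for the decimal point, and strftime() is never called:

#include <chrono>
#include <cstdio>
#include <locale>
#include <string>

std::string format_hms(std::chrono::nanoseconds d, const std::locale& loc) {
    using namespace std::chrono;
    // Decompose manually; assumes a non-negative duration.
    const auto h = duration_cast<hours>(d);
    const auto m = duration_cast<minutes>(d - h);
    const auto s = duration_cast<seconds>(d - h - m);
    const long long ns = (d - h - m - s).count();

    // The single locale query: the localized decimal point.
    const char point = std::use_facet<std::numpunct<char>>(loc).decimal_point();

    char buf[32];
    const int len = std::snprintf(buf, sizeof(buf), "%02lld:%02lld:%02lld%c%09lld",
                                  static_cast<long long>(h.count()),
                                  static_cast<long long>(m.count()),
                                  static_cast<long long>(s.count()), point, ns);
    return std::string(buf, len);
}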

@TheStormN (Contributor)

@frederick-vs-ja You are right, but what really happens is that strftime() is everywhere. :)
