Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Printf get container #1982

Merged
merged 5 commits into from
Nov 12, 2020
Merged

Printf get container #1982

merged 5 commits into from
Nov 12, 2020

Conversation

rimathia
Copy link
Contributor

@rimathia rimathia commented Nov 5, 2020

I agree that my contributions are licensed under the {fmt} license, and agree to future changes to the licensing.

This affects the performance of fmt::sprintf in the case where the format string is at least on the order of 100 characters long.

The benchmark:

#include <benchmark/benchmark.h>
#include <fmt/printf.h>

void long_printf_format_string(benchmark::State& state) {
  std::string format_string = std::string(state.range(0), 'A') + "%s";
  std::string message = "Hi";
  for (auto _ : state) {
    benchmark::DoNotOptimize(fmt::sprintf(format_string, message));
  };
}
BENCHMARK(long_printf_format_string)->Arg(10)->Arg(100)->Arg(1000)->Arg(10000);

void long_fmt_format_string(benchmark::State& state) {
  std::string format_string = std::string(state.range(0), 'A') + "{}";
  std::string message = "Hi";
  for (auto _ : state) {
    benchmark::DoNotOptimize(fmt::format(format_string, message));
  };
}
BENCHMARK(long_fmt_format_string)->Arg(10)->Arg(100)->Arg(1000)->Arg(10000);

The benchmark numbers change as follows:

master (60dc273)

--------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations
--------------------------------------------------------------------------
long_printf_format_string/10          39.9 ns         39.9 ns     17364987
long_printf_format_string/100          188 ns          188 ns      3697522
long_printf_format_string/1000        1402 ns         1402 ns       494033
long_printf_format_string/10000      12409 ns        12409 ns        55024
long_fmt_format_string/10             28.0 ns         28.0 ns     24726944
long_fmt_format_string/100            88.2 ns         88.2 ns      7890169
long_fmt_format_string/1000            333 ns          333 ns      2119728
long_fmt_format_string/10000           873 ns          873 ns       779553

branch (104b3de)

--------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations
--------------------------------------------------------------------------
long_printf_format_string/10          36.7 ns         36.7 ns     17982978
long_printf_format_string/100         95.8 ns         95.8 ns      7214487
long_printf_format_string/1000         337 ns          337 ns      2075802
long_printf_format_string/10000        763 ns          763 ns       910155
long_fmt_format_string/10             28.0 ns         28.0 ns     24945565
long_fmt_format_string/100            86.6 ns         86.6 ns      7932910
long_fmt_format_string/1000            330 ns          330 ns      2093427
long_fmt_format_string/10000           838 ns          838 ns       807894

@vitaut
Copy link
Contributor

vitaut commented Nov 5, 2020

Thanks for the PR. Please fix formatting in the PR description and post benchmark results as text, not as image.

@rimathia
Copy link
Contributor Author

rimathia commented Nov 5, 2020

Thanks for getting to the PR so quickly. Just tell me if you'd like to have the json output as well.

Copy link
Contributor

@vitaut vitaut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

The idea looks fine but I think you should use fmt::detail::write instead of reimplementing container/buffer detection.

@rimathia rimathia force-pushed the printf_get_container branch 2 times, most recently from d566465 to c4f62bf Compare November 8, 2020 18:40
@rimathia
Copy link
Contributor Author

rimathia commented Nov 8, 2020

I'll have to get back to this next week. Just to be sure I'm not operating on wrong assumptions: The following two benchmarks should have very similar results, right?

void long_printf_format_string(benchmark::State& state) {
  std::string format_string = std::string(state.range(0), 'A') + "%s";
  std::string message = "Hi";
  for (auto _ : state) {
    benchmark::DoNotOptimize(fmt::sprintf(format_string, message));
  };
}
BENCHMARK(long_printf_format_string)->Arg(10)->Arg(100)->Arg(1000)->Arg(10000);

void long_fmt_format_string(benchmark::State& state) {
  std::string format_string = std::string(state.range(0), 'A') + "{}";
  std::string message = "Hi";
  for (auto _ : state) {
    benchmark::DoNotOptimize(fmt::format(format_string, message));
  };
}
BENCHMARK(long_fmt_format_string)->Arg(10)->Arg(100)->Arg(1000)->Arg(10000);

On the branch (c4f62bf) the version with printf becomes quite a bit faster than the fmt version from 100 characters:

--------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations
--------------------------------------------------------------------------
long_printf_format_string/10          45.7 ns         45.6 ns     14294174
long_printf_format_string/100          116 ns          116 ns      6097242
long_printf_format_string/1000         405 ns          405 ns      1756349
long_printf_format_string/10000        870 ns          869 ns       775744
long_fmt_format_string/10             42.0 ns         42.0 ns     16111176
long_fmt_format_string/100             237 ns          237 ns      2903444
long_fmt_format_string/1000           1696 ns         1695 ns       401220
long_fmt_format_string/10000         13772 ns        13755 ns        50931

@vitaut
Copy link
Contributor

vitaut commented Nov 8, 2020

The following two benchmarks should have very similar results, right?

Somewhat similar. The printf numbers look suspicious both compared to format and your earlier results. I recommend checking that the modified version produces the correct output.

@vitaut
Copy link
Contributor

vitaut commented Nov 11, 2020

I finally got a chance to look into this and the results are indeed correct. As it turned out format_handler::on_text wasn't going through the fast path which I fixed in beb248b:

--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
long_fmt_format_string           899 ns          899 ns       749016

Thanks for catching this!

My earlier comment about using fmt::detail::write still holds.

… a fmt::basic_memory_buffer character by character instead of using fmt::basic_memory_buffer::append
@rimathia rimathia requested a review from vitaut November 12, 2020 11:01
@rimathia
Copy link
Contributor Author

I have edited the top message to reflect the latest state, I hope this isn't too confusing.

Comment on lines 622 to 627
EXPECT_EQ((truncated_printf_context(
EXPECT_EQ(expected_size,
(truncated_printf_context(
fmt::detail::truncating_iterator<backit>(it, 0), format_string,
fmt::make_format_args<truncated_printf_context>(format_arg))
.format()
.count()),
expected_size);
.count()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems unnecessary, please revert.

Comment on lines 630 to 644
TEST(PrintfTest, PrintfAppendToBuffer) {
using backit = std::back_insert_iterator<fmt::basic_memory_buffer<char>>;
using context = fmt::basic_printf_context<backit, char>;

const auto format_string = "%s";
const char* format_arg = "Hello";
fmt::basic_memory_buffer<char> buffer;
context(std::back_inserter(buffer), format_string,
fmt::make_format_args<context>(format_arg))
.format();

std::string result(std::begin(buffer), std::end(buffer));

EXPECT_EQ(std::string("Hello"), result);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this test? We should avoid adding more tests that rely too much on implementation details.

Comment on lines 620 to 629
const auto expected_size = fmt::sprintf(format_string, format_arg).size();
const auto expected_size = 5;

EXPECT_EQ((truncated_printf_context(
EXPECT_EQ(expected_size, fmt::sprintf(format_string, format_arg).size());

EXPECT_EQ(expected_size,
(truncated_printf_context(
fmt::detail::truncating_iterator<backit>(it, 0), format_string,
fmt::make_format_args<truncated_printf_context>(format_arg))
.format()
.count()),
expected_size);
.count()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also looks unrelated, please revert (possibly submitting a separate PR to improve tests).

@vitaut vitaut merged commit 986fa00 into fmtlib:master Nov 12, 2020
@vitaut
Copy link
Contributor

vitaut commented Nov 12, 2020

Thank you!

@rimathia
Copy link
Contributor Author

Thanks for your patience, I will try to get future pull requests to be more focused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants