Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch CARBON_VLOG to support a format string API. #4283

Merged
merged 10 commits into from
Sep 11, 2024

Conversation

chandlerc
Copy link
Contributor

The goal is to replace our stream operator APIs with format string APIs that can be made to have much less impact on inlining and other optimizations of the performance critical path through the code.

Several experiments show that the most compact representation we can arrange for is one that calls an uninlined function and passes a minimal number of arguments to it. It doesn't help to do any work to minimize the arguments such as building a lambda -- the cost of extra code to merge the arguments is likely to outweigh the benefit.

Initial experiments showed that switching a hot but uninlined function to this new API enabled inlining and the subsequent performance improvement.

This also adds a 'TemplateString` utility that allows using a string literal as a template parameter. This is useful to remove the format string itself from the arguments passed to the function by passing it as a template argument instead.

Currently, support is left in place for both APIs because with CARBON_VLOG we can detect whether or not any message was provided expecting a format string. This should allow incrementally migrating code to this API. I've added some test coverage in this PR, but I'll separate out any switching of parts of the codebase over.

The goal is to eventually replace all the usages and remove the streaming support entirely.

This PR doesn't update CARBON_CHECK in the same way because it is substantially more complex to switch. I have a few experimental PRs looking at that and will discuss how best to approach this with the specific challenges check presents separately. But the goal is for all of the macro-based output APIs to move to format strings rather than streams.

The goal is to replace our stream operator APIs with format string APIs
that can be made to have much less impact on inlining and other
optimizations of the performance critical path through the code.

Several experiments show that the most compact representation we can
arrange for is one that calls an uninlined function and passes a minimal
number of arguments to it. It doesn't help to do any work to minimize
the arguments such as building a lambda -- the cost of extra code to
merge the arguments is likely to outweigh the benefit.

Initial experiments showed that switching a hot but uninlined function
to this new API enabled inlining and the subsequent performance
improvement.

This also adds a 'TemplateString` utility that allows using a string
literal as a template parameter. This is useful to remove the format
string itself from the arguments passed to the function by passing it as
a template argument instead.

Currently, support is left in place for both APIs because with
`CARBON_VLOG` we can detect whether or not any message was provided
expecting a format string. This should allow incrementally migrating
code to this API. I've added some test coverage in this PR, but I'll
separate out any switching of parts of the codebase over.

The goal is to eventually replace all the usages and remove the
streaming support entirely.

This PR doesn't update `CARBON_CHECK` in the same way because it is
substantially more complex to switch. I have a few experimental PRs
looking at that and will discuss how best to approach this with the
specific challenges check presents separately. But the goal is for all
of the macro-based output APIs to move to format strings rather than
streams.
@chandlerc
Copy link
Contributor Author

chandlerc commented Sep 8, 2024

Can look at #4284 to see what the result of this looks like.

chandlerc added a commit to chandlerc/carbon-lang that referenced this pull request Sep 9, 2024
This switches `DCHECK` and `FATAL` as well.

The goal is to reduce the code size impact of these assertions so that
we can keep more of them enabled. Currently, the largest cost I see from
`CHECK` is not the actual check or the cold code itself, but actually
the failure to inline trivial functions due to the presence of the cold
code. This means that our goal isn't to reduce apparent code size in the
final binary but the LLVM IR cost assessed for these routines in the
inliner, which closely correlates with code size but is a bit different.

As discussed in carbon-language#4283, experimentation shows that a single function call
with a minimal number of arguments is the lowest cost model for these.
This is easily achieved with a format-string API that internally uses
`llvm::formatv`. This PR is essentially the `CHECK` version of carbon-language#4283.

However, the check macros are *substantially* harder to make work with
both format strings and streaming because they also take a condition.
Also, unexpectedly, I was very successful at devising a regular
expression based automated rewrite from the streaming to the format
string form with only low 10s of manual fixes. This includes compacting
strings broken up across lines, etc. Given how well that went, I've
prepared this PR which just directly switches to the format string API
and migrate everything to use it.

One nice side-effect is that the format string approach ends up greatly
simplifying the implementation here as well.
chandlerc added a commit to chandlerc/carbon-lang that referenced this pull request Sep 9, 2024
This switches `DCHECK` and `FATAL` as well.

The goal is to reduce the code size impact of these assertions so that
we can keep more of them enabled. Currently, the largest cost I see from
`CHECK` is not the actual check or the cold code itself, but actually
the failure to inline trivial functions due to the presence of the cold
code. This means that our goal isn't to reduce apparent code size in the
final binary but the LLVM IR cost assessed for these routines in the
inliner, which closely correlates with code size but is a bit different.

As discussed in carbon-language#4283, experimentation shows that a single function call
with a minimal number of arguments is the lowest cost model for these.
This is easily achieved with a format-string API that internally uses
`llvm::formatv`. This PR is essentially the `CHECK` version of carbon-language#4283.

However, the check macros are *substantially* harder to make work with
both format strings and streaming because they also take a condition.
Also, unexpectedly, I was very successful at devising a regular
expression based automated rewrite from the streaming to the format
string form with only low 10s of manual fixes. This includes compacting
strings broken up across lines, etc. Given how well that went, I've
prepared this PR which just directly switches to the format string API
and migrate everything to use it.

One nice side-effect is that the format string approach ends up greatly
simplifying the implementation here as well.

This is ... *shockingly* effective. Parsing speeds up by more than 3%
with just this change. And checking speeds up by **8%** with this change
alone:
```
BM_CompileAPIFileDenseDecls<Phase::Parse>/256      86.3µs ± 1%  82.9µs ± 1%  -3.94%  (p=0.000 n=17+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/1024      431µs ± 1%   415µs ± 1%  -3.76%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/4096     1.77ms ± 1%  1.71ms ± 1%  -3.18%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/16384    7.44ms ± 1%  7.17ms ± 2%  -3.56%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/65536    30.7ms ± 1%  29.7ms ± 1%  -3.15%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/262144    131ms ± 1%   127ms ± 1%  -2.81%  (p=0.000 n=18+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/256       878µs ± 2%   800µs ± 1%  -8.91%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/1024     1.88ms ± 2%  1.72ms ± 1%  -8.56%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/4096     5.78ms ± 2%  5.28ms ± 1%  -8.70%  (p=0.000 n=20+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/16384    21.9ms ± 1%  20.1ms ± 1%  -8.02%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/65536    90.4ms ± 2%  83.1ms ± 1%  -8.04%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/262144    381ms ± 2%   352ms ± 1%  -7.79%  (p=0.000 n=19+19)
```
common/template_string.h Outdated Show resolved Hide resolved
// Represent a compile-time string in a form suitable for non-type template
// arguments.
//
// These arguments are required to be "structural", and so we copy the string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until I looked up "structural" and found https://en.cppreference.com/w/cpp/language/template_parameters I was pretty confused by this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a link to the specific subsection of that article.

common/template_string.h Outdated Show resolved Hide resolved
common/template_string_test.cpp Outdated Show resolved Hide resolved
common/vlog_internal.h Outdated Show resolved Hide resolved
chandlerc added a commit to chandlerc/carbon-lang that referenced this pull request Sep 9, 2024
This switches `DCHECK` and `FATAL` as well.

The goal is to reduce the code size impact of these assertions so that
we can keep more of them enabled. Currently, the largest cost I see from
`CHECK` is not the actual check or the cold code itself, but actually
the failure to inline trivial functions due to the presence of the cold
code. This means that our goal isn't to reduce apparent code size in the
final binary but the LLVM IR cost assessed for these routines in the
inliner, which closely correlates with code size but is a bit different.

As discussed in carbon-language#4283, experimentation shows that a single function call
with a minimal number of arguments is the lowest cost model for these.
This is easily achieved with a format-string API that internally uses
`llvm::formatv`. This PR is essentially the `CHECK` version of carbon-language#4283.

However, the check macros are *substantially* harder to make work with
both format strings and streaming because they also take a condition.
Also, unexpectedly, I was very successful at devising a regular
expression based automated rewrite from the streaming to the format
string form with only low 10s of manual fixes. This includes compacting
strings broken up across lines, etc. Given how well that went, I've
prepared this PR which just directly switches to the format string API
and migrate everything to use it.

One nice side-effect is that the format string approach ends up greatly
simplifying the implementation here as well.

This is ... *shockingly* effective. Parsing speeds up by more than 3%
with just this change. And checking speeds up by **8%** with this change
alone:
```
BM_CompileAPIFileDenseDecls<Phase::Parse>/256      86.3µs ± 1%  82.9µs ± 1%  -3.94%  (p=0.000 n=17+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/1024      431µs ± 1%   415µs ± 1%  -3.76%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/4096     1.77ms ± 1%  1.71ms ± 1%  -3.18%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/16384    7.44ms ± 1%  7.17ms ± 2%  -3.56%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/65536    30.7ms ± 1%  29.7ms ± 1%  -3.15%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/262144    131ms ± 1%   127ms ± 1%  -2.81%  (p=0.000 n=18+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/256       878µs ± 2%   800µs ± 1%  -8.91%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/1024     1.88ms ± 2%  1.72ms ± 1%  -8.56%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/4096     5.78ms ± 2%  5.28ms ± 1%  -8.70%  (p=0.000 n=20+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/16384    21.9ms ± 1%  20.1ms ± 1%  -8.02%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/65536    90.4ms ± 2%  83.1ms ± 1%  -8.04%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/262144    381ms ± 2%   352ms ± 1%  -7.79%  (p=0.000 n=19+19)
```
Copy link
Contributor Author

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, PTAL! I've also incorporated some of what I learned working on the check version.

// Represent a compile-time string in a form suitable for non-type template
// arguments.
//
// These arguments are required to be "structural", and so we copy the string
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a link to the specific subsection of that article.

common/template_string.h Outdated Show resolved Hide resolved
common/template_string.h Outdated Show resolved Hide resolved
common/template_string.h Outdated Show resolved Hide resolved
// Designed to support implicitly deduced construction from a string literal
// template argument. This type will implicitly convert to an `llvm::StringRef`
// for accessing the string contents, and also provides a dedicated `c_str()`
// method to access the string as a C-string.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'm picking trivial nits, I would spell this as "C string" not "C-string".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

common/template_string.h Outdated Show resolved Hide resolved
common/template_string_test.cpp Outdated Show resolved Hide resolved
chandlerc and others added 3 commits September 11, 2024 02:29
Co-authored-by: Richard Smith <richard@metafoo.co.uk>
Co-authored-by: Richard Smith <richard@metafoo.co.uk>
common/template_string_test.cpp Outdated Show resolved Hide resolved
common/template_string_test.cpp Outdated Show resolved Hide resolved
Copy link
Contributor Author

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied all the fixes, including the marvelous compile-time testing for validity. Did some cleanup and commenting there to try and make sure its clear what we're doing.

Merging, but please flag anything I should follow up on!

// Designed to support implicitly deduced construction from a string literal
// template argument. This type will implicitly convert to an `llvm::StringRef`
// for accessing the string contents, and also provides a dedicated `c_str()`
// method to access the string as a C-string.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@chandlerc chandlerc added this pull request to the merge queue Sep 11, 2024
Merged via the queue into carbon-language:trunk with commit e48101b Sep 11, 2024
8 checks passed
@chandlerc chandlerc deleted the vlog-shrink branch September 11, 2024 11:10
chandlerc added a commit to chandlerc/carbon-lang that referenced this pull request Sep 11, 2024
This switches `DCHECK` and `FATAL` as well.

The goal is to reduce the code size impact of these assertions so that
we can keep more of them enabled. Currently, the largest cost I see from
`CHECK` is not the actual check or the cold code itself, but actually
the failure to inline trivial functions due to the presence of the cold
code. This means that our goal isn't to reduce apparent code size in the
final binary but the LLVM IR cost assessed for these routines in the
inliner, which closely correlates with code size but is a bit different.

As discussed in carbon-language#4283, experimentation shows that a single function call
with a minimal number of arguments is the lowest cost model for these.
This is easily achieved with a format-string API that internally uses
`llvm::formatv`. This PR is essentially the `CHECK` version of carbon-language#4283.

However, the check macros are *substantially* harder to make work with
both format strings and streaming because they also take a condition.
Also, unexpectedly, I was very successful at devising a regular
expression based automated rewrite from the streaming to the format
string form with only low 10s of manual fixes. This includes compacting
strings broken up across lines, etc. Given how well that went, I've
prepared this PR which just directly switches to the format string API
and migrate everything to use it.

One nice side-effect is that the format string approach ends up greatly
simplifying the implementation here as well.

This is ... *shockingly* effective. Parsing speeds up by more than 3%
with just this change. And checking speeds up by **8%** with this change
alone:
```
BM_CompileAPIFileDenseDecls<Phase::Parse>/256      86.3µs ± 1%  82.9µs ± 1%  -3.94%  (p=0.000 n=17+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/1024      431µs ± 1%   415µs ± 1%  -3.76%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/4096     1.77ms ± 1%  1.71ms ± 1%  -3.18%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/16384    7.44ms ± 1%  7.17ms ± 2%  -3.56%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/65536    30.7ms ± 1%  29.7ms ± 1%  -3.15%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/262144    131ms ± 1%   127ms ± 1%  -2.81%  (p=0.000 n=18+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/256       878µs ± 2%   800µs ± 1%  -8.91%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/1024     1.88ms ± 2%  1.72ms ± 1%  -8.56%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/4096     5.78ms ± 2%  5.28ms ± 1%  -8.70%  (p=0.000 n=20+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/16384    21.9ms ± 1%  20.1ms ± 1%  -8.02%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/65536    90.4ms ± 2%  83.1ms ± 1%  -8.04%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/262144    381ms ± 2%   352ms ± 1%  -7.79%  (p=0.000 n=19+19)
```
chandlerc added a commit to chandlerc/carbon-lang that referenced this pull request Sep 11, 2024
This switches `DCHECK` and `FATAL` as well.

The goal is to reduce the code size impact of these assertions so that
we can keep more of them enabled. Currently, the largest cost I see from
`CHECK` is not the actual check or the cold code itself, but actually
the failure to inline trivial functions due to the presence of the cold
code. This means that our goal isn't to reduce apparent code size in the
final binary but the LLVM IR cost assessed for these routines in the
inliner, which closely correlates with code size but is a bit different.

As discussed in carbon-language#4283, experimentation shows that a single function call
with a minimal number of arguments is the lowest cost model for these.
This is easily achieved with a format-string API that internally uses
`llvm::formatv`. This PR is essentially the `CHECK` version of carbon-language#4283.

However, the check macros are *substantially* harder to make work with
both format strings and streaming because they also take a condition.
Also, unexpectedly, I was very successful at devising a regular
expression based automated rewrite from the streaming to the format
string form with only low 10s of manual fixes. This includes compacting
strings broken up across lines, etc. Given how well that went, I've
prepared this PR which just directly switches to the format string API
and migrate everything to use it.

One nice side-effect is that the format string approach ends up greatly
simplifying the implementation here as well.

This is ... *shockingly* effective. Parsing speeds up by more than 3%
with just this change. And checking speeds up by **8%** with this change
alone:
```
BM_CompileAPIFileDenseDecls<Phase::Parse>/256      86.3µs ± 1%  82.9µs ± 1%  -3.94%  (p=0.000 n=17+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/1024      431µs ± 1%   415µs ± 1%  -3.76%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/4096     1.77ms ± 1%  1.71ms ± 1%  -3.18%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/16384    7.44ms ± 1%  7.17ms ± 2%  -3.56%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/65536    30.7ms ± 1%  29.7ms ± 1%  -3.15%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/262144    131ms ± 1%   127ms ± 1%  -2.81%  (p=0.000 n=18+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/256       878µs ± 2%   800µs ± 1%  -8.91%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/1024     1.88ms ± 2%  1.72ms ± 1%  -8.56%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/4096     5.78ms ± 2%  5.28ms ± 1%  -8.70%  (p=0.000 n=20+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/16384    21.9ms ± 1%  20.1ms ± 1%  -8.02%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/65536    90.4ms ± 2%  83.1ms ± 1%  -8.04%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/262144    381ms ± 2%   352ms ± 1%  -7.79%  (p=0.000 n=19+19)
```
github-merge-queue bot pushed a commit that referenced this pull request Sep 12, 2024
This switches `DCHECK` and `FATAL` as well.

The goal is to reduce the code size impact of these assertions so that
we can keep more of them enabled. Currently, the largest cost I see from
`CHECK` is not the actual check or the cold code itself, but actually
the failure to inline trivial functions due to the presence of the cold
code. This means that our goal isn't to reduce apparent code size in the
final binary but the LLVM IR cost assessed for these routines in the
inliner, which closely correlates with code size but is a bit different.

As discussed in #4283, experimentation shows that a single function call
with a minimal number of arguments is the lowest cost model for these.
This is easily achieved with a format-string API that internally uses
`llvm::formatv`. This PR is essentially the `CHECK` version of #4283.

However, the check macros are substantially harder to make work with
both format strings and streaming because they also take a condition.
Also, unexpectedly, I was very successful at devising a regular
expression based automated rewrite from the streaming to the format
string form with only low 10s of manual fixes. This includes compacting
strings broken up across lines, etc. Given how well that went, I've
prepared this PR which just directly switches to the format string API
and migrate everything to use it.

One nice side-effect is that the format string approach ends up greatly
simplifying the implementation here as well.

This is ... *shockingly* effective. Parsing speeds up by more than 3%
with just this change. And checking speeds up by **8%** with this change
alone:
```
BM_CompileAPIFileDenseDecls<Phase::Parse>/256      86.3µs ± 1%  82.9µs ± 1%  -3.94%  (p=0.000 n=17+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/1024      431µs ± 1%   415µs ± 1%  -3.76%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/4096     1.77ms ± 1%  1.71ms ± 1%  -3.18%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/16384    7.44ms ± 1%  7.17ms ± 2%  -3.56%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/65536    30.7ms ± 1%  29.7ms ± 1%  -3.15%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/262144    131ms ± 1%   127ms ± 1%  -2.81%  (p=0.000 n=18+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/256       878µs ± 2%   800µs ± 1%  -8.91%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/1024     1.88ms ± 2%  1.72ms ± 1%  -8.56%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/4096     5.78ms ± 2%  5.28ms ± 1%  -8.70%  (p=0.000 n=20+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/16384    21.9ms ± 1%  20.1ms ± 1%  -8.02%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/65536    90.4ms ± 2%  83.1ms ± 1%  -8.04%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/262144    381ms ± 2%   352ms ± 1%  -7.79%  (p=0.000 n=19+19)
```

---------

Co-authored-by: Richard Smith <richard@metafoo.co.uk>
Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants