Switch `CARBON_VLOG` to support a format string API. #4283

chandlerc · 2024-09-08T04:53:57Z

The goal is to replace our stream operator APIs with format string APIs that can be made to have much less impact on inlining and other optimizations of the performance critical path through the code.

Several experiments show that the most compact representation we can arrange for is one that calls an uninlined function and passes a minimal number of arguments to it. It doesn't help to do any work to minimize the arguments such as building a lambda -- the cost of extra code to merge the arguments is likely to outweigh the benefit.

Initial experiments showed that switching a hot but uninlined function to this new API enabled inlining and the subsequent performance improvement.

This also adds a 'TemplateString` utility that allows using a string literal as a template parameter. This is useful to remove the format string itself from the arguments passed to the function by passing it as a template argument instead.

Currently, support is left in place for both APIs because with CARBON_VLOG we can detect whether or not any message was provided expecting a format string. This should allow incrementally migrating code to this API. I've added some test coverage in this PR, but I'll separate out any switching of parts of the codebase over.

The goal is to eventually replace all the usages and remove the streaming support entirely.

This PR doesn't update CARBON_CHECK in the same way because it is substantially more complex to switch. I have a few experimental PRs looking at that and will discuss how best to approach this with the specific challenges check presents separately. But the goal is for all of the macro-based output APIs to move to format strings rather than streams.

The goal is to replace our stream operator APIs with format string APIs that can be made to have much less impact on inlining and other optimizations of the performance critical path through the code. Several experiments show that the most compact representation we can arrange for is one that calls an uninlined function and passes a minimal number of arguments to it. It doesn't help to do any work to minimize the arguments such as building a lambda -- the cost of extra code to merge the arguments is likely to outweigh the benefit. Initial experiments showed that switching a hot but uninlined function to this new API enabled inlining and the subsequent performance improvement. This also adds a 'TemplateString` utility that allows using a string literal as a template parameter. This is useful to remove the format string itself from the arguments passed to the function by passing it as a template argument instead. Currently, support is left in place for both APIs because with `CARBON_VLOG` we can detect whether or not any message was provided expecting a format string. This should allow incrementally migrating code to this API. I've added some test coverage in this PR, but I'll separate out any switching of parts of the codebase over. The goal is to eventually replace all the usages and remove the streaming support entirely. This PR doesn't update `CARBON_CHECK` in the same way because it is substantially more complex to switch. I have a few experimental PRs looking at that and will discuss how best to approach this with the specific challenges check presents separately. But the goal is for all of the macro-based output APIs to move to format strings rather than streams.

chandlerc · 2024-09-08T22:28:43Z

Can look at #4284 to see what the result of this looks like.

This switches `DCHECK` and `FATAL` as well. The goal is to reduce the code size impact of these assertions so that we can keep more of them enabled. Currently, the largest cost I see from `CHECK` is not the actual check or the cold code itself, but actually the failure to inline trivial functions due to the presence of the cold code. This means that our goal isn't to reduce apparent code size in the final binary but the LLVM IR cost assessed for these routines in the inliner, which closely correlates with code size but is a bit different. As discussed in carbon-language#4283, experimentation shows that a single function call with a minimal number of arguments is the lowest cost model for these. This is easily achieved with a format-string API that internally uses `llvm::formatv`. This PR is essentially the `CHECK` version of carbon-language#4283. However, the check macros are *substantially* harder to make work with both format strings and streaming because they also take a condition. Also, unexpectedly, I was very successful at devising a regular expression based automated rewrite from the streaming to the format string form with only low 10s of manual fixes. This includes compacting strings broken up across lines, etc. Given how well that went, I've prepared this PR which just directly switches to the format string API and migrate everything to use it. One nice side-effect is that the format string approach ends up greatly simplifying the implementation here as well.

This switches `DCHECK` and `FATAL` as well. The goal is to reduce the code size impact of these assertions so that we can keep more of them enabled. Currently, the largest cost I see from `CHECK` is not the actual check or the cold code itself, but actually the failure to inline trivial functions due to the presence of the cold code. This means that our goal isn't to reduce apparent code size in the final binary but the LLVM IR cost assessed for these routines in the inliner, which closely correlates with code size but is a bit different. As discussed in carbon-language#4283, experimentation shows that a single function call with a minimal number of arguments is the lowest cost model for these. This is easily achieved with a format-string API that internally uses `llvm::formatv`. This PR is essentially the `CHECK` version of carbon-language#4283. However, the check macros are *substantially* harder to make work with both format strings and streaming because they also take a condition. Also, unexpectedly, I was very successful at devising a regular expression based automated rewrite from the streaming to the format string form with only low 10s of manual fixes. This includes compacting strings broken up across lines, etc. Given how well that went, I've prepared this PR which just directly switches to the format string API and migrate everything to use it. One nice side-effect is that the format string approach ends up greatly simplifying the implementation here as well. This is ... *shockingly* effective. Parsing speeds up by more than 3% with just this change. And checking speeds up by **8%** with this change alone: ``` BM_CompileAPIFileDenseDecls<Phase::Parse>/256 86.3µs ± 1% 82.9µs ± 1% -3.94% (p=0.000 n=17+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/1024 431µs ± 1% 415µs ± 1% -3.76% (p=0.000 n=18+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/4096 1.77ms ± 1% 1.71ms ± 1% -3.18% (p=0.000 n=18+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/16384 7.44ms ± 1% 7.17ms ± 2% -3.56% (p=0.000 n=18+20) BM_CompileAPIFileDenseDecls<Phase::Parse>/65536 30.7ms ± 1% 29.7ms ± 1% -3.15% (p=0.000 n=18+20) BM_CompileAPIFileDenseDecls<Phase::Parse>/262144 131ms ± 1% 127ms ± 1% -2.81% (p=0.000 n=18+18) BM_CompileAPIFileDenseDecls<Phase::Check>/256 878µs ± 2% 800µs ± 1% -8.91% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Check>/1024 1.88ms ± 2% 1.72ms ± 1% -8.56% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Check>/4096 5.78ms ± 2% 5.28ms ± 1% -8.70% (p=0.000 n=20+18) BM_CompileAPIFileDenseDecls<Phase::Check>/16384 21.9ms ± 1% 20.1ms ± 1% -8.02% (p=0.000 n=18+20) BM_CompileAPIFileDenseDecls<Phase::Check>/65536 90.4ms ± 2% 83.1ms ± 1% -8.04% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Check>/262144 381ms ± 2% 352ms ± 1% -7.79% (p=0.000 n=19+19) ```

common/template_string.h

josh11b · 2024-09-09T03:14:11Z

common/template_string.h

+// Represent a compile-time string in a form suitable for non-type template
+// arguments.
+//
+// These arguments are required to be "structural", and so we copy the string


Until I looked up "structural" and found https://en.cppreference.com/w/cpp/language/template_parameters I was pretty confused by this.

Added a link to the specific subsection of that article.

common/template_string.h

common/template_string_test.cpp

common/vlog_internal.h

This switches `DCHECK` and `FATAL` as well. The goal is to reduce the code size impact of these assertions so that we can keep more of them enabled. Currently, the largest cost I see from `CHECK` is not the actual check or the cold code itself, but actually the failure to inline trivial functions due to the presence of the cold code. This means that our goal isn't to reduce apparent code size in the final binary but the LLVM IR cost assessed for these routines in the inliner, which closely correlates with code size but is a bit different. As discussed in carbon-language#4283, experimentation shows that a single function call with a minimal number of arguments is the lowest cost model for these. This is easily achieved with a format-string API that internally uses `llvm::formatv`. This PR is essentially the `CHECK` version of carbon-language#4283. However, the check macros are *substantially* harder to make work with both format strings and streaming because they also take a condition. Also, unexpectedly, I was very successful at devising a regular expression based automated rewrite from the streaming to the format string form with only low 10s of manual fixes. This includes compacting strings broken up across lines, etc. Given how well that went, I've prepared this PR which just directly switches to the format string API and migrate everything to use it. One nice side-effect is that the format string approach ends up greatly simplifying the implementation here as well. This is ... *shockingly* effective. Parsing speeds up by more than 3% with just this change. And checking speeds up by **8%** with this change alone: ``` BM_CompileAPIFileDenseDecls<Phase::Parse>/256 86.3µs ± 1% 82.9µs ± 1% -3.94% (p=0.000 n=17+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/1024 431µs ± 1% 415µs ± 1% -3.76% (p=0.000 n=18+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/4096 1.77ms ± 1% 1.71ms ± 1% -3.18% (p=0.000 n=18+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/16384 7.44ms ± 1% 7.17ms ± 2% -3.56% (p=0.000 n=18+20) BM_CompileAPIFileDenseDecls<Phase::Parse>/65536 30.7ms ± 1% 29.7ms ± 1% -3.15% (p=0.000 n=18+20) BM_CompileAPIFileDenseDecls<Phase::Parse>/262144 131ms ± 1% 127ms ± 1% -2.81% (p=0.000 n=18+18) BM_CompileAPIFileDenseDecls<Phase::Check>/256 878µs ± 2% 800µs ± 1% -8.91% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Check>/1024 1.88ms ± 2% 1.72ms ± 1% -8.56% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Check>/4096 5.78ms ± 2% 5.28ms ± 1% -8.70% (p=0.000 n=20+18) BM_CompileAPIFileDenseDecls<Phase::Check>/16384 21.9ms ± 1% 20.1ms ± 1% -8.02% (p=0.000 n=18+20) BM_CompileAPIFileDenseDecls<Phase::Check>/65536 90.4ms ± 2% 83.1ms ± 1% -8.04% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Check>/262144 381ms ± 2% 352ms ± 1% -7.79% (p=0.000 n=19+19) ```

Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>

chandlerc

Thanks, PTAL! I've also incorporated some of what I learned working on the check version.

chandlerc · 2024-09-09T17:07:13Z

common/template_string.h

+// Represent a compile-time string in a form suitable for non-type template
+// arguments.
+//
+// These arguments are required to be "structural", and so we copy the string


Added a link to the specific subsection of that article.

common/template_string.h

zygoloid · 2024-09-09T20:52:57Z

common/template_string.h

+// Designed to support implicitly deduced construction from a string literal
+// template argument. This type will implicitly convert to an `llvm::StringRef`
+// for accessing the string contents, and also provides a dedicated `c_str()`
+// method to access the string as a C-string.


While I'm picking trivial nits, I would spell this as "C string" not "C-string".

common/template_string.h

common/template_string_test.cpp

Co-authored-by: Richard Smith <richard@metafoo.co.uk>

common/template_string_test.cpp

chandlerc

Applied all the fixes, including the marvelous compile-time testing for validity. Did some cleanup and commenting there to try and make sure its clear what we're doing.

Merging, but please flag anything I should follow up on!

chandlerc · 2024-09-11T09:37:58Z

common/template_string.h

+// Designed to support implicitly deduced construction from a string literal
+// template argument. This type will implicitly convert to an `llvm::StringRef`
+// for accessing the string contents, and also provides a dedicated `c_str()`
+// method to access the string as a C-string.


This switches `DCHECK` and `FATAL` as well. The goal is to reduce the code size impact of these assertions so that we can keep more of them enabled. Currently, the largest cost I see from `CHECK` is not the actual check or the cold code itself, but actually the failure to inline trivial functions due to the presence of the cold code. This means that our goal isn't to reduce apparent code size in the final binary but the LLVM IR cost assessed for these routines in the inliner, which closely correlates with code size but is a bit different. As discussed in carbon-language#4283, experimentation shows that a single function call with a minimal number of arguments is the lowest cost model for these. This is easily achieved with a format-string API that internally uses `llvm::formatv`. This PR is essentially the `CHECK` version of carbon-language#4283. However, the check macros are *substantially* harder to make work with both format strings and streaming because they also take a condition. Also, unexpectedly, I was very successful at devising a regular expression based automated rewrite from the streaming to the format string form with only low 10s of manual fixes. This includes compacting strings broken up across lines, etc. Given how well that went, I've prepared this PR which just directly switches to the format string API and migrate everything to use it. One nice side-effect is that the format string approach ends up greatly simplifying the implementation here as well. This is ... *shockingly* effective. Parsing speeds up by more than 3% with just this change. And checking speeds up by **8%** with this change alone: ``` BM_CompileAPIFileDenseDecls<Phase::Parse>/256 86.3µs ± 1% 82.9µs ± 1% -3.94% (p=0.000 n=17+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/1024 431µs ± 1% 415µs ± 1% -3.76% (p=0.000 n=18+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/4096 1.77ms ± 1% 1.71ms ± 1% -3.18% (p=0.000 n=18+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/16384 7.44ms ± 1% 7.17ms ± 2% -3.56% (p=0.000 n=18+20) BM_CompileAPIFileDenseDecls<Phase::Parse>/65536 30.7ms ± 1% 29.7ms ± 1% -3.15% (p=0.000 n=18+20) BM_CompileAPIFileDenseDecls<Phase::Parse>/262144 131ms ± 1% 127ms ± 1% -2.81% (p=0.000 n=18+18) BM_CompileAPIFileDenseDecls<Phase::Check>/256 878µs ± 2% 800µs ± 1% -8.91% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Check>/1024 1.88ms ± 2% 1.72ms ± 1% -8.56% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Check>/4096 5.78ms ± 2% 5.28ms ± 1% -8.70% (p=0.000 n=20+18) BM_CompileAPIFileDenseDecls<Phase::Check>/16384 21.9ms ± 1% 20.1ms ± 1% -8.02% (p=0.000 n=18+20) BM_CompileAPIFileDenseDecls<Phase::Check>/65536 90.4ms ± 2% 83.1ms ± 1% -8.04% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Check>/262144 381ms ± 2% 352ms ± 1% -7.79% (p=0.000 n=19+19) ```

This switches `DCHECK` and `FATAL` as well. The goal is to reduce the code size impact of these assertions so that we can keep more of them enabled. Currently, the largest cost I see from `CHECK` is not the actual check or the cold code itself, but actually the failure to inline trivial functions due to the presence of the cold code. This means that our goal isn't to reduce apparent code size in the final binary but the LLVM IR cost assessed for these routines in the inliner, which closely correlates with code size but is a bit different. As discussed in #4283, experimentation shows that a single function call with a minimal number of arguments is the lowest cost model for these. This is easily achieved with a format-string API that internally uses `llvm::formatv`. This PR is essentially the `CHECK` version of #4283. However, the check macros are substantially harder to make work with both format strings and streaming because they also take a condition. Also, unexpectedly, I was very successful at devising a regular expression based automated rewrite from the streaming to the format string form with only low 10s of manual fixes. This includes compacting strings broken up across lines, etc. Given how well that went, I've prepared this PR which just directly switches to the format string API and migrate everything to use it. One nice side-effect is that the format string approach ends up greatly simplifying the implementation here as well. This is ... *shockingly* effective. Parsing speeds up by more than 3% with just this change. And checking speeds up by **8%** with this change alone: ``` BM_CompileAPIFileDenseDecls<Phase::Parse>/256 86.3µs ± 1% 82.9µs ± 1% -3.94% (p=0.000 n=17+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/1024 431µs ± 1% 415µs ± 1% -3.76% (p=0.000 n=18+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/4096 1.77ms ± 1% 1.71ms ± 1% -3.18% (p=0.000 n=18+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/16384 7.44ms ± 1% 7.17ms ± 2% -3.56% (p=0.000 n=18+20) BM_CompileAPIFileDenseDecls<Phase::Parse>/65536 30.7ms ± 1% 29.7ms ± 1% -3.15% (p=0.000 n=18+20) BM_CompileAPIFileDenseDecls<Phase::Parse>/262144 131ms ± 1% 127ms ± 1% -2.81% (p=0.000 n=18+18) BM_CompileAPIFileDenseDecls<Phase::Check>/256 878µs ± 2% 800µs ± 1% -8.91% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Check>/1024 1.88ms ± 2% 1.72ms ± 1% -8.56% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Check>/4096 5.78ms ± 2% 5.28ms ± 1% -8.70% (p=0.000 n=20+18) BM_CompileAPIFileDenseDecls<Phase::Check>/16384 21.9ms ± 1% 20.1ms ± 1% -8.02% (p=0.000 n=18+20) BM_CompileAPIFileDenseDecls<Phase::Check>/65536 90.4ms ± 2% 83.1ms ± 1% -8.04% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Check>/262144 381ms ± 2% 352ms ± 1% -7.79% (p=0.000 n=19+19) ``` --------- Co-authored-by: Richard Smith <richard@metafoo.co.uk> Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>

github-actions bot added the toolchain label Sep 8, 2024

github-actions bot requested a review from zygoloid September 8, 2024 04:54

chandlerc mentioned this pull request Sep 8, 2024

Migrate all CARBON_VLOG to the format string variant. #4284

Merged

chandlerc mentioned this pull request Sep 9, 2024

Switch CARBON_CHECK to a format string API #4285

Merged

josh11b reviewed Sep 9, 2024

View reviewed changes

chandlerc and others added 4 commits September 9, 2024 10:07

Apply suggestions from code review

2ff364c

Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>

review

aab411a

format

59a13b8

add cold attribute as i ended up using on check

6fd5168

chandlerc commented Sep 9, 2024

View reviewed changes

chandlerc requested a review from josh11b September 9, 2024 19:08

josh11b approved these changes Sep 9, 2024

View reviewed changes

zygoloid approved these changes Sep 9, 2024

View reviewed changes

chandlerc and others added 3 commits September 11, 2024 02:29

Apply suggestions from code review

b552b26

Co-authored-by: Richard Smith <richard@metafoo.co.uk>

Update common/template_string_test.cpp

d04daec

Co-authored-by: Richard Smith <richard@metafoo.co.uk>

fix more C-string

f93b3b2

CarbonInfraBot reviewed Sep 11, 2024

View reviewed changes

common/template_string_test.cpp Outdated Show resolved Hide resolved

common/template_string_test.cpp Outdated Show resolved Hide resolved

chandlerc added 2 commits September 11, 2024 12:04

clean up and comment

91e1e8e

fixes

b79e8ba

chandlerc commented Sep 11, 2024

View reviewed changes

chandlerc enabled auto-merge September 11, 2024 10:08

chandlerc added this pull request to the merge queue Sep 11, 2024

Merged via the queue into carbon-language:trunk with commit e48101b Sep 11, 2024
8 checks passed

chandlerc deleted the vlog-shrink branch September 11, 2024 11:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Switch `CARBON_VLOG` to support a format string API. #4283

Switch `CARBON_VLOG` to support a format string API. #4283

chandlerc commented Sep 8, 2024

chandlerc commented Sep 8, 2024 •

edited

Loading

josh11b Sep 9, 2024

chandlerc Sep 9, 2024

chandlerc left a comment

chandlerc Sep 9, 2024

zygoloid Sep 9, 2024

chandlerc Sep 11, 2024

chandlerc left a comment

chandlerc Sep 11, 2024

Switch CARBON_VLOG to support a format string API. #4283

Switch CARBON_VLOG to support a format string API. #4283

Conversation

chandlerc commented Sep 8, 2024

chandlerc commented Sep 8, 2024 • edited Loading

josh11b Sep 9, 2024

Choose a reason for hiding this comment

chandlerc Sep 9, 2024

Choose a reason for hiding this comment

chandlerc left a comment

Choose a reason for hiding this comment

chandlerc Sep 9, 2024

Choose a reason for hiding this comment

zygoloid Sep 9, 2024

Choose a reason for hiding this comment

chandlerc Sep 11, 2024

Choose a reason for hiding this comment

chandlerc left a comment

Choose a reason for hiding this comment

chandlerc Sep 11, 2024

Choose a reason for hiding this comment

Switch `CARBON_VLOG` to support a format string API. #4283

Switch `CARBON_VLOG` to support a format string API. #4283

chandlerc commented Sep 8, 2024 •

edited

Loading