Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[InlineCost] Enable the cost benefit analysis for Sample PGO #66457

Merged
merged 2 commits into from
Sep 21, 2023

Conversation

HaohaiWen
Copy link
Contributor

Enables the cost-benefit-analysis-based inliner by default if we have sample profile.

Enables the cost-benefit-analysis-based inliner by default if we have
sample profile.
@llvmbot
Copy link
Member

llvmbot commented Sep 15, 2023

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Changes Enables the cost-benefit-analysis-based inliner by default if we have sample profile. -- Full diff: https://github.com//pull/66457.diff

1 Files Affected:

  • (modified) llvm/lib/Analysis/InlineCost.cpp (+1-1)
diff --git a/llvm/lib/Analysis/InlineCost.cpp b/llvm/lib/Analysis/InlineCost.cpp
index 84dc412a3ab6e18..12c7cb62e799f29 100644
--- a/llvm/lib/Analysis/InlineCost.cpp
+++ b/llvm/lib/Analysis/InlineCost.cpp
@@ -787,7 +787,7 @@ class InlineCostCallAnalyzer final : public CallAnalyzer {
         return false;
     } else {
       // Otherwise, require instrumentation profile.
-      if (!PSI->hasInstrumentationProfile())
+      if (!(PSI->hasInstrumentationProfile() || PSI->hasSampleProfile()))
         return false;
     }
 

@kazutakahirata
Copy link
Contributor

Thank you for the patch. Do you have any details you can share? Performance numbers?

With sample profiling, a lot of inlining is done in the sample profile loader. Do you you see meaningful inlining opportunities left for llvm/lib/Transforms/IPO/Inliner.cpp for large applications, such as clang?

@HaohaiWen
Copy link
Contributor Author

Thank you for the patch. Do you have any details you can share? Performance numbers?

With sample profiling, a lot of inlining is done in the sample profile loader. Do you you see meaningful inlining opportunities left for llvm/lib/Transforms/IPO/Inliner.cpp for large applications, such as clang?

caller:
  1: 10
  3: 500
  5: callee : 500
  10: 10

Given this simple example sample profile file, callee was not inlined when we collected profile file.
If I recall correctly, Sample Loader pass only inline a function if it was inlined when collecting profile file. In this case, callee will not be inlined by Sample Loader pass.
The InlinerPass will then try to analyze inline cost for callee. If it is IPGO, cost benefit analysis is done, and the profile result is used to determine whether callee should be inlined. Without this patch, for SPGO, InlinerPass only uses its internal heuristic to analyze cost. Inline result for callee should be same as the version before doing SPGO. This is what we found on our internal experiment.

@HaohaiWen
Copy link
Contributor Author

I'll merge it tomorrow if no objection.

@HaohaiWen HaohaiWen merged commit 2f2319c into llvm:main Sep 21, 2023
@HaohaiWen HaohaiWen deleted the inlinecost branch September 21, 2023 02:17
@HaohaiWen
Copy link
Contributor Author

I can't reproduce it on my machine. Let's revert it first.

Check file: /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/Transforms/SampleProfile/pseudo-probe-emit.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
           20:  [15] .pseudo_probe_desc PROGBITS 0000000000000000 0000ad 000015 00 G 0 0 1 
           21:  [16] .note.GNU-stack PROGBITS 0000000000000000 0000c2 000000 00 0 0 1 
           22:  [17] .eh_frame X86_64_UNWIND 0000000000000000 0000c8 000058 00 A 0 0 8 
           23:  [18] .rela.eh_frame RELA 0000000000000000 000300 000048 18 I 22 17 8 
           24:  [19] .pseudo_probe PROGBITS 0000000000000000 000120 000013 00 L 5 0 1 
           25:  [20] .pseudo_probe PROGBITS 0000000000000000 000133 00000d 00 LG 8 0 1 
next:141'0                                                                            X error: no match found
           26:  [21] .pseudo_probe PROGBITS 0000000000000000 000140 000019 00 L 3 0 1 
next:141'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:141'1         ?                                                                   possible intended match
           27:  [22] .symtab SYMTAB 0000000000000000 000180 000138 18 1 9 8 
next:141'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           28: Key to Flags: 
next:141'0     ~~~~~~~~~~~~~~
           29:  W (write), A (alloc), X (execute), M (merge), S (strings), I (info), 
next:141'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           30:  L (link order), O (extra OS processing required), G (group), T (TLS), 
next:141'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           31:  C (compressed), x (unknown), o (OS specific), E (exclude), 
next:141'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           32:  R (retain), l (large), p (processor specific) 
next:141'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

--

********************

HaohaiWen added a commit that referenced this pull request Sep 21, 2023
@HaohaiWen
Copy link
Contributor Author

The order of section changes and this PR is not the guilty commit. I'll reland it later.
This failure has been be fixed by @MaskRay in:
a18ee8b

HaohaiWen added a commit that referenced this pull request Sep 21, 2023
…66457)

Enables the cost-benefit-analysis-based inliner by default if we have
sample profile.

No extra fix is required.
@alexfh
Copy link
Contributor

alexfh commented Oct 4, 2023

We see a ~1.5% increase in a binary size after this patch. This can push some of our large binaries over certain limits. While I appreciate the potential performance improvements, I also wonder, if we have a way to revert to the old behavior should this be necessary in some cases?

@HaohaiWen
Copy link
Contributor Author

cost-benefit-analysis should be on for SPGO. Otherwise, the profile file is meaningless for InlinerPass.

@helloguo
Copy link
Contributor

helloguo commented Mar 6, 2024

Do you see any perf win on benchmarks and/or real applications? If so, can you share more details? We see some perf regression on our workloads.

@HaohaiWen
Copy link
Contributor Author

We've seen improvement for our internal workloads.
This patch just let InlineCost use profile data. I don't think its the root cause of regression.

@helloguo
Copy link
Contributor

helloguo commented Mar 6, 2024

This patch just let InlineCost use profile data

I don't fully understand this statement. IIUC, this patch changes how the inlining decisions are made (e.g. use cost-benefit instead of cost-threshold). The profile data is used both in cost-benefit and cost-threshold.

Can you share more about the numbers on your internal workloads? such as how much win do you see? and what workloads do you test?

@helloguo
Copy link
Contributor

helloguo commented Mar 6, 2024

When we test cost-benefit on our internal workloads, we see 0.4-0.5% regressions on several workloads (e.g. ads services). Even after we tune the related flags (e.g.-inline-savings-multiplier), we still see some gap between cost-benefit vs cost-threshold. While cost-benefit seems reasonable, the effectiveness also depends on other factors such as profile quality.

@HaohaiWen
Copy link
Contributor Author

It's long time ago. I think its about ~1% gain.
If I recall correctly, without this patch, some hot functions can be inlined by IPGO due to cost-benefit but not inlined for SPGO.
Will cost-threshold use PGO profile data as hint to inline?
Is there any reason why SPGO can't use cost-benefit analysis?

@helloguo
Copy link
Contributor

helloguo commented Mar 6, 2024

Will cost-threshold use PGO profile data as hint to inline?

Yes, profile data is used to update the threshold, which can impact the inlining decision.

Is there any reason why SPGO can't use cost-benefit analysis?

It's nice that inliners have different options (e.g. all the inlining flags, and cost-benefit v.s. cost-threshold) to try out. However, the impact of one option depends on many factors. In this particular case, those factors may include profile quality, and implementation details such as inline-savings-multiplier. Inliners can use cost-benefit or any other options when sample pgo is used. But enabling it by default is another story. Given this change will impact everyone who is using sample pgo, it would be nice to have a good amount of testing on benchmarks and applications before enabling it by default.

@WenleiHe
Copy link
Member

WenleiHe commented Mar 7, 2024

Is there any reason why SPGO can't use cost-benefit analysis?

@HaohaiWen Cost-benefit analysis generally is a better heuristic. but not turning it on by default for sample PGO was intentional. See discussions in https://reviews.llvm.org/D98213

  1. For sample PGO, most of the important profile guided inlining should happen in sample profile loader, so if we want to enable this heuristic, we should do it for sample loader.
  2. We may need to take MFS into account to have accurate size. This is done for IRPGO, but not sample PGO.

So the conclusion was additional work would be required before we can enable it for sample PGO. But I'm not aware of any progress on these work, so the reasons that led to not enable it by default for sample PGO probably still holds.

It's long time ago. I think its about ~1% gain.

Changing default has a high bar. Usually we need performance testing with multiple high confidence positive data before doing it. It doesn't sound like that's the case here.

@HaohaiWen
Copy link
Contributor Author

HaohaiWen commented Mar 8, 2024

If I recall correctly, SampleProfileLoader only try to inline functions which were inlined before SPGO.
e.g. Before SPGO,

B, C was inlined by A. 
D was not inlined by A but D is very small and hot.

SampleProfileLoader will only try to inline B, C to A based on profile data. It will never try to inline D to A. Otherwise we don't need Inliner Pass when enable Sample Profile Loader.

As long as prof data of SPGO has similar quality as IPGO, I think it's safe to enable it by default.
I think MFS (machine function splitter) is disabled by default. Is your regression due to split non-cold BB from function by MFS?
Have you observed overall regression for your workloads? Is regression possibly due to inaccurate profile data?

@WenleiHe
Copy link
Member

With CSSPGO, SampleProfileLoader can be more than a reply inliner.

Otherwise we don't need Inliner Pass when enable Sample Profile Loader.

Not really. Sample loader inlining has its limitation as it's early in the pipeline so a lot of simplification hasn't happened yet, for that reason sample loader inlining may not be complete. A later inliner (CGSCC or ModuleInliner) helps clean things up more.

As long as prof data of SPGO has similar quality as IPGO, I think it's safe to enable it by default.

This needs to be backed by data. Also it is known that sample PGO has inferior profile quality than IRPGO, so the assumption isn't true.

Have you observed overall regression for your workloads? Is regression possibly due to inaccurate profile data?

Yes, the regression is not specific to one service. See comments above from @helloguo

There is also concern about code size increase as mentioned by another comment. We'll revert this patch. In general, changing default like this needs to be backed by more data and vetted by stakeholders.

helloguo added a commit to helloguo/llvm-project that referenced this pull request Mar 26, 2024
wlei-llvm pushed a commit that referenced this pull request Mar 28, 2024
#66457 makes InlineCost to use cost-benefit by default, which causes
0.4-0.5% performance regression on multiple internal workloads. See
discussions #66457. This pull
request reverts it.

Co-authored-by: helloguo <helloguo@meta.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants