Integrate measureme's hardware performance counter support. #78781

eddyb · 2020-11-05T17:45:01Z

Note: this is a companion to rust-lang/measureme#143, and duplicates some information with it for convenience

(much later) EDIT: take any numbers with a grain of salt, they may have changed since initial PR open.

Credits

I'd like to start by thanking @alyssais, @cuviper, @edef1c, @glandium, @jix, @Mark-Simulacrum, @m-ou-se, @mystor, @nagisa, @puckipedia, and @yorickvP, for all of their help with testing, and valuable insight and suggestions.
Getting here wouldn't have been possible without you!

(If I've forgotten anyone please let me know, I'm going off memory here, plus some discussion logs)

Summary

This PR adds support to -Z self-profile for counting hardware events such as "instructions retired" (as opposed to being limited to time measurements), using the rdpmc instruction on x86_64 Linux.

While other OSes may eventually be supported, preliminary research suggests some kind of kernel extension/driver is required to enable this, whereas on Linux any user can profile (at least) their own threads.

Supporting Linux on architectures other than x86_64 should be much easier (provided the hardware supports such performance counters), and was mostly not done due to a lack of readily available test hardware.
That said, 32-bit x86 (aka i686) would be almost trivial to add and test once we land the initial x86_64 version (as all the CPU detection code can be reused).

A new flag, -Z self-profile-counter, selects which of the named measureme counters is used. It defaults to wall-time, which keeps -Z self-profile's current functionality unchanged (at least for now).

The named counters so far are:

- wall-time: the existing time measurement
  - name chosen for consistency with perf.rust-lang.org
  - continues to use std::time::Instant for a nanosecond-precision "monotonic clock"
- instructions:u: the hardware performance counter usually referred to as "Instructions retired"
  - here "retired" (roughly) means "fully executed"
  - the :u suffix is from the Linux perf tool and indicates the counter only runs while userspace code is executing, and therefore counts no kernel instructions
    - see Caveats/Subtracting IRQs for why this isn't entirely true and why instructions-minus-irqs:u should be preferred instead
- instructions-minus-irqs:u: same as instructions:u, except the count of hardware interrupts ("IRQs" here for brevity) is subtracted
  - see Caveats/Subtracting IRQs for why this should be preferred over instructions:u
- instructions-minus-r0420:u: experimental counter, same as instructions-minus-irqs:u but subtracting an undocumented counter (r0420:u) instead of IRQs
  - the rXXXX notation is again from Linux perf, and indicates a "raw" counter, with a hex representation of the low-level counter configuration - this was picked because we still don't really know what it is
  - this only exists for (future) testing and isn't included/used in any comparisons/data we've put together so far
  - see Challenges/Zen's undocumented 420 counter for details on how this counter was found and what it does
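
To make the flag's behaviour concrete, here is a minimal sketch of how the named counters above could be modelled and parsed from the flag value. The `CounterName` enum, its methods, and `counter_from_flag` are hypothetical names made up for illustration; they are not measureme's or rustc's actual API. Only the counter name strings and the wall-time default come from the description above.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the named counters listed in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum CounterName {
    /// The existing time measurement; the default, so `-Z self-profile`
    /// keeps behaving as before when no counter is requested.
    #[default]
    WallTime,
    InstructionsU,
    InstructionsMinusIrqsU,
    InstructionsMinusR0420U,
}

impl CounterName {
    /// The spelling used on the command line.
    pub fn as_str(self) -> &'static str {
        match self {
            CounterName::WallTime => "wall-time",
            CounterName::InstructionsU => "instructions:u",
            CounterName::InstructionsMinusIrqsU => "instructions-minus-irqs:u",
            CounterName::InstructionsMinusR0420U => "instructions-minus-r0420:u",
        }
    }

    /// Everything except wall-time relies on hardware performance counters.
    pub fn needs_hardware_counters(self) -> bool {
        self != CounterName::WallTime
    }
}

impl FromStr for CounterName {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "wall-time" => Ok(CounterName::WallTime),
            "instructions:u" => Ok(CounterName::InstructionsU),
            "instructions-minus-irqs:u" => Ok(CounterName::InstructionsMinusIrqsU),
            "instructions-minus-r0420:u" => Ok(CounterName::InstructionsMinusR0420U),
            other => Err(format!("unknown counter name: {other}")),
        }
    }
}

/// Resolve an optional flag value, falling back to the wall-time default.
pub fn counter_from_flag(flag: Option<&str>) -> Result<CounterName, String> {
    match flag {
        None => Ok(CounterName::default()),
        Some(s) => s.parse(),
    }
}
```

Calling `counter_from_flag(None)` yields the wall-time counter, while an unrecognized name is rejected with an error instead of silently falling back.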

There are also some additional commits:

- see Challenges/Rebasing shouldn't affect the results, right? for details on the changes to rustc_parse and rustc_trait_selection (the latter far more dubious, and probably shouldn't be merged, or not as-is)
  - EDIT: the effects of these are no longer quantifiable, the PR includes reverts for them
- ~~see Challenges/jemalloc: purging will commence in ten seconds for details on the jemalloc change~~
  - this is also separately found in "[experiment/perf] Disable jemalloc's time-delayed purging, for extra determinism." #77162; we probably want to avoid doing it by default, and would ideally use the runtime control API jemalloc offers (assuming that can stop the timer that's already running, which I'm not sure about)
  - EDIT: until we can do this based on -Z flags, this commit has also been reverted
- the proc_macro change was to avoid randomized hashing and therefore ASLR-like effects

Write-up / report

Because of how extensive the full report ended up being, I've kept most of it on hackmd.io, but for convenient access, here are all the sections (with individual links):
(someone suggested I'd make a backup, so here it is on the wayback machine - I'll need to remember to update that if I have to edit the write-up)

Motivation

Results

Overhead
Preview (see the report itself for more details):

Counter	Total `instructions-minus-irqs:u`	Overhead from "Baseline" (for all 1903881 counter reads)	Overhead from "Baseline" (per each counter read)
Baseline	63637621286 ±6
`instructions:u`	63658815885 ±2	+21194599 ±8	+11
`instructions-minus-irqs:u`	63680307361 ±13	+42686075 ±19	+22
`wall-time`	63951958376 ±10275	+314337090 ±10281	+165
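
The two overhead columns follow directly from the totals: subtract the baseline total, then divide by the number of counter reads. The sketch below redoes that arithmetic with the figures from the table; the function and constant names are made up for illustration, and the ± uncertainties are ignored.

```rust
/// Total `instructions-minus-irqs:u` for the "Baseline" row of the table.
pub const BASELINE_TOTAL: u64 = 63_637_621_286;
/// Number of counter reads quoted in the table header.
pub const COUNTER_READS: u64 = 1_903_881;

/// Extra instructions over the baseline, for all counter reads together.
pub fn total_overhead(total: u64) -> u64 {
    // saturating_sub avoids a panic if a run ever came in under the baseline.
    total.saturating_sub(BASELINE_TOTAL)
}

/// Extra instructions per counter read (integer division, fraction dropped).
pub fn per_read_overhead(total: u64) -> u64 {
    total_overhead(total) / COUNTER_READS
}
```

For these three rows, dropping the fractional part gives the same per-read values as the table (11, 22 and 165).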

"Macro" noise (self time)
Preview (see the report itself for more details):

	`wall-time` (ns)	`instructions:u`	`instructions-minus-irqs:u`
`typeck`	5478261360 ±283933373 (±~5.2%)	17350144522 ±6392 (±~0.00004%)	17351035832.5 ±4.5 (±~0.00000003%)
`expand_crate`	2342096719 ±110465856 (±~4.7%)	8263777916 ±2937 (±~0.00004%)	8263708389 ±0 (±~0%)
`mir_borrowck`	2216149671 ±119458444 (±~5.4%)	8340920100 ±2794 (±~0.00003%)	8341613983.5 ±2.5 (±~0.00000003%)
`mir_built`	1269059734 ±91514604 (±~7.2%)	4454959122 ±1618 (±~0.00004%)	4455303811 ±1 (±~0.00000002%)
`resolve_crate`	942154987.5 ±53068423.5 (±~5.6%)	3951197709 ±39 (±~0.000001%)	3951196865 ±0 (±~0%)
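
The percentages in parentheses are the ± value expressed relative to the measurement itself. A minimal sketch of that calculation, using a hypothetical helper name and the `typeck` row as input:

```rust
/// Relative noise in percent: the ± half-range divided by the measured value.
pub fn relative_noise_percent(value: f64, plus_minus: f64) -> f64 {
    plus_minus / value * 100.0
}

/// The three `typeck` cells from the table: wall-time (ns), `instructions:u`,
/// and `instructions-minus-irqs:u`, in that order.
pub fn typeck_noise_percent() -> [f64; 3] {
    [
        relative_noise_percent(5_478_261_360.0, 283_933_373.0),
        relative_noise_percent(17_350_144_522.0, 6_392.0),
        relative_noise_percent(17_351_035_832.5, 4.5),
    ]
}
```

The wall-time entry comes out around 5.2%, while both instruction counters are several orders of magnitude below that.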

"Micro" noise (individual sampling intervals)

Caveats
Challenges

rust-highfive · 2020-11-05T17:45:07Z

r? @davidtwco

(rust_highfive has picked a reviewer for you, use r? to override)

eddyb · 2020-11-05T17:46:50Z

@bors try @rust-timer queue (this won't use the new counters, it's just to check the overhead of the new hacks)

rust-timer · 2020-11-05T17:46:51Z

Awaiting bors try build completion

bors · 2020-11-05T17:47:15Z

⌛ Trying commit 5c0e03110a79a66356646ec3c02b3c0a4b6fb77a with merge 36688b7cf765fadab38d1c07a0e7c3111cac632d...

eddyb · 2020-11-05T17:56:56Z

cc @Amanieu Looks like we generate LLVM inline assembly that's not compatible with older versions?
i.e. LLVM 8 on CI, specifically this run: https://github.com/rust-lang/rust/pull/78781/checks?check_run_id=1359732667

LLVM ERROR: Bad $ operand number in inline asm string: 'xor eax, eax
cpuid
mov ecx, ${4:k}
rdpmc'
error: could not compile `measureme`

bjorn3 · 2020-11-05T18:02:43Z

Intel syntax is not supported by older versions of LLVM. Only AT&T syntax is.

bors · 2020-11-05T18:30:00Z

☀️ Try build successful - checks-actions
Build commit: 36688b7cf765fadab38d1c07a0e7c3111cac632d (36688b7cf765fadab38d1c07a0e7c3111cac632d)

rust-timer · 2020-11-05T18:30:02Z

Queued 36688b7cf765fadab38d1c07a0e7c3111cac632d with parent b1d9f31, future comparison URL.

rust-timer · 2020-11-05T20:36:24Z

Finished benchmarking try commit (36688b7cf765fadab38d1c07a0e7c3111cac632d): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot modify labels: +S-waiting-on-review -S-waiting-on-perf

bjorn3 · 2020-11-05T20:44:52Z

Regressions of up to 3%.

eddyb · 2020-11-06T03:15:55Z

@Mark-Simulacrum Just checking, do we run the -Z self-profile for query details under perf stat, and if so, do we mix that into non--Z self-profile results? Because -Z self-profile will have more instructions now, as we have to match on the Counter.

Mark-Simulacrum · 2020-11-06T19:43:21Z

Yes, but we take the minimum of all runs under perf stat (including the self profile one), and always have at least 2 (and IIRC at most 3 right now) runs, including one self profile run.
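
As a rough illustration of why the extra -Z self-profile instructions should not show up in the reported number: taking the minimum over runs picks a non-self-profile run whenever the self-profile run is the more expensive one. This is only a sketch of the idea described above, with a hypothetical function name; it is not rustc-perf's actual code.

```rust
/// Pick the reported instruction count as the minimum over all runs,
/// including the one run that had `-Z self-profile` enabled.
pub fn reported_count(runs: &[u64]) -> Option<u64> {
    runs.iter().copied().min()
}
```

With runs of, say, 1000 and 1002 instructions plus a self-profile run at 1050, the reported value is 1000, so the self-profile run's extra work is masked.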

Amanieu · 2020-11-06T22:07:21Z

Intel syntax is not supported by older versions of LLVM. Only AT&T syntax is.

You can use options(att_syntax) and then use AT&T syntax for the assembly code and it will work on all LLVM versions.

davidtwco

Implementation LGTM, r=me when ready

eddyb · 2020-11-20T12:54:32Z

compiler/rustc_trait_selection/src/traits/select/mod.rs

+        // HACK(eddyb) remove instruction-counting noise from `-Z self-profile`.
+        #[cfg(target_arch = "x86_64")]
+        unsafe {
+            asm!("mfence", options(nostack));
+        }


@davidtwco @nikomatsakis Just so it doesn't get lost: I don't think we should merge with this, but I'm also not sure what to do here. Do you think it would be reasonable to limit this to when -Z self-profile is enabled?

It's also not easy to test the impact of this fence.

I'm not too sure either. I personally wouldn't have a problem with this being limited to when -Z self-profile is enabled.

Checking for -Zself-profile is probably more expensive than unconditionally running mfence.

I'm fine leaving it in as it is for now. We can analyze its impact once we perf-test multithreaded rustc.

eddyb · 2020-11-20T13:09:36Z

Given #78785, it might make sense to define getrandom in librustc_driver and have new flag for replacing getrandom with a deterministic PRNG (e.g. -Z getrandom-seed?) - note that this would not only affect proc macros (e.g. serde) using their own HashMaps, but also uses of randomness in the const_random proc macro, or for things like temp dir names in rustc itself.
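
To illustrate what "replacing getrandom with a deterministic PRNG" could look like, here is a minimal sketch that fills a buffer from a fixed seed. SplitMix64 is swapped in purely as a simple, well-known generator; the type and method names are hypothetical, and nothing here is an existing rustc flag or interface (the -Z getrandom-seed flag above is only a suggestion).

```rust
/// Small deterministic generator (SplitMix64), standing in for whatever
/// PRNG a seeded `getrandom` replacement might use.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Fill `buf` the way a `getrandom`-style call would, but reproducibly:
    /// the same seed always produces the same bytes.
    pub fn fill_bytes(&mut self, buf: &mut [u8]) {
        for chunk in buf.chunks_mut(8) {
            let bytes = self.next_u64().to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }
}
```

Two generators created with the same seed hand out identical byte sequences, which is the property that would make hash keys and temp dir names reproducible across runs. This is not suitable for anything security-sensitive.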

eddyb · 2020-11-23T01:29:32Z

Intel syntax is not supported by older versions of LLVM. Only AT&T syntax is.

I was confused about this, because I could've sworn I've used Intel syntax inline asm! many years ago.
LLVM documents inteldialect since 3.2: https://releases.llvm.org/3.2/docs/LangRef.html#inlineasm

The code emitting the error we're seeing is still around, but the ${ handling looks like it might be new.

And, there it is: operand modifiers for Intel syntax are new in LLVM 10 (llvm/llvm-project@dc5b614)

So the problem is the {...:e} modifiers, which I use to get 32-bit registers instead of the 64-bit ones.
Oh well, I guess I am going with AT&T syntax after all (after checking that the generated assembly is identical).
(EDIT: done, see rust-lang/measureme#147)
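
For readers unfamiliar with the combination, here is a minimal sketch of `options(att_syntax)` together with the `:e` operand modifier. It deliberately uses a harmless `addl` instead of `rdpmc`, since `rdpmc` typically needs kernel-side setup before userspace may execute it. The function name is made up for illustration, and this is not the code from rust-lang/measureme#147.

```rust
/// Add two 32-bit values with AT&T-syntax inline assembly.
#[cfg(target_arch = "x86_64")]
pub fn add_u32(mut a: u32, b: u32) -> u32 {
    use std::arch::asm;
    unsafe {
        asm!(
            // AT&T operand order is `source, destination`; the `:e` modifier
            // selects the 32-bit name of each register (e.g. eax, not rax).
            "addl {b:e}, {a:e}",
            a = inout(reg) a,
            b = in(reg) b,
            options(att_syntax, pure, nomem, nostack),
        );
    }
    a
}

/// Portable fallback so the sketch still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn add_u32(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}
```

Like the hardware instruction, the addition wraps on overflow, so `add_u32(u32::MAX, 1)` is 0.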

eddyb · 2020-11-23T05:59:32Z

Looks like the CI build got much farther this time (before getting cancelled by the check failure), so that worked.
(EDIT: I went ahead and allowed the temporary git crate source in tidy, so CI can complete)

bors · 2020-12-03T14:50:51Z

☔ The latest upstream changes (presumably #79586) made this pull request unmergeable. Please resolve the merge conflicts.

Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:

@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author

crlf0710 · 2021-01-15T13:01:53Z

Triage: there's merge conflicts.

eddyb · 2022-06-13T18:31:46Z

Found likely culprit, missing #[inline] on these:

rust/library/std/src/collections/hash/map.rs

Lines 3136 to 3159 in 083721a

    impl DefaultHasher {
        /// Creates a new `DefaultHasher`.
        ///
        /// This hasher is not guaranteed to be the same as all other
        /// `DefaultHasher` instances, but is the same as all other `DefaultHasher`
        /// instances created through `new` or `default`.
        #[stable(feature = "hashmap_default_hasher", since = "1.13.0")]
        #[allow(deprecated)]
        #[must_use]
        pub fn new() -> DefaultHasher {
            DefaultHasher(SipHasher13::new_with_keys(0, 0))
        }
    }

    #[stable(feature = "hashmap_default_hasher", since = "1.13.0")]
    impl Default for DefaultHasher {
        /// Creates a new `DefaultHasher` using [`new`].
        /// See its documentation for more.
        ///
        /// [`new`]: DefaultHasher::new
        fn default() -> DefaultHasher {
            DefaultHasher::new()
        }
    }

I don't think anyone uses them, so I won't bother with a separate benchmarking PR and will just add a fix to this PR directly.

eddyb · 2022-06-13T18:33:47Z

@bors try @rust-timer queue

rust-timer · 2022-06-13T18:33:49Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-06-13T18:33:55Z

⌛ Trying commit a4f1331 with merge 37e95f0d4e5b87d4b88f8eb37c46b964462717c9...

bors · 2022-06-13T20:24:19Z

☀️ Try build successful - checks-actions
Build commit: 37e95f0d4e5b87d4b88f8eb37c46b964462717c9 (37e95f0d4e5b87d4b88f8eb37c46b964462717c9)

rust-timer · 2022-06-13T20:24:21Z

Queued 37e95f0d4e5b87d4b88f8eb37c46b964462717c9 with parent 083721a, future comparison URL.

rust-timer · 2022-06-13T21:41:11Z

Finished benchmarking commit (37e95f0d4e5b87d4b88f8eb37c46b964462717c9): comparison url.

Instruction count

Primary benchmarks: no relevant changes found
Secondary benchmarks: 😿 relevant regressions found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	0.3%	0.4%	2
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	N/A	N/A	0

Max RSS (memory usage)

Results

Primary benchmarks: 🎉 relevant improvements found
Secondary benchmarks: 😿 relevant regression found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	2.9%	2.9%	1
Improvements 🎉 (primary)	-1.4%	-2.7%	2
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	-1.4%	-2.7%	2

Cycles

Results

Primary benchmarks: 🎉 relevant improvement found
Secondary benchmarks: no relevant changes found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	-2.3%	-2.3%	1
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	-2.3%	-2.3%	1

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

¹ the arithmetic mean of the percent change
² number of relevant changes

eddyb · 2022-06-13T21:54:52Z

@oli-obk Should be ready to merge now, the remaining tt-muncher thing feels like noise or getting unlucky with the choice of deterministic hash key.

oli-obk · 2022-06-14T09:40:04Z

@bors r+

bors · 2022-06-14T09:40:06Z

📌 Commit a4f1331 has been approved by oli-obk

bors · 2022-06-14T13:37:48Z

⌛ Testing commit a4f1331 with merge 872503d...

bors · 2022-06-14T16:18:24Z

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing 872503d to master...

bjorn3 · 2022-06-14T16:33:44Z

compiler/rustc_data_structures/src/profiling.rs

-        let filename = format!("{}-{}.rustc_profile", crate_name, process::id());
+        // HACK(eddyb) we need to pad the PID, strange as it may seem, as its
+        // length can behave as a source of entropy for heap addresses, when
+        // ASLR is disabled and the heap is otherwise determinic.


*deterministic
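
The padding idea from the comment in that diff can be sketched as follows: formatting the PID with a fixed minimum width keeps the filename length the same for any PID that fits in that width. The width of 7 and the function name are assumptions for illustration, since the diff excerpt above does not show the final `format!` call.

```rust
/// Build the profile filename with the PID zero-padded to a fixed width,
/// so the string length (and thus the allocation size) does not vary
/// with the number of digits in the PID.
pub fn profile_filename(crate_name: &str, pid: u32) -> String {
    // Width 7 is an assumption; it covers PIDs up to 9_999_999.
    format!("{crate_name}-{pid:07}.rustc_profile")
}
```

For example, `profile_filename("foo", 42)` gives `foo-0000042.rustc_profile`, the same length as the name produced for a seven-digit PID.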

rust-timer · 2022-06-14T18:48:57Z

Finished benchmarking commit (872503d): comparison url.

Instruction count

Primary benchmarks: no relevant changes found
Secondary benchmarks: 😿 relevant regressions found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	0.8%	1.2%	3
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	N/A	N/A	0

Max RSS (memory usage)

Results

Primary benchmarks: 🎉 relevant improvement found
Secondary benchmarks: no relevant changes found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	-0.1%	-0.1%	1
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	-0.1%	-0.1%	1

Cycles

Results

Primary benchmarks: 😿 relevant regression found
Secondary benchmarks: 😿 relevant regressions found

	mean¹	max	count²
Regressions 😿 (primary)	2.4%	2.4%	1
Regressions 😿 (secondary)	4.2%	5.0%	3
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	2.4%	2.4%	1

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

¹ the arithmetic mean of the percent change
² number of relevant changes

eddyb · 2022-06-24T20:55:22Z

Oops, this is what I get for not testing it; I only realized now that this is actually broken:

$ echo 'fn main() {}' | rustc +nightly - -Z self-profile -Z self-profile-counter=instructions-minus-irqs:u
warning: failed to create profiler: only supported with measureme's "nightly" feature

I removed features = ["nightly"] because I thought that got removed from measureme, but I guess that's in a newer version than what rustc currently uses.

eddyb · 2022-12-27T08:25:38Z

$ echo 'fn main() {}' | rustc +nightly-2022-07-23 - -Z self-profile -Z self-profile-counter=instructions-minus-irqs:u
warning: failed to create profiler: only supported with measureme's "nightly" feature

$ echo 'fn main() {}' | rustc +nightly-2022-07-24 - -Z self-profile -Z self-profile-counter=instructions-minus-irqs:u
thread 'opt hkqbnt908e7l8ng' panicked at 'assertion failed: end <= MAX_INTERVAL_VALUE', /cargo/registry/src/github.com-1ecc6299db9ec823/measureme-10.1.0/src/raw_event.rs:56:9

@jyn514 mentioned to me that it never worked, and... yeah, we added features = ["nightly"] but ~~measureme 10.* is just broken, unlike measureme 9.*?~~ (or did I never test multi-thread?)

So this is the cargo check equivalent (which is most of how I tested the counters):

$ echo 'fn main() {}' | rustc +nightly-2022-07-24 - -Z self-profile -Z self-profile-counter=instructions-minus-irqs:u --emit=metadata
$ summarize summarize unknown-crate-*.mm_profdata
Segmentation fault (core dumped)

That's a segfault in prettytable::TableSlice::get_all_column_width?? summarize summarize --json works tho.

rust-highfive assigned davidtwco Nov 5, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 5, 2020

eddyb mentioned this pull request Nov 5, 2020

Hardware performance counter support (via rdpmc). rust-lang/measureme#143

Merged

cuviper mentioned this pull request Nov 5, 2020

linux: try to use libc getrandom to allow interposition #78785

Merged

davidtwco approved these changes Nov 8, 2020

eddyb commented Nov 20, 2020

eddyb force-pushed the measureme-rdpmc branch from 5c0e031 to 5b4363f Compare November 23, 2020 01:41

eddyb force-pushed the measureme-rdpmc branch from 5b4363f to 97b4a30 Compare November 23, 2020 06:13

davidtwco added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 21, 2020

crlf0710 removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jan 15, 2021

[perf] std: add missing #[inline] to DefaultHasher::{new,default}.

a4f1331

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 13, 2022

rustbot removed S-waiting-on-perf Status: Waiting on a perf run to be completed. perf-regression Performance regression. labels Jun 13, 2022

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 14, 2022

bors mentioned this pull request Jun 14, 2022

diagnostics: remove trailing spaces #97892

Merged

bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 14, 2022

bors merged commit 872503d into rust-lang:master Jun 14, 2022

rustbot added this to the 1.63.0 milestone Jun 14, 2022

bors mentioned this pull request Jun 14, 2022

reduce RPC overhead for common proc_macro operations #86822

Closed

bjorn3 reviewed Jun 14, 2022

eddyb deleted the measureme-rdpmc branch June 14, 2022 17:58

wesleywiser mentioned this pull request Jun 15, 2022

Use hardware performance counter data for the detailed/self-profile data view rust-lang/rustc-perf#1345

Open