Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable trace level tracing by default via a Cargo feature #7223

Closed
wants to merge 1 commit into from

Conversation

danchia
Copy link
Contributor

@danchia danchia commented Jan 16, 2023

Objective

  • Reduce significant overhead that was observed with trace-level logging in rendering, even when log level was INFO. This was observed especially in TrackedRenderPass which is called repeatedly in the tight inner loops of the renderer.

Using set_bind_group as an example, out of ~200ms spent on the function, ~140ms we spent checking if tracing is enabled at the trace level.

image

Solution

  • Introduce a new default bevy cargo feature release_max_level_debug that statically disables trace-level tracing, thereby eliminating trace-related overhead.

Tracy capture using

cargo build --release --features bevy/trace_tracy  --example many_cubes

this = baseline
external = this PR

show a reduction in the median time for main_opaque_pass_3d rendering from 9.06ms -> 4.44ms, i.e. ~2X speedup.

Screenshot 2023-01-15 235750


Changelog

  • By default, in release mode trace-level tracing is statically disabled.

Migration Guide

  • In release builds, Bevy now statically disables trace-level tracing to improve performance. This means trace! logs will no longer appear regardless of the specified log level. To re-enable them in release builds, exclude the default features of bevy and include them manually, leaving out release_max_level_debug.

"hdr",
"png",
"release_max_level_debug",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line is the real change, the rest was just alpha-sorting the list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is that not just called trace or tracing and left out of default features? Negative features are strongly discouraged by cargo itself; all features should be additive.

@james7132 james7132 added A-Diagnostics Logging, crash handling, error reporting and performance analysis A-Rendering Drawing game state to the screen X-Controversial There is active debate or serious implications around merging this PR C-Performance A change motivated by improving speed, memory usage or compile times labels Jan 16, 2023
Copy link
Member

@james7132 james7132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a huge performance improvement. Really nice job finding this!

However, this approach does have issues where we generally advise users to compile their dependencies in release mode to improve development time performance. Users will have a harder time debugging rendering operations involved without disabling the default features.

Marking this as controversial since this impacts the entire engine.

@danchia
Copy link
Contributor Author

danchia commented Jan 16, 2023

Agreed it's a pretty big change. I'm also being open to being more tactical - there's probably < 10 callsites that should slap a #[cfg(debug_assertion)] on

@robtfm
Copy link
Contributor

robtfm commented Jan 16, 2023

is this a reversal of #4580?

if so, we should still add the feature to the stress test examples (at least) using the strategy outlined in that pr

Maximum log levels for release builds are not enforced by Bevy anymore, to omit "debug" and "trace" level logs entirely from release builds, tracing must be added as a dependency with its release_max_level_info feature enabled in Cargo.toml. (tracing = { version = "0.1", features = ["release_max_level_info"] }).

@danchia
Copy link
Contributor Author

danchia commented Jan 16, 2023

This is a reversal of #4580 in spirit - it re-enables compile-time suppression of trace! (but not debug!) logs.
However, the approach is different that what was there prior to #4580.

Rather than unconditionally setting release_max_level_info, we set release_max_level_debug only if the Bevy feature release_max_level_info is enabled.

By default, release_max_level_debug is enabled - it's a Bevy default feature. I feel somewhat strongly about this - I think most users will not look at examples too carefully, and leave trace logging on. And anywhere in our codebase we happen to leave a trace! statement in a very hot path we'll pay a silent performance penalty.
This does mean that a user who really wants trace level logging in release will need to need to import bevy without the default features. If people feel strongly against this, I would rather selectively disable the trace logging in the hot paths.

@superdump
Copy link
Contributor

I think I agree that having release_max_level_debug as a default feature is a good idea. Needing tracing higher than debug should really be the exception and if someone needs to do that kind of debugging they will find out how to enable trace logs. However, if we don't have it as default, I expect a lot of people would not know to enable it and so would, it seems, leave a lot of performance on the table.

@superdump
Copy link
Contributor

@james7132 wouldn't the feature only affect our usage of tracing and not also dependencies?

@robtfm
Copy link
Contributor

robtfm commented Jan 16, 2023

wouldn't the feature only affect our usage of tracing and not also dependencies?

features are additive so i'm pretty sure it would apply to the whole app (unless users/deps use a different version of tracing or something). it definitely used to suppress user logging before #4580. i personally think being welcoming to newcomers (i.e. not silently supressing their logs by default) is more important than squeezing out every last fps by default (especially since squeezing fps requires other actions like setting lto anyway).

@tim-blackbird
Copy link
Contributor

However, this approach does have issues where we generally advise users to compile their dependencies in release mode to improve development time performance.

Only the opt-level. We don't recommend disabling debug-assertions so release_max_level_debug won't take effect in a dev build.

@james7132
Copy link
Member

james7132 commented Jan 17, 2023

Only the opt-level. We don't recommend disabling debug-assertions so release_max_level_debug won't take effect in a dev build.

@danchia if that's the case, can we just put the trace! macros behind a #[cfg(debug_assertions)] instead of using the tracing feature? It's the least intrusive way to enable these performance gains.

@mockersf
Copy link
Member

can we just put the trace! macros behind a #[cfg(debug_assertions)]

That would mean it would not be possible to have spans on a release build.

The perf impact of those was known (and maybe forgotten), that's one of the reason I have a feature for release in my games that enable tracing/release_max_level_warn (see https://github.com/mockersf/spaceviata/blob/main/Cargo.toml#L44)

I think I would be in favor of how this PR works, a default feature release_max_level_debug that can be disabled, and documenting it. Advanced users are expected to disable default feature anyway, so they should know to re-enable level control through tracing features

@james7132
Copy link
Member

Did some digging with the stress-test profile. The perf gains aren't as pronounced, but are notable. Here is a comparison on many_foxes's main_opaque_pass_3d on main (red) and this PR (yellow). It shows about a 20% improvement (341.64us -> 273.25us)

image

@danchia
Copy link
Contributor Author

danchia commented Jan 17, 2023

Quick clarifications

  • tracing is using #[cfg(debug_assertions)] to check for release-ness.
  • whatever we do will affect users' release builds as well (any build that turns off debug_assertions). Bevy re-exports tracing macros as bevy's log macros, and as @robtfm points out cargo features are additive, so setting release_max_level_debug would apply to their build.

Options:

  1. Do nothing - everybody needs to remember to turn off trace level logging statically when performance matters. We should do this somehow (how??) for our stress-profile.
  2. Tactical fix - add #[cfg(debug_assertions)] to the handful of call sites I know are very hot. Pros: no change to visible to users except for the handful of sites we disable trace logging for. Cons: We may accidentally introduce more trace! usages in hot paths in future.
  3. Performance by default in release builds (this PR)

Personally, I think both (3) and (2) are acceptable, but being new to Bevy I don't know how to weigh the options. I would be happy to raise a new PR for (2) if that's what is deemed preferable.

@danchia
Copy link
Contributor Author

danchia commented Jan 17, 2023

I just discovered that wgpu also suffers from a similar issue when recording render pass comands, due to log::trace! usage
image

The log crate is more prescriptive (https://docs.rs/log/latest/log/#compile-time-filters)

Libraries should avoid using the max level features because they’re global and can’t be changed once they’re set.

I do think that our usage of a feature makes it a little more palatable - users have a way to opt out, in contrast to what we did pre #4580.

That said, given that multiple crates / libraries suffer from this, option (2) is looking a bit more tempting, together with performance guidance for end users around LTO, and release_max_level_debug for tracing and log.

@superdump
Copy link
Contributor

My opinion, I believe, is that many people are more likely to miss this feature and so leave performance on the table unnecessarily than are to miss logs/traces at a higher level than debug. I think it is a good thing that the performance hit is also removed from dependencies such as wgpu. If people need trace-level logging they can find out how to make it work or run debug because that level of verbosity is going to anyway slow down execution.

@cart what do you think about this PR?

@killercup
Copy link
Contributor

It would be very surprising to me to see that bevy disables emitting trace logs from my code -- which I think this would do.

There is however one option I have not seen mentioned in this thread:

  • Add a new feature to bevy, e.g. called additional-bevy-trace-logs, and put these expensive calls behind that.

This way it's an opt-in feature that people who are debugging something close to this can find and enable, while also not overloading any existing mechanism.

@cart
Copy link
Member

cart commented Jan 23, 2023

Sounds like we need to balance the following:

  1. Released games should be as fast as possible
  2. Users writing trace logs expect them to work by default
  3. We produce some trace logs that violate (1). We will likely continue to do so and the perf impact will only increase over time.
  4. Our dependencies (like wgpu) also produce trace logs that violate (1). They will likely continue to do so and the perf impact will likely increase over time as we add more dependencies and our dependencies grow in feature set.
  5. 3rd party Bevy Plugins will also introduce new dependencies that might also have trace logging that tanks perf. We could consider this to be "their problem", but this isn't something most people in the ecosystem will be thinking about.
  6. We should expect most users to just use cargo build --release to release their games. Anything beyond that will probably be a "small niche of advanced bevy users". Therefore the default bevy features in release mode should produce our desired outcomes.

I consider the combination of (1) and (6) to be the dominating priority here. Users must have some way to view their trace logs. If it works by default, awesome. But bad performance in default release builds affects Bevy app developers' experience, the experience of their users, and the overall perception of Bevy's quality as an engine.

If (3) was our only problem, I think using one or more bevy-specific non-default tracing features would be the move. As long as we stay principled, we get (1) and (2) with no downside.

Unfortunately (4) is also a problem. It is almost entirely out of our control, and I think it is likely to get worse over time.

It would be interesting to see how much perf we'd currently leave on the table if we leave trace logs enabled/compiled in by default (ex: wgpu still does trace logs), but we disable our own trace logs with a feature flag. But even if that loss is acceptable now, it feels scary that one of our dependencies might one day do a patch release that adds a new trace log that immediately tanks everyone's performance by default. We have no levers to pull other than disabling trace logging unconditionally.

I'm currently biased toward the opinionated "trace-level logging is disabled by default in bevy" approach. It protects us, our users, and our users' users from this problem unconditionally now and forever. People that want tracing (a comparatively small / advanced group of users) can disable default features. However going the "non-default" route is currently non-trivial, as it means users need to learn exactly what features to enable ... we should investigate ways to make expressing [default_features - disable-trace-logging] easy.

However if we can prove that we currently aren't leaving too much perf on the table with the "put bevy trace logs behind a non-default feature flag" approach (ex: wgpu isn't too expensive), then I could be convinced to table the problem until it becomes a bigger issue. But it feels like this will be a problem eventually. The number of traces in our dependencies is unbounded. Ditto for 3rd party Bevy plugins.

@danchia
Copy link
Contributor Author

danchia commented Jan 23, 2023

@cart this PR doesn't include it – was waiting on direction to stabilize – but this writeup I did covers the impact of adding log/release_max_level_debug in improving wgpu performance: https://danchia.github.io/posts/bevy-perf-deep-dive-2/

Spoiler: 2X speedup in TrackedRenderPass. caveats - this was under the sampling profiler, I probably should check again in Tracy. But definitely non-trivial. Happy to pull more numbers.

@mockersf
Copy link
Member

mockersf commented Jan 23, 2023

  1. We should expect most users to just use cargo build --release to release their games. Anything beyond that will probably be a "small niche of advanced bevy users". Therefore the default bevy features in release mode should produce our desired outcomes.

I disagree on this one. We should educate Bevy users in how to disable default features and only add the one they want. For performances and build time. I hope the editor will be very helpful on that point...

@cart
Copy link
Member

cart commented Jan 23, 2023

I disagree on this one. We should educate Bevy users in how to disable default features and only add the one they want. For performances and build time. I hope the editor will be very helpful on that point...

Yeah "Bevy Editor as a visual feature manager, maybe facilitated by bevy_cli" does seem reasonable to me and would make this all more accessible. And I agree that enabling / disabling features provides tangible benefits. But I still think as it stands with normal cargo builds saying this to the average user (especially new ones) is untenable: "you should disable default features, then learn what all 18 default features do, pick the ones you want, and don't forget that new releases will add new default features, so if you try to follow a tutorial and you can't find a symbol it might not be a typo: make sure we didn't add a new feature or you didn't forget to include an existing one".

The cargo build --release defaults should work as expected and have the behaviors we want deployed games to have. It is the "out of the box Bevy experience" and we should optimize for it. Moderate-to-advanced users should think about messing with the defaults.

@djeedai
Copy link
Contributor

djeedai commented Feb 5, 2023

I disagree on the whole "tracing should be enabled by default, users except trace! to work out of the box". Anyone using trace! will use a profiler and understands the basics of profiling (use a release build, etc.). Plus, profiling is one of the few / the only documented process in the repo, so from experience whoever wants to profile Bevy will look at those docs to understand how to generate traces etc. So having trace! enabled by default for the 95% of users who will never use those is not a good option in opinion.

Now, independently, we could also look at why those checks are expensive. 140ms of checking a trace level is unexpected and prohibitive. The check should be branch-predicted fairly correctly, so should have almost zero impact. This looks like an issue not only for regular builds, but even for profiling ones if 2/3rd of the time is spent in profiling-related code, because then we're not measuring the app we're profiling the profiling code itself. A rule of thumb is that profiling overhead should be in the 5-10% range at most, and ideally even around 1-2% only.

@superdump
Copy link
Contributor

Revisiting this as it was raised on Discord. I hadn’t seen @cart ’s message but I was going to come here and say ‘how about adding a default feature that disables trace level logs/spans and in that way people can configure it if they want/need to.’

@djeedai it’s not just about profiling as far as I understand. The trace log level is for super-verbose debug logs that have nothing to do with profiling. Your point about the overhead of instrumentation for the purpose of profiling is very relevant. In practice we only add spans around systems/command application/overhead at the most granular at the moment. However, the trace logs that we added in the TrackedRenderPass are per binding/draw command which are maybe 6 or so instances per visible mesh entity for 3D.

@cart
Copy link
Member

cart commented Feb 5, 2023

I've given this a bit more thought. I think we've largely come to the conclusion that disabling tracing by default would be reasonable behavior for bevy on its own.

However ... I personally think asking people to type out cargo run --no-default-features --features LIST_EVERY_OTHER_CURRENT_DEFAULT_BEVY_FEATURE (currently: cargo run --no-default-features --features animation bevy_asset bevy_audio bevy_gilrs bevy_scene bevy_winit render png hdr vorbis x11 filesystem_watcher) to implicitly enable trace logging is really really bad UX. It seems unlikely that cargo will be fixing this UX problem any time soon. We can hope, but it isn't something to bet on. We could handle this with our own tooling (ex: bevy_cli build --trace-logging) but imo we should be optimizing for the "standard" rust build experience. We will likely have a bevy_cli eventually, but ideally it would be supplemental, not something we encourage people to actually build Bevy with by default.

In the immediate short term, given that wgpu's tracing is the biggest offender, I think we should reach out to the wgpu folks and ask them if they'd be willing to put these logs behind a feature flag (non-default would be preferable I think, but thats not critical). This issue affects their users too ... I bet a good portion of their users are deploying slower-than-necessary release builds as a result. That fixes the immediate (very nasty) problem and buys us time to solve this problem more globally (if we decide that is still necessary). @cwfitzgerald: any thoughts on this?

@cwfitzgerald
Copy link

Definitely file an issue about it - we've already been in discussion about adjusting our logging.

I've never seen it in profiles, but if it's really 2x range, we definitely want to consider it.

@danchia
Copy link
Contributor Author

danchia commented Feb 12, 2023

Closing in favor of the approach in #7639. I will reach out to wgpu about their logging.

@danchia danchia closed this Feb 12, 2023
bors bot pushed a commit that referenced this pull request Feb 13, 2023
Profiles show that in extremely hot loops, like the draw loops in the renderer, invoking the trace! macro has noticeable overhead, even if the trace log level is not enabled.

Solve this by introduce a 'wrapper' detailed_trace macro around trace, that wraps the trace! log statement in a trivially false if statement unless a cargo feature is enabled

# Objective

- Eliminate significant overhead observed with trace-level logging in render hot loops, even when trace log level is not enabled.
- This is an alternative solution to the one proposed in #7223 

## Solution

- Introduce a wrapper around the `trace!` macro called `detailed_trace!`. This macro wraps the `trace!` macro with an if statement that is conditional on a new cargo feature, `detailed_trace`. When the feature is not enabled (the default), then the if statement is trivially false and should be optimized away at compile time.
- Convert the observed hot occurrences of trace logging in `TrackedRenderPass` with this new macro.

Testing the results of 

```
cargo run --profile stress-test --features bevy/trace_tracy --example many_cubes -- spheres
```

![image](https://user-images.githubusercontent.com/1222141/218298552-38551717-b062-4c64-afdc-a60267ac984d.png)

shows significant improvement of the `main_opaque_pass_3d`  of the renderer, a median time decrease from 6.0ms to 3.5ms. 

---

## Changelog

- For performance reasons, some detailed renderer trace logs now require the use of cargo feature `detailed_trace` in addition to setting the log level to `TRACE` in order to be shown.

## Migration Guide

- Some detailed bevy trace events now require the use of the cargo feature `detailed_trace` in addition to enabling `TRACE` level logging to view. Should you wish to see these logs, please compile your code with the bevy feature `detailed_trace`. Currently, the only logs that are affected are the renderer logs pertaining to `TrackedRenderPass` functions
myreprise1 pushed a commit to myreprise1/bevy that referenced this pull request Feb 14, 2023
…7639)

Profiles show that in extremely hot loops, like the draw loops in the renderer, invoking the trace! macro has noticeable overhead, even if the trace log level is not enabled.

Solve this by introduce a 'wrapper' detailed_trace macro around trace, that wraps the trace! log statement in a trivially false if statement unless a cargo feature is enabled

# Objective

- Eliminate significant overhead observed with trace-level logging in render hot loops, even when trace log level is not enabled.
- This is an alternative solution to the one proposed in bevyengine#7223 

## Solution

- Introduce a wrapper around the `trace!` macro called `detailed_trace!`. This macro wraps the `trace!` macro with an if statement that is conditional on a new cargo feature, `detailed_trace`. When the feature is not enabled (the default), then the if statement is trivially false and should be optimized away at compile time.
- Convert the observed hot occurrences of trace logging in `TrackedRenderPass` with this new macro.

Testing the results of 

```
cargo run --profile stress-test --features bevy/trace_tracy --example many_cubes -- spheres
```

![image](https://user-images.githubusercontent.com/1222141/218298552-38551717-b062-4c64-afdc-a60267ac984d.png)

shows significant improvement of the `main_opaque_pass_3d`  of the renderer, a median time decrease from 6.0ms to 3.5ms. 

---

## Changelog

- For performance reasons, some detailed renderer trace logs now require the use of cargo feature `detailed_trace` in addition to setting the log level to `TRACE` in order to be shown.

## Migration Guide

- Some detailed bevy trace events now require the use of the cargo feature `detailed_trace` in addition to enabling `TRACE` level logging to view. Should you wish to see these logs, please compile your code with the bevy feature `detailed_trace`. Currently, the only logs that are affected are the renderer logs pertaining to `TrackedRenderPass` functions
myreprise1 pushed a commit to myreprise1/bevy that referenced this pull request Feb 15, 2023
…7639)

Profiles show that in extremely hot loops, like the draw loops in the renderer, invoking the trace! macro has noticeable overhead, even if the trace log level is not enabled.

Solve this by introduce a 'wrapper' detailed_trace macro around trace, that wraps the trace! log statement in a trivially false if statement unless a cargo feature is enabled

# Objective

- Eliminate significant overhead observed with trace-level logging in render hot loops, even when trace log level is not enabled.
- This is an alternative solution to the one proposed in bevyengine#7223 

## Solution

- Introduce a wrapper around the `trace!` macro called `detailed_trace!`. This macro wraps the `trace!` macro with an if statement that is conditional on a new cargo feature, `detailed_trace`. When the feature is not enabled (the default), then the if statement is trivially false and should be optimized away at compile time.
- Convert the observed hot occurrences of trace logging in `TrackedRenderPass` with this new macro.

Testing the results of 

```
cargo run --profile stress-test --features bevy/trace_tracy --example many_cubes -- spheres
```

![image](https://user-images.githubusercontent.com/1222141/218298552-38551717-b062-4c64-afdc-a60267ac984d.png)

shows significant improvement of the `main_opaque_pass_3d`  of the renderer, a median time decrease from 6.0ms to 3.5ms. 

---

## Changelog

- For performance reasons, some detailed renderer trace logs now require the use of cargo feature `detailed_trace` in addition to setting the log level to `TRACE` in order to be shown.

## Migration Guide

- Some detailed bevy trace events now require the use of the cargo feature `detailed_trace` in addition to enabling `TRACE` level logging to view. Should you wish to see these logs, please compile your code with the bevy feature `detailed_trace`. Currently, the only logs that are affected are the renderer logs pertaining to `TrackedRenderPass` functions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Diagnostics Logging, crash handling, error reporting and performance analysis A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times X-Controversial There is active debate or serious implications around merging this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants