Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Improved wgpu error handling, no more crashes through wgpu validation…
… errors (#4509) ### What * Fixes #4457 * solves the crash part of #4455 * but keeping it open because I still want to check on the error message There's a bunch of things that probably should have been separate PRs but by the time I ventured there it got too hard to entangle, so please bear with me and read this detailed list of changes carefully (if you understand this list you've done half the review ;-)): * re_renderer now uses `cfg_aliases` and `cfg_if` in various places to make cfg decisions more readable * `ErrorTracker` is now used in all build configurations. Previously, it was only active on debug native builds * The first thing `RendererContext` does now is to install global error handlers that feed into the "top level" `ErrorTracker`. Previously this was done delayed and only for debug+native * `ErrorTracker` never panics now * `ErrorTracker` no longer uses a frame counting heuristic (!!) to discard _all_ errors, but *instead* relies on precise device timeline frame counters to discard individual errors * this fixes issues where previously an error would disappear and reappear without being logged * as commented several times in the code: Keep in mind the three staggered timelines of WebGPU * `Content` (your code) * `Device` (everything happening on wgpu::Device) * `Queue` (everything happening on the GPU, represented by wgpu::Queue) * `Content` and `Device` are mostly in sync on wgpu-core, but NOT on WebGPU in general! * split out the wgpu-core specific parts of `ErrorTracker` into its own module and fall back to simple string hashing for WebGPU * on WebGPU we're dealing with a JS interface that doesn't provide these types. Additionally, in a pure WebGPU build (the only webgpu enabled build we have today), we don't even depend on wgpu-core! * Manual test of re_renderer samples shows that we now no longer crash on WebGPU either! * the wgpu-core error type handling is unchanged except for a change in label hashing * previously, in the absence of debug labels, error de-duplication would fail. In practice leading to error spam on release builds * Introduce `WgpuErrorScope` utility for safe & fully covering wgpu error scopes * The resulting futures are handled with `now_or_never` on wgpu-core backend since we know this is safe. On WebGPU we hand off execution to the browser. * Each frame now has a top level `WgpuErrorScope` in order to catch all errors * this is not a _strict_ requirement on wgpu-core backends since wgpu-core will always feed the fallback error handler. **But Chrome/Dawn WebGPU does not**. The spec allows for this behavior <img width="1623" alt="image" src="https://github.com/rerun-io/rerun/assets/1220815/be42c6cc-a780-4f48-b24c-8f9dc36b872a"> Screenshot of a viewer in release mode with an intentionally broken mesh shader. Also tried other breakages like render pipeline format mismatches. Things related, but NOT covered by this PR: * crashes on startup / adapter selection * this requires further improvements to egui: We need to improve the hooks that go into creating the egui render state. Need to consider allowing to create your own renderstate. This would also allow us to move the actual device/queue/adapter ownership to re_renderer * #4507 * punting on this one. It's a nice-to-have extension of the work presented here * close look into #4455 * the crash is almost certainly fixed now and replaced by a recoverable error 🥳, but I want to understand what sets the wrong viewport - either the one in egui we already fixed or the ViewBuilder (which surely could use some safety check) ### Checklist * [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md) * [x] I've included a screenshot or gif (if applicable) * [x] I have tested the web demo (if applicable): * Full build: [app.rerun.io](https://app.rerun.io/pr/4509/index.html) * Partial build: [app.rerun.io](https://app.rerun.io/pr/4509/index.html?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json) - Useful for quick testing when changes do not affect examples in any way * [x] The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG - [PR Build Summary](https://build.rerun.io/pr/4509) - [Docs preview](https://rerun.io/preview/754b7c4109ba687a6192689ef9c8974dd014d770/docs) <!--DOCS-PREVIEW--> - [Examples preview](https://rerun.io/preview/754b7c4109ba687a6192689ef9c8974dd014d770/examples) <!--EXAMPLES-PREVIEW--> - [Recent benchmark results](https://build.rerun.io/graphs/crates.html) - [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)
- Loading branch information