Prevent Metal from crashing when a still-open encoder is deallocated, resolve issue #2077. #4023

bradwerth · 2023-08-09T21:15:39Z

Checklist

Run cargo clippy.
Run cargo clippy --target wasm32-unknown-unknown if applicable.
Add change to CHANGELOG.md. See simple instructions inside file.

Connections
None

Description
Metal requires that an MTLCommandEncoder be closed with endEncoding before it is deallocated; otherwise Metal asserts that endEncoding was not called. This manifests as a crash in CTS tests that might otherwise be failures or timeouts.
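As an illustration only (this is not the code from this PR), the requirement and a Drop-based remedy can be sketched in Rust with a mock. `MockEncoder` and `EncoderGuard` are hypothetical names; the mock merely records whether `end_encoding` ran before the value was released, at the point where Metal itself would assert.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a Metal command encoder. It only records
/// events; the real object would assert if released before `endEncoding`.
struct MockEncoder {
    ended: bool,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl MockEncoder {
    fn new(log: Rc<RefCell<Vec<&'static str>>>) -> Self {
        MockEncoder { ended: false, log }
    }

    fn end_encoding(&mut self) {
        self.ended = true;
        self.log.borrow_mut().push("end_encoding");
    }
}

impl Drop for MockEncoder {
    fn drop(&mut self) {
        // Record what the platform would observe at deallocation time.
        let event = if self.ended {
            "released after end"
        } else {
            "released while open"
        };
        self.log.borrow_mut().push(event);
    }
}

/// Hypothetical owner of a possibly-open encoder. Its `Drop` impl ends
/// the encoder first, so the mock never sees "released while open".
struct EncoderGuard {
    open: Option<MockEncoder>,
}

impl Drop for EncoderGuard {
    fn drop(&mut self) {
        if let Some(mut encoder) = self.open.take() {
            encoder.end_encoding();
            // `encoder` is dropped at the end of this block, already ended.
        }
    }
}
```

Dropping an `EncoderGuard` that still holds an encoder logs `end_encoding` before the release, whereas dropping a bare `MockEncoder` logs the "released while open" case that corresponds to the crash described above.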

Testing
CTS tests that are expected to fail on Metal will no longer crash.

ErichDonGubler

This change makes sense to me; I think the shape of it might change a bit, but using Drop to implement this seems like an elegant way to solve the problem.

If it weren't for the CHANGELOG entry, I'd "only" be leaving a Comment review, not a Request changes review.

kpreid

Does this PR fix #2077, and in particular the repro case I described in #2077 (comment)?

If so, it should be mentioned in the PR description.

jimblandy

This needs a test case, probably in tests/tests/encoder.rs.

jimblandy · 2023-08-11T17:07:48Z

@ErichDonGubler asked:

> Is trying to solve this problem for other platforms something you intend to resolve or track in another place?

As long as the fix is in wgpu-core, the encoders will get closed on all backends, as wgpu-core is simply driving whatever wgpu_hal::Api implementation you gave it.

Perhaps the right place to document this requirement is in wgpu-hal/src/lib.rs. I would like that file to start recording the invariants that wgpu-core works so hard to ensure. @cwfitzgerald, does that sound right to you?

ErichDonGubler · 2023-08-11T17:36:09Z

@jimblandy: I'm confused. Why are we talking about wgpu-core? This fix isn't in wgpu-core; this is a wgpu-hal change specific to the metal module family.

jimblandy · 2023-08-13T03:18:51Z

> @jimblandy: I'm confused. Why are we talking about wgpu-core? This fix isn't in wgpu-core; this is a wgpu-hal change specific to the metal module family.

Sorry - Brad had shown me an earlier version of this patch that addressed the problem in generic code. This PR is completely different from that. This is indeed a Metal-only fix.

jimblandy · 2023-08-13T04:41:34Z

It seems like wgpu-hal's Vulkan backend doesn't have any similar requirement: wgpu_hal::vulkan::Device::destroy_command_encoder calls vkDestroyCommandPool, which only seems to require that none of the vkCommandBuffers in the pool be actively in use by the GPU (the "pending" state).

But it's a little hard to follow, which is one reason I'd like to have a test case in this PR that runs on all platforms but reproduces the problem on Metal.

bradwerth · 2023-08-16T16:20:07Z

From the failures in the attempted merge, it appears that DX12 is failing the new test and therefore needs a similar intervention. The DX12 error is ID3D12CommandAllocator::Reset: The command allocator cannot be reset because a command list is currently being recorded with the allocator. [ EXECUTION ERROR #543: COMMAND_ALLOCATOR_CANNOT_RESET]. That sounds very much like the Metal issue with failing to call endEncoding. I'll move the Drop implementation further up the class hierarchy so it covers all adapters.
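The DX12 error message suggests a state rule that can be modeled with a small mock. The sketch below is an assumption-laden illustration, not wgpu-hal code: `MockAllocator` and this free-standing `discard_encoding` are hypothetical, and the only behavior modeled is that a reset fails while a command list is still recording.

```rust
/// Hypothetical model of a D3D12-style command allocator: resetting it
/// fails while a command list is still recording with it.
#[derive(Default)]
struct MockAllocator {
    recording: bool,
}

impl MockAllocator {
    fn begin_list(&mut self) {
        self.recording = true;
    }

    fn close_list(&mut self) {
        self.recording = false;
    }

    fn reset(&mut self) -> Result<(), &'static str> {
        if self.recording {
            Err("COMMAND_ALLOCATOR_CANNOT_RESET")
        } else {
            Ok(())
        }
    }
}

/// Hypothetical discard helper: close any list that is still recording,
/// and only then reset the allocator.
fn discard_encoding(allocator: &mut MockAllocator) -> Result<(), &'static str> {
    if allocator.recording {
        allocator.close_list();
    }
    allocator.reset()
}
```

In this model, calling `reset` directly after `begin_list` returns the error, while going through `discard_encoding` succeeds. That ordering is the analogue of calling endEncoding before release on Metal.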

cwfitzgerald

LGTM after nit

cwfitzgerald · 2023-08-16T16:24:37Z

D3D12 isn't happy with this change, unfortunately:

[2023-08-16T13:41:19Z ERROR wgpu_hal::auxil::dxgi::exception] ID3D12CommandAllocator::Reset: The command allocator cannot be reset because a command list is currently being recorded with the allocator. [ EXECUTION ERROR #543: COMMAND_ALLOCATOR_CANNOT_RESET]

bradwerth · 2023-08-16T19:04:30Z

> I'll move the Drop implementation further up the class hierarchy so it covers all adapters.

Since all the superclass implementations were empty, I just re-implemented it for DX12.

ErichDonGubler

LGTM, minus Connor's concern.

bradwerth · 2023-08-16T23:26:50Z

I'm not confident I'll be able to resolve the DX12 test failure with my setup in a reasonable time. I think I will disable the new test for DX12 in this PR, and treat it as a follow-up issue for somebody who is in a better position to fix it.

jimblandy

Looks fantastic.

Please file a follow-up issue for fixing the test on DX12. You can just leave it for others to triage.

jimblandy · 2023-08-29T00:36:41Z

There's an unrelated timeout on mach aarch64. I'm going to try re-running, and if it goes away we'll rebase and merge this.

jimblandy · 2023-08-29T01:23:04Z

I filed #4096 for the aarch64 timeout.

bradwerth added 5 commits August 9, 2023 13:46

This prevents Metal from crashing when an encoder is deallocated.

75b5056

Add note to CHANGELOG.md.

7ce2f9a

Add note to CHANGELOG.md.

baf3fa3

Relocate additions in CHANGELOG.md.

7e21d4c

Merge branch 'endEncoding' of https://github.com/bradwerth/wgpu into endEncoding

d896931

ErichDonGubler self-assigned this Aug 9, 2023

ErichDonGubler requested changes Aug 9, 2023

kpreid reviewed Aug 9, 2023

bradwerth added 2 commits August 10, 2023 15:00

Fix changelog and add code comment describing the motivation for these changes.

db5cb6d

Merge branch 'trunk' into endEncoding

386eacc

bradwerth changed the title ~~Prevent Metal from crashing when a still-open encoder is deallocated.~~ Prevent Metal from crashing when a still-open encoder is deallocated, resolve issue #2077. Aug 11, 2023

jimblandy requested changes Aug 11, 2023

ErichDonGubler reviewed Aug 11, 2023

jimblandy reviewed Aug 13, 2023

jimblandy requested changes Aug 13, 2023

Instead of implementing Drop for CommandState, implement it for CommandEncoder such that we can call the existing discard_encoding function. Also add a test of dropping a CommandEncoder after it has errored on a command.

0425f63

bradwerth requested a review from jimblandy August 15, 2023 15:30

Merge branch 'trunk' into endEncoding

00411bc

cwfitzgerald reviewed Aug 16, 2023

teoxoy mentioned this pull request Aug 16, 2023

Early frees on CPU Implementations #3193

Closed

bradwerth added 3 commits August 16, 2023 11:55

Implement Drop for DX12 CommandEncoders to also call discard_encoding.

a4eaa94

Update Changelog with notes about DX12 changes.

c4d5691

Merge branch 'trunk' into endEncoding

3c44ed7

ErichDonGubler self-requested a review August 16, 2023 19:47

ErichDonGubler reviewed Aug 16, 2023

bradwerth added 3 commits August 16, 2023 14:20

Fix DX12 compilation problems, attempt 1.

9f55d32

Merge branch 'trunk' into endEncoding

2cbb242

Update comment with the narrower finding of the endEncoding requirement.

59446b8

bradwerth and others added 7 commits August 16, 2023 17:00

Mark the drop_encoder_after_error test as failing on DX12.

305fd55

Cleanup of some should-be-left-untouched DX12 implementation stuff.

64a029f

Merge branch 'trunk' into endEncoding

dd53269

Merge branch 'trunk' into endEncoding

7215c93

Merge branch 'trunk' into endEncoding

bdb4515

Merge branch 'trunk' into endEncoding

e3afadc

CHANGELOG.md: Avoid duplicating "Bug Fixes" section

3e2ec42

jimblandy approved these changes Aug 28, 2023

jimblandy merged commit 5c2c840 into gfx-rs:trunk Aug 29, 2023
20 checks passed

bradwerth deleted the endEncoding branch August 30, 2023 21:49

jimblandy mentioned this pull request Jan 23, 2024

Command encoder released without endEncoding #2077

Closed

ErichDonGubler mentioned this pull request Feb 14, 2024

Free command encoders' platform resources on drop on _all_ platforms #5251

Merged

6 tasks

Binpuki mentioned this pull request Aug 12, 2024

No Godot v4.3 build will launch any projects on macOS Sonoma (OpenCore) godotengine/godot#95226

Open
