Handle Multi-threaded EGL Context Access #1729

zicklag · 2021-07-27T01:41:51Z

Connections
#1630, bevyengine/bevy#841

Description
Implements the synchronization necessary to use the GL backend from multiple threads. Accomplishes this by using a mutex around the GL context with extra wrapping to bind and unbind the EGL context when locking and unlocking.
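
As a rough illustration of the approach described above (not the actual wgpu-hal code), here is a minimal sketch of a mutex-guarded context whose lock guard binds the context on acquisition and unbinds it on drop. `FakeContext`, `SharedContext`, and `ContextLock` are hypothetical names, and the fake context only records calls instead of talking to EGL.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::{Mutex, MutexGuard};

/// Hypothetical stand-in for an EGL context; it records calls instead of
/// talking to a driver.
#[derive(Default)]
struct FakeContext {
    current: bool,
    log: Vec<&'static str>,
}

impl FakeContext {
    fn make_current(&mut self) {
        assert!(!self.current, "context is already current");
        self.current = true;
        self.log.push("make_current");
    }

    fn unmake_current(&mut self) {
        assert!(self.current, "context is not current");
        self.current = false;
        self.log.push("unmake_current");
    }

    fn draw(&mut self) {
        assert!(self.current, "GL call without a current context");
        self.log.push("draw");
    }
}

/// Owns the context behind a mutex; `lock` is the only way to issue GL calls.
struct SharedContext {
    inner: Mutex<FakeContext>,
}

/// Holds the mutex for exactly as long as the context is bound.
struct ContextLock<'a> {
    guard: MutexGuard<'a, FakeContext>,
}

impl SharedContext {
    fn new() -> Self {
        Self { inner: Mutex::new(FakeContext::default()) }
    }

    /// Take the mutex, then bind the context on the calling thread.
    fn lock(&self) -> ContextLock<'_> {
        let mut guard = self.inner.lock().unwrap();
        guard.make_current();
        ContextLock { guard }
    }

    /// Snapshot of the recorded calls (does not bind the context).
    fn log(&self) -> Vec<&'static str> {
        self.inner.lock().unwrap().log.clone()
    }
}

impl Deref for ContextLock<'_> {
    type Target = FakeContext;
    fn deref(&self) -> &FakeContext {
        &self.guard
    }
}

impl DerefMut for ContextLock<'_> {
    fn deref_mut(&mut self) -> &mut FakeContext {
        &mut self.guard
    }
}

impl Drop for ContextLock<'_> {
    /// Runs before the inner `MutexGuard` is dropped, so the context is
    /// unbound before any other thread can take the lock.
    fn drop(&mut self) {
        self.guard.unmake_current();
    }
}

fn main() {
    let shared = SharedContext::new();
    {
        let mut gl = shared.lock();
        gl.draw();
    } // the context is unbound here, then the mutex is released
    println!("{:?}", shared.log());
}
```

The point of the sketch is the ordering: bind after locking, unbind before unlocking, so a thread never sees a context that is still current elsewhere.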

Testing
Tested on Ubuntu 20.04 with a fork of the Bevy game engine and the WGPU examples (though the examples do not exercise multi-threading).

Remaining Issues

There is only one Bevy example I cannot get to run yet and it's the load_gltf example. It fails with a shader translation error:

Jul 26 20:36:50.949 ERROR naga::back::glsl: Conflicting samplers for _group_3_binding_10    
Jul 26 20:36:50.950  WARN wgpu::backend::direct: Shader translation error for stage FRAGMENT: A image was used with multiple samplers    
Jul 26 20:36:50.950  WARN wgpu::backend::direct: Please report it to https://github.com/gfx-rs/naga    
Jul 26 20:36:50.950 ERROR wgpu::backend::direct: wgpu error: Validation Error

Caused by:
    In Device::create_render_pipeline
    Internal error in FRAGMENT shader: A image was used with multiple samplers

Interestingly, as far as I know the shader in question doesn't have a group(3), binding(10) anywhere, so I'm going to have to drill down further and find out exactly which shader translation is failing.

This could potentially be fixed in a separate PR. I think the rest of this PR is rather straight-forward and the fix for the error above is probably mostly unrelated to the primary changes made in this PR.

wgpu-hal/src/gles/queue.rs

cwfitzgerald

I mean, I hate it, but seems mostly reasonable. I want to test this on stuff like rpi and android to see if it works.

zicklag · 2021-07-27T02:19:31Z

I mean, I hate it, but seems mostly reasonable.

Yeah, according to the EGL spec we can't access the context in multiple threads at a time so essentially we're stuck. 🤷‍♂️

Oh, something I forgot. I don't know how to test it, but according to the EGL spec I think the dummy pbuffer that we use as a workaround when surfaceless rendering isn't available won't work in a multi-threaded environment. I don't think you're allowed to have multiple threads bind the same pbuffer as current at once, and that's what would happen if we didn't have surfaceless support and you tried to access the context from multiple threads.

We may need to try to detect that and maybe add a downlevel flag or an error message or something.

Edit: We may also want to create a multi-threaded example sometime. Currently my only test-case for this is a fork of Bevy updated for WGPU on master.

cwfitzgerald · 2021-07-27T06:43:49Z

We already do have a multithreaded example, it's cargo test --example hello-compute test_multithreaded_compute. Additionally, the tests are multi-threaded unless --test-threads=1 is passed.

Would having thread-local pbuffers work?

I tested this PR on my intel/haswell machine using the GL tests, and got tons of errors:

[2021-07-27T06:36:21Z ERROR wgpu_hal::gles::egl] EGL 'eglMakeCurrent' code 0x3009: Got an EGLSurface but no EGLContext

jakobhellermann · 2021-07-27T11:40:10Z

The shader translation error is probably due to gfx-rs/naga#1137.

kvark

I like this approach quite a bit.
Previously in gfx, we had a wrapper around the GL context that would just make it current without any locks. It was quite painful.
Here, we are locking the context and guaranteeing safe access.

wgpu-hal/src/gles/device.rs

wgpu-hal/src/gles/egl.rs

zicklag · 2021-07-27T16:47:23Z

Would having thread-local pbuffers work?

I think so.

I tested this PR on my intel/haswell machine using the GL tests, and got tons of errors:

Does that machine support surfaceless EGL? If it doesn't, then it's probably the pbuffer problem. I'll try out thread-local pbuffers and then you could test it to see if that fixes it.

cargo test --example hello-compute test_multithreaded_compute

That one fails on master with the expected "DeviceLost" that you get when creating a buffer on a different thread. It still fails with this PR, but with a different message. Not sure why:

---- tests::test_multithreaded_compute stdout ----
thread 'tests::test_multithreaded_compute' panicked at 'UNEXPECTED TEST PASS: BACKEND', wgpu/examples/hello-compute/../../tests/common/mod.rs:299:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

cwfitzgerald · 2021-07-27T16:54:06Z

Does that machine support surfaceless EGL?

It definitely should considering it's running on mesa. Is there a particular extension I should be looking out for?

That one fails on master with the expected "DeviceLost" that you get when creating a buffer on a different thread. It still fails with this PR, but with a different message. Not sure why:

We had a test that used to fail on vulkan but now succeeds but we never marked the test to pass. #1731 fixed it.

zicklag · 2021-07-27T16:58:39Z

Is there a particular extension I should be looking out for?

Yeah, in the info log you should see EGL context: +surfaceless:

zicklag · 2021-07-27T17:03:42Z

We had a test that used to fail on vulkan but now succeeds but we never marked the test to pass. #1731 fixed it.

For some reason the test still fails after rebasing.

$ WGPU_BACKEND=gl cargo test --example hello-compute test_multithreaded_compute
running 1 test
test tests::test_multithreaded_compute ... FAILED

failures:

---- tests::test_multithreaded_compute stdout ----
thread 'tests::test_multithreaded_compute' panicked at 'UNEXPECTED TEST PASS: BACKEND', wgpu/examples/hello-compute/../../tests/common/mod.rs:299:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    tests::test_multithreaded_compute

cwfitzgerald · 2021-07-27T17:06:03Z

Oh, I misread what was causing it. This is because it now passes on your machine but is still marked to fail on OpenGL. You should go to the test definition wgpu/example/hello-compute/tests.rs and un-mark it as failing.

EGL context: +surfaceless

Yeah this doesn't support that, it only supports EGL 1.4. In fact all my GL devices are only EGL 1.4. That being said, it does support EGL_KHR_surfaceless_context, so we might need to use that.

zicklag · 2021-07-27T17:28:38Z

Oh, I misread what was causing it. This is because it now passes on your machine but is still marked to fail on OpenGL. You should go to the test definition wgpu/example/hello-compute/tests.rs and un-mark it as failing.

Ah, I get it now. 👍

Yeah this doesn't support that, it only supports EGL 1.4. In fact all my GL devices are only EGL 1.4. That being said, it does support EGL_KHR_surfaceless_context, so we might need to use that.

Well if it supports the extension then we shouldn't have a problem. Are you not seeing the +surfaceless in the logs? Maybe there's something wrong with our check for the extension?

wgpu/wgpu-hal/src/gles/egl.rs

Lines 331 to 343 in 9ce884c

    let pbuffer =
        if version < (1, 5) || !display_extensions.contains("EGL_KHR_surfaceless_context") {
            let attributes = [egl::WIDTH, 1, egl::HEIGHT, 1, egl::NONE];
            egl.create_pbuffer_surface(display, config, &attributes)
                .map(Some)
                .map_err(|e| {
                    log::warn!("Error in create_pbuffer_surface: {:?}", e);
                    crate::InstanceError
                })?
        } else {
            log::info!("\tEGL context: +surfaceless");
            None
        };

The shader translation error is probably due to gfx-rs/naga#1137.

That fixed it. Now all the Bevy examples work!

cwfitzgerald · 2021-07-27T18:18:19Z

Yeah, the check is wrong. That should be && not ||.
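
To make the difference concrete, here is a small sketch of the predicate before and after the suggested correction. The function names are hypothetical, and matching the extension by splitting the space-separated extension string (rather than `contains`) is my own assumption, not what the PR does.

```rust
/// Returns true if the display advertises `EGL_KHR_surfaceless_context`.
/// EGL extension strings are space-separated lists of names.
fn has_surfaceless(display_extensions: &str) -> bool {
    display_extensions
        .split_whitespace()
        .any(|ext| ext == "EGL_KHR_surfaceless_context")
}

/// The check as quoted in the thread: falls back to a pbuffer whenever the
/// version is below 1.5, even if the extension is present.
fn needs_pbuffer_before_fix(version: (u32, u32), display_extensions: &str) -> bool {
    version < (1, 5) || !has_surfaceless(display_extensions)
}

/// The check with `&&` as suggested: a pbuffer is only needed when the
/// version is below 1.5 and the extension is missing too.
fn needs_pbuffer(version: (u32, u32), display_extensions: &str) -> bool {
    version < (1, 5) && !has_surfaceless(display_extensions)
}

fn main() {
    // An EGL 1.4 driver that does expose the extension, as described above.
    let version = (1, 4);
    let extensions = "EGL_KHR_image EGL_KHR_surfaceless_context";
    println!("before fix: {}", needs_pbuffer_before_fix(version, extensions));
    println!("after fix:  {}", needs_pbuffer(version, extensions));
}
```

With `||`, an EGL 1.4 display that supports the extension still takes the pbuffer path, which matches the missing `+surfaceless` log line reported above.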

zicklag · 2021-07-27T18:26:42Z

Oh, duh. 🤦‍♂️ Well, it looks like we found a way to test the pbuffer workaround. :)

I'll fix that check and push an update, then it should work for you on your EGL 1.4 devices.

Then I'll also experiment to see if I can get the pbuffer workaround working by adding || true to that check so I can pretend I don't have the required version/extensions on my machine.

zicklag · 2021-07-27T18:47:24Z

OK I fixed the surfaceless check and I also fixed the error you were getting earlier with "Got an EGLSurface but no EGLContext". When I made the context not current I was not supposed to bind the pbuffer as the current surface.

Also the pbuffer workaround works totally fine with multi-threading so we won't need thread-local pbuffers or anything. I was misinterpreting when we would need to bind the pbuffer. Since we only bind the pbuffer when we bind the context, everything is fine and it is only ever bound by one thread.
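
A toy model of that reasoning, under the assumption that the pbuffer is only ever bound while the context mutex is held: several threads take turns, and a counter tracks how many threads have the pbuffer bound at once. `FakeEgl`, `with_context`, and `run` are hypothetical names and no real EGL calls are made.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the context plus its dummy pbuffer.
struct FakeEgl {
    /// Number of threads that currently have the pbuffer bound.
    bound: usize,
    /// Highest value `bound` ever reached.
    max_bound: usize,
    /// Total number of bind operations performed.
    binds: usize,
}

/// Lock the context, bind context and pbuffer, do the work, unbind, unlock.
fn with_context(ctx: &Mutex<FakeEgl>, work: impl FnOnce()) {
    let mut egl = ctx.lock().unwrap();
    egl.bound += 1; // models binding the context together with the pbuffer
    egl.max_bound = egl.max_bound.max(egl.bound);
    egl.binds += 1;
    work();
    egl.bound -= 1; // models unbinding both before the mutex is released
}

/// Spawns `threads` workers that each bind the context `iterations` times.
/// Returns (max simultaneous binds, total binds).
fn run(threads: usize, iterations: usize) -> (usize, usize) {
    let ctx = Arc::new(Mutex::new(FakeEgl { bound: 0, max_bound: 0, binds: 0 }));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let ctx = Arc::clone(&ctx);
            thread::spawn(move || {
                for _ in 0..iterations {
                    with_context(&ctx, || {});
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let egl = ctx.lock().unwrap();
    (egl.max_bound, egl.binds)
}

fn main() {
    let (max_bound, binds) = run(4, 100);
    println!("max simultaneous binds: {max_bound}, total binds: {binds}");
}
```

Because binding happens strictly inside the critical section, the pbuffer can never be current on two threads at once, which is why thread-local pbuffers turn out to be unnecessary in this model.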

If this works on your hardware @cwfitzgerald, I think this should be ready ( pending re-review ).

wgpu-hal/src/gles/egl.rs

wgpu-hal/src/gles/queue.rs

zicklag · 2021-07-27T19:03:57Z

Pushed an update!

kvark

just pending on @cwfitzgerald

cwfitzgerald · 2021-07-27T19:25:29Z

Alright, so all tests pass as long as I pass in --test-threads=1. I think the problem may be stemming from having multiple adapters open at once. I need to test my raspberry pi, but I think that is an issue we can deal with in a follow up PR.

zicklag · 2021-07-27T19:28:38Z

Alright, so all tests pass as long as I pass in --test-threads=1.

Interesting. All the tests pass for me, even without the --test-threads=1. I wonder what the difference is.

I need to test my raspberry pi, but I think that is an issue we can deal with in a follow up PR.

👍

kvark · 2021-07-27T19:28:42Z

uh, isn't this a bit worrying? The whole point of the PR is to support more than one thread.
Would you have energy to investigate? My time is getting low now since the move is nigh...

cwfitzgerald · 2021-07-27T19:31:27Z

The whole point of the PR is to support more than one thread.

It does, the multi-threaded example works perfectly fine, it's juggling multiple adapters which is the issue (which is what the test harness is doing without --test-threads=1). So I'm comfortable trying to deal with this in a follow up.

Would you have energy to investigate?

Yeah ofc!

kvark · 2021-07-27T19:42:58Z

ok, sounds reasonable
bors r=cwfitzgerald,kvark

bors · 2021-07-27T19:50:04Z

Build succeeded:

zicklag · 2021-07-27T19:54:04Z

Awesome! Thanks guys! I'm super excited to actually have gotten Bevy running on OpenGL. 😃

cwfitzgerald · 2021-07-27T19:54:43Z

Thank you for all the help getting it working!

cwfitzgerald reviewed Jul 27, 2021

cwfitzgerald reviewed Jul 27, 2021

zicklag force-pushed the multi-threaded-gl branch from 5a45663 to 5a21494 Compare July 27, 2021 02:14

kvark reviewed Jul 27, 2021

zicklag force-pushed the multi-threaded-gl branch from 5a21494 to e47c43a Compare July 27, 2021 16:46

zicklag force-pushed the multi-threaded-gl branch from e47c43a to 00d1c17 Compare July 27, 2021 17:02

zicklag force-pushed the multi-threaded-gl branch from 00d1c17 to 675592f Compare July 27, 2021 17:30

zicklag force-pushed the multi-threaded-gl branch 2 times, most recently from f4db755 to af23509 Compare July 27, 2021 18:36

zicklag requested review from cwfitzgerald and kvark July 27, 2021 18:47

kvark reviewed Jul 27, 2021

Handle Multi-threaded EGL Context Access

671e393

Implements the synchronization necessary to use the GL backend from multiple threads.

zicklag force-pushed the multi-threaded-gl branch from af23509 to 671e393 Compare July 27, 2021 19:02

kvark approved these changes Jul 27, 2021

bors bot merged commit 451cd21 into gfx-rs:master Jul 27, 2021

zicklag deleted the multi-threaded-gl branch July 27, 2021 19:52

zicklag mentioned this pull request Jul 27, 2021

OpenGL support bevyengine/bevy#841

Closed
