BatchedMesh Example much slower on WebGPU than WebGL on Android #29580

Open
Makio64 opened this issue Oct 7, 2024 · 10 comments

@Makio64
Contributor

Makio64 commented Oct 7, 2024

Description

On Android (Samsung Galaxy S20 FE), the BatchedMesh example runs much slower with WebGPU than with WebGL:

WebGPU: ~13 FPS
WebGL: ~25 FPS

Screenshot_20241007_185610_Chrome

Screenshot_20241007_185559_Chrome

Reproduction steps

  1. Load https://threejs.org/examples/?q=bat#webgpu_mesh_batch on Android
  2. Enable/disable WebGPU and compare the frame rate

Code

Live example


Screenshots

No response

Version

r169

Device

Mobile

Browser

Chrome

OS

Android

@RenaudRohlinger
Collaborator

RenaudRohlinger commented Oct 8, 2024

The multi-draw API isn't currently supported in WebGPU, which is why a single multi-draw call covering 20,000 batched elements performs significantly better in WebGL, especially on smartphones.
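
To make the difference in call count concrete, here is a minimal sketch that draws a batch either with one `multiDrawElementsWEBGL` call from the `WEBGL_multi_draw` extension or with one `drawElements` call per item. The helper name `drawBatch` and the mock objects are hypothetical and exist only for illustration; the per-item loop is an assumption about what a backend without multi-draw has to do, not three.js's actual code.

```javascript
// Hypothetical helper: draw every item of a batch.
// `gl` is a WebGL2RenderingContext-like object, `ext` is the result of
// gl.getExtension('WEBGL_multi_draw') or null when unavailable.
// Returns the number of draw calls issued.
function drawBatch(gl, ext, counts, byteOffsets) {
  const drawCount = counts.length;
  if (ext) {
    // One API call covers every sub-draw in the batch.
    ext.multiDrawElementsWEBGL(
      gl.TRIANGLES, counts, 0, gl.UNSIGNED_INT, byteOffsets, 0, drawCount
    );
    return 1;
  }
  // Fallback: one API call per sub-draw.
  for (let i = 0; i < drawCount; i++) {
    gl.drawElements(gl.TRIANGLES, counts[i], gl.UNSIGNED_INT, byteOffsets[i]);
  }
  return drawCount;
}

// Mock objects (assumptions) so the sketch runs without a GPU.
const batchCalls = [];
const mockGl = {
  TRIANGLES: 4,
  UNSIGNED_INT: 5125,
  drawElements: (...args) => batchCalls.push(['drawElements', ...args]),
};
const mockExt = {
  multiDrawElementsWEBGL: (...args) => batchCalls.push(['multiDraw', ...args]),
};

// Three cubes of 36 indices each, stored back to back as 32-bit indices.
const counts = new Int32Array([36, 36, 36]);
const byteOffsets = new Int32Array([0, 144, 288]);

const callsWithMultiDraw = drawBatch(mockGl, mockExt, counts, byteOffsets);
const callsWithoutMultiDraw = drawBatch(mockGl, null, counts, byteOffsets);
console.log(callsWithMultiDraw, callsWithoutMultiDraw);
```

With 20,000 batched elements the fallback path issues 20,000 calls per frame, which is where the CPU overhead on mobile devices would come from.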

However, there’s good news! A new MultiDrawIndirect API is on the horizon for WebGPU, which is expected to surpass the performance of the WebGL version:
gpuweb/gpuweb#1354 (comment)
https://issues.chromium.org/issues/369246557/dependencies

This API is already available in Chrome Canary behind the chromium-experimental-multi-draw-indirect flag, enabled through enable-unsafe-webgpu. I plan to begin working with it over the coming weeks, as multi-draw is an important part of my workflow.

In the meantime, as discussed in this PR, we can implement a workaround: issue multiple drawIndirect() calls against a single indirect buffer, each at a different offset, and combine them with Render Bundles. This approach can mimic the upcoming MultiDrawIndirect API until it becomes widely available: #29197 (comment)
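
A minimal sketch of that workaround, assuming non-indexed draws: all draw arguments are packed into one `Uint32Array` (four 32-bit values per draw, 16 bytes), and each `drawIndirect(indirectBuffer, indirectOffset)` call reads its arguments at a different byte offset. The helper names `packIndirectArgs` and `recordIndirectDraws` and the mock encoder are hypothetical; in real code the encoder would be a `GPURenderPassEncoder` or `GPURenderBundleEncoder` and the buffer a `GPUBuffer` created with `GPUBufferUsage.INDIRECT`.

```javascript
// Layout of one non-indexed indirect draw:
// vertexCount, instanceCount, firstVertex, firstInstance (all u32).
const U32_PER_DRAW = 4;
const BYTES_PER_DRAW = U32_PER_DRAW * Uint32Array.BYTES_PER_ELEMENT; // 16

// Hypothetical helper: pack the arguments of every draw into one array
// that can be uploaded to a single indirect buffer.
function packIndirectArgs(draws) {
  const data = new Uint32Array(draws.length * U32_PER_DRAW);
  draws.forEach((d, i) => {
    data.set(
      [d.vertexCount, d.instanceCount, d.firstVertex, d.firstInstance],
      i * U32_PER_DRAW
    );
  });
  return data;
}

// Hypothetical helper: record one drawIndirect per packed draw, each
// pointing at its own byte offset inside the shared buffer.
function recordIndirectDraws(encoder, indirectBuffer, drawCount) {
  for (let i = 0; i < drawCount; i++) {
    encoder.drawIndirect(indirectBuffer, i * BYTES_PER_DRAW);
  }
}

// Mock encoder (assumption) so the sketch runs without a GPU device.
const recordedOffsets = [];
const mockEncoder = {
  drawIndirect: (buffer, offset) => recordedOffsets.push(offset),
};

const indirectData = packIndirectArgs([
  { vertexCount: 36, instanceCount: 10, firstVertex: 0, firstInstance: 0 },
  { vertexCount: 60, instanceCount: 5, firstVertex: 36, firstInstance: 10 },
  { vertexCount: 12, instanceCount: 1, firstVertex: 96, firstInstance: 15 },
]);
recordIndirectDraws(mockEncoder, { label: 'mock indirect buffer' }, 3);
console.log(Array.from(indirectData), recordedOffsets);
```

Recording the loop once into a render bundle means the per-draw JavaScript cost is paid only at bundle creation, while a compute shader can still rewrite the buffer contents each frame.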

For now, I'll wait for @Spiri0's very promising work on implementing drawIndirect, which will provide a solid base for this:
#29568 (comment)

@RenaudRohlinger
Collaborator

@mwyrzykowski Just a heads-up: there's currently an issue in the official Three.js BatchedMesh WebGPU example where setting the count above 1024 breaks Safari's WebGPU backend. I tested this on the latest Safari Technology Preview.
https://threejs.org/examples/?q=batch#webgpu_mesh_batch

The error:
[Log] GPUDeviceLostInfo {reason: "unknown", message: ""}

@mwyrzykowski

Oh, thank you for the report @RenaudRohlinger. Which Mac did you test on? I tried an M2 Mac Studio with STP 207 and 17788 instances:
Screenshot 2024-11-14 at 10 13 51 AM

It might very well be Mac related.

In any case, the performance is really bad, so at the very least I will investigate that until I can figure out how to reproduce the crash.

@Spiri0
Contributor

Spiri0 commented Nov 14, 2024

I've been working with drawIndirect since we got it in r170. It works quite well, but it would be more comfortable to use with structs:

const drawBufferStruct = struct({
   vertexCount: 'uint',
   instanceCount: 'uint',
   firstVertex: 'uint',
   firstInstance: 'uint',
});

The values can then be accessed more clearly in Fn and wgslFn:

drawBuffer.vertexCount = vertexCount;
drawBuffer.instanceCount = instanceCount;

instead of the current:
drawBuffer.x = vertexCount;
drawBuffer.y = instanceCount;
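
As a plain JavaScript analogy (not the TSL `struct` API, whose final shape is still being worked on in this thread), the sketch below maps the four named fields onto the same four 32-bit slots, so that `vertexCount` aliases slot 0 (today's `.x`) and `instanceCount` aliases slot 1 (today's `.y`). The helper `createDrawArgsView` is hypothetical.

```javascript
// Field order matches the struct proposed above.
const DRAW_FIELDS = ['vertexCount', 'instanceCount', 'firstVertex', 'firstInstance'];

// Hypothetical helper: expose one draw's four u32 slots under readable names.
function createDrawArgsView(u32, drawIndex = 0) {
  const base = drawIndex * DRAW_FIELDS.length;
  const view = {};
  DRAW_FIELDS.forEach((name, slot) => {
    Object.defineProperty(view, name, {
      enumerable: true,
      get: () => u32[base + slot],
      set: (value) => { u32[base + slot] = value; },
    });
  });
  return view;
}

const rawDrawArgs = new Uint32Array(4);
const namedDrawArgs = createDrawArgsView(rawDrawArgs);
namedDrawArgs.vertexCount = 36;     // same slot as the current drawBuffer.x
namedDrawArgs.instanceCount = 1000; // same slot as the current drawBuffer.y
console.log(Array.from(rawDrawArgs));
```

The memory layout is unchanged; only the way the slots are addressed becomes self-describing.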

This means users can bundle uniforms efficiently and handle them more easily in shaders, especially when bundling many different per-instance parameters into one or a few buffer arrays. I already have it working, but now I have to implement it more cleanly. Let's see if I can make it to r171. My job is currently taking a bit more of my time, but I'm just as motivated to round out the drawIndirect topic with structs so that it can be used to its full potential.

@Makio64
Contributor Author

Makio64 commented Nov 15, 2024

https://threejs.org/examples/?q=batch#webgpu_mesh_batch

On my M3 Pro Max I don't crash at 20k instances in Safari, but I'm at 1 FPS, versus 120 FPS in Chrome on the same machine. @RenaudRohlinger @mwyrzykowski

@RenaudRohlinger
Collaborator

RenaudRohlinger commented Nov 15, 2024

Looks promising @Spiri0, sorry for hijacking this issue by the way. 😬

@mwyrzykowski Thanks for looking into it! I'm using a MacBook Pro M1 Max from 2021 with Safari 207 and Sequoia 15.1.
image

@RenaudRohlinger
Collaborator

Awesome @mwyrzykowski! Performance remained stable during profiling with an instance count of 512, but when I increase it slightly, to around 600, I occasionally encounter "Unhandled Promise Rejection: RangeError: Range consisting of offset and length are out of bounds" in Safari, often right before a crash.
Screenshot 2024-11-15 at 10 56 49

@Spiri0
Contributor

Spiri0 commented Nov 16, 2024

@RenaudRohlinger I have a CodePen here on how to use the drawIndirect buffer in conjunction with compute shaders. However, in accordance with If you feel like it, you can convert the shaders to TSL and turn it into an example, because it also shows how to use drawIndirect with storage buffers, which will effectively always be the case, just like using it with compute shaders. If you don't feel like it, no problem; I will do it after the struct expansion.
https://codepen.io/Spiri0/pen/PoMBvzz

With a few more buffers you can control exactly which instances should be visible and which should not, but that would be a topic for another example with structs.

P.S. sorry for hijacking this issue too 😅
But this issue already touches the drawIndirect topic so much that this can soon be made more efficient.

@Spiri0
Contributor

Spiri0 commented Nov 17, 2024

@RenaudRohlinger I have a question about TSL / Fn, and you know it better than I do. So far I've only used wgslFn. You also have a forum account, right? That would be a more appropriate place to discuss it than using this issue for secondary topics.

@RenaudRohlinger
Collaborator

@Spiri0 Sure! https://discourse.threejs.org/u/yakuno 😊
