Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Sprite Batching #2642

Closed

Conversation

zicklag
Copy link
Member

@zicklag zicklag commented Aug 12, 2021

This still needs a little bit of work, but not much. On my machine I get get up to 83,000 sprites before droping below 60 fps, compared to before batching where it only got up to 33,000. 🎉


Objective

  • Improve sprite rendering performance

Solution

  • Batch sprite draw calls together based on sprites that have the same texture

@kirawi
Copy link

kirawi commented Aug 14, 2021

I'm curious as to how well it does with the render rework.

Edit: Nevermind, it is with the render rework, haha.

@alice-i-cecile alice-i-cecile added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times labels Aug 16, 2021
@alice-i-cecile alice-i-cecile added this to the Bevy 0.6 milestone Aug 16, 2021
@alice-i-cecile
Copy link
Member

I've added this to the 0.6 milestone; it would be very good to finally be able to improve this with the new renderer.

@zicklag
Copy link
Member Author

zicklag commented Aug 19, 2021

@alice-i-cecile , great! It should be almost ready, but I wanted to do a once-over before I marked it as ready for review, I should get to that sometime within a week.

@zicklag zicklag marked this pull request as ready for review August 20, 2021 02:58
@zicklag
Copy link
Member Author

zicklag commented Aug 20, 2021

I fixed the last of the bugs that I know of and I this should now be ready for review!

@alice-i-cecile
Copy link
Member

I've done a review pass on this and it looks solid to me. Clear, well-organized and no obviously questionable design choices. I'm no rendering expert though, so take that with a grain of salt.

Do we have a strategy for automated testing or benchmarking of rendering features like this set up?

@zicklag zicklag force-pushed the sprite-instancing branch 2 times, most recently from 031c7ab to a55ae25 Compare August 20, 2021 16:27
Sprites with the same texture will now be rendered with one draw call.
@zicklag
Copy link
Member Author

zicklag commented Aug 30, 2021

Pushed an extra commit that implements proper depth sorting and sprite flipping with only slight changes to the previous commit.

Do we have a strategy for automated testing or benchmarking of rendering features like this set up?

Not that I know of.

@cart
Copy link
Member

cart commented Sep 1, 2021

Just wrapped up a first pass review on this and it looks good to me generally. I'm gonna do one more pass tomorrow, then I'm gonna try to sort out the best way to reconcile these changes with the custom-shaders branch (ex: which to merge first). I think its probably easier to just merge this first and then adapt on my side, but I guess we'll see :)

@cart
Copy link
Member

cart commented Oct 28, 2021

We have most of the renderer foundations laid for 0.6 and I'm starting the "update and merge as many open features as possible" phase. How do you want to handle this pr?

  • You update in-place by resolving merge conflicts
  • I update in-place by resolving merge conflicts and creating a new pr (which would preserve your commit authorship). I can't push to your branch so i cant make changes to this pr directly.
  • One of us creates a new fresh pr

I think starting from scratch might actually be easier than resolving conflicts everywhere. But I'll defer to you here (and ill default to resolving merge conflicts with this code and creating a new pr if i dont hear back from you).

@zicklag
Copy link
Member Author

zicklag commented Oct 29, 2021

I'm totally fine with you just creating a new PR. It's no big deal to me on authorship or anything. I got in some good learning on this PR and if the renderer's changed as much to make it easier to create a new PR, I'm totally fine if you go ahead with that.

I'll probably not have time to invest in making a new one super soon so it's probably more timely for you to do it if that's not far out of your way.

@cart
Copy link
Member

cart commented Oct 29, 2021

Cool sounds good to me!

@inodentry
Copy link
Contributor

I'd just like to point out that this PR is a little misleading. It says "batching", but the code actually implements instancing, not batching. Reading through the source code, I see that it draws the sprites with an instanced draw call.

From all the research about the topic that I've done so far, instancing has been suggested as not optimal for sprites (because of low GPU occupancy, as the mesh is just a quad). Batching can supposedly be expected to have better performance.


For anyone who doesn't know what the difference is:

Instancing is a "hardware" technique, where you tell the GPU to draw many copies (instances) of the same mesh. You prepare and upload just the one basic mesh to the GPU. Every instance can have different uniforms / per-instance data inputs (like different transforms) for the shader.

Batching is a "software" technique, where the data from all the objects is combined/merged together into one big mesh, on the CPU side, and then just rendered with a regular draw call. From the POV of the GPU, it is working with just one big mesh.

With batching, you have more data to move around, and you have to compute transforms and such on the CPU. With instancing, the vertex shader can do the matrix multiplies on the GPU. However, with tiny meshes (like our quads), the GPU hardware is underutilized (because every instance is a separate wavefront/warp) and these aren't too many computations, so the CPU can do them pretty fast. AFAIK, the fastest sprite renderers are batched, not instanced.


I just wanted to point this out. It's up to @cart what to do. If this PR is going to be reimplemented anyway, maybe it's worth trying to implement batching instead? Or if it's easier, to not delay 0.6 further, I guess you can just do this instancing implementation. It's already a big improvement, compared to not having it. Maybe we can just have that for now, and explore the possibility of further improving sprite rendering performance with batching later, in the future.

Just please, if we have an instanced implementation, call it "instancing", not "batching", in the new PR and in the release notes, as to not be misleading about what it actually is.

@zicklag
Copy link
Member Author

zicklag commented Oct 29, 2021

Hi @inodentry thanks for the breakdown!

I wasn't aware of the difference. Interesting that batching would be faster than instancing. My naive perspective was that we were going to gain performance be avoiding all of the extra vertex attribute data. Honestly batching sounds no harder to implement than instancing, though.

Anyway, thanks again for the info. That's good to know.

@inodentry
Copy link
Contributor

inodentry commented Oct 29, 2021

My naive perspective was that we were going to gain performance be avoiding all of the extra vertex attribute data.

That's certainly true when you have meshes with more vertices. However, for a quad, even with all the extra data duplication, the overhead is pretty small.

GPU compute/shader cores execute workloads (like the vertex and fragment shader) in groups of many threads that must execute in lock-step (these are called "wavefronts" by AMD and "warps" by NVIDIA). Typically that means 32 threads (or sometimes 64, depending on the GPU architecture), that must all be executing identical instructions on every clock cycle. If the workload doesn't fit, the remaining processing potential of the GPU is left unutilized. An instance with just 4 vertices is obviously much smaller.

@cart
Copy link
Member

cart commented Nov 5, 2021

Closing in favor of #3060

@cart cart closed this Nov 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants