Conversation

@aevyrie
Member

@aevyrie aevyrie commented Dec 29, 2025

Objective

  • Speed up collect_meshes_for_gpu_building, a bottleneck for scenes with many moving meshes.

Solution

  • Parallelize the gather step for mesh collection.
  • Immediately start up a task for serial collection of meshes, which cannot be parallelized.
  • Spawn many tasks for gathering meshes, and send batches of the gathered meshes to the collection task.
  • This allows the serial collection step to start immediately, instead of being delayed until all gathering has finished.
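
The shape of the pipeline described above can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: it uses scoped OS threads and an `mpsc` channel in place of Bevy's task pools, and `GatheredMesh`, `gather`, and `collect_pipelined` are hypothetical names invented for the example.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the per-mesh data produced by the gather step.
#[derive(Debug, Clone, PartialEq)]
struct GatheredMesh {
    entity_id: u32,
    transform_x: f32,
}

/// Placeholder for the parallel-friendly gather work done per entity.
fn gather(entity_id: u32) -> GatheredMesh {
    GatheredMesh {
        entity_id,
        transform_x: entity_id as f32 * 0.5,
    }
}

/// Gathers on several worker threads while one collector consumes batches serially.
/// The order of the returned items depends on thread scheduling.
fn collect_pipelined(entity_ids: &[u32], workers: usize, batch_size: usize) -> Vec<GatheredMesh> {
    let (tx, rx) = mpsc::channel::<Vec<GatheredMesh>>();
    let batch_size = batch_size.max(1);
    // Split the input into roughly one contiguous chunk per worker.
    let chunk_len = entity_ids.len().div_ceil(workers.max(1)).max(1);

    thread::scope(|scope| {
        // Start the serial collector first so it can consume batches as they arrive.
        let collector = scope.spawn(move || {
            let mut collected = Vec::new();
            // The loop ends once every sender has been dropped.
            for batch in rx {
                collected.extend(batch);
            }
            collected
        });

        // Spawn the gather workers; each sends its results in small batches.
        for chunk in entity_ids.chunks(chunk_len) {
            let tx = tx.clone();
            scope.spawn(move || {
                for batch in chunk.chunks(batch_size) {
                    let gathered: Vec<GatheredMesh> = batch.iter().copied().map(gather).collect();
                    if tx.send(gathered).is_err() {
                        return; // The collector is gone; stop early.
                    }
                }
            });
        }

        // Drop the original sender so the collector can finish when the workers do.
        drop(tx);
        collector.join().expect("collector thread panicked")
    })
}
```

Calling `collect_pipelined(&ids, 4, 64)` returns every gathered item exactly once. Because the collector drains the channel while workers are still gathering, the serial step overlaps with the parallel one instead of waiting behind it.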

Testing

  • Built a new bevymark_3d stress test for benchmarking dynamic 3D mesh scenes, which our existing stress tests do not currently cover. See Bevymark 3D #22298.
  • With 200k meshes, this drops total frame times from 16.4ms to 12.3ms (-4.1ms)
image
  • Mesh collection itself drops from 7.9ms to 3.6ms (-4.3ms)
image
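
For reference, the relative improvements implied by the numbers above can be worked out with a small helper. The `reduction` function is a hypothetical name used only for this arithmetic; the inputs are the timings quoted in this description.

```rust
/// Returns (milliseconds saved, percent reduction) between two timings in ms.
fn reduction(before_ms: f64, after_ms: f64) -> (f64, f64) {
    let saved = before_ms - after_ms;
    (saved, saved / before_ms * 100.0)
}

fn report() -> String {
    // Timings quoted above for the 200k mesh run.
    let (frame_saved, frame_pct) = reduction(16.4, 12.3);
    let (collect_saved, collect_pct) = reduction(7.9, 3.6);
    format!(
        "frame: -{frame_saved:.1} ms ({frame_pct:.1}%), collection: -{collect_saved:.1} ms ({collect_pct:.1}%)"
    )
}
```

That is roughly a 25% reduction in total frame time and a 54% reduction in mesh collection time. The two absolute savings are close (4.1 ms and 4.3 ms), which suggests most of the collection speedup carries through to the frame time.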

@aevyrie aevyrie mentioned this pull request Dec 29, 2025
@IceSentry IceSentry added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Dec 29, 2025
@alice-i-cecile alice-i-cecile added S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Dec 29, 2025
@alice-i-cecile
Member

CI failures are real, but should be easy to fix.

@tychedelia tychedelia self-requested a review December 29, 2025 21:27
github-merge-queue bot pushed a commit that referenced this pull request Dec 30, 2025
# Objective

- Add a stress test that exercises the 3d mesh pipeline for dynamic
scenes.
- Large static scenes like caldera hotel don't expose performance issues
when many meshes are moving.
- Give us a way to benchmark PRs like
   - #22297
   - #22281
   - #22226

## Solution

- Make a 3d version of `bevymark`, sticking to the existing patterns as
closely as possible.

## Testing

<img width="1072" height="684" alt="image"
src="https://github.com/user-attachments/assets/41214ba9-ffad-471d-a320-1f007490dead"
/>

---------

Co-authored-by: Carter Anderson <[email protected]>
@aevyrie
Member Author

aevyrie commented Dec 30, 2025

@alice-i-cecile g2g now

@aevyrie aevyrie added this to the 0.18 milestone Dec 30, 2025
@aevyrie
Member Author

aevyrie commented Dec 30, 2025

Added to the milestone, as it seems about equivalent to my other perf PRs that were also added.

@aevyrie
Member Author

aevyrie commented Dec 30, 2025

This PR needs more thorough testing before I'd feel comfortable merging. Parallelizing isn't always a speedup and can increase the total amount of CPU work needed even if throughput increases.

So far, things are still looking promising after my latest round of commits:

cargo rer bevymark_3d --features=debug,trace_tracy -- --benchmark --waves 250 --per-wave 1000

Comparing this branch to main:

frametime

image

collect_meshes_for_gpu_building

image

@alice-i-cecile alice-i-cecile removed this from the 0.18 milestone Jan 1, 2026
Contributor

@pcwalton pcwalton left a comment

The logic looks fine, but I think with some different factoring this would be easier to maintain and check. I'm not 100% sure the refactoring is viable, but I'd like to see if we can try.

Contributor

@pcwalton pcwalton left a comment

OK, I love this. This will help so much with making other parts of the system parallel, and addons and apps should be able to use this for increased parallelism too. In fact, it's essentially a big upgrade for the ECS, allowing easy parallelism in situations where par_iter() on a query isn't enough.

Thanks a bunch for taking the time to refactor it!

@aevyrie
Member Author

aevyrie commented Jan 2, 2026

I revisited the benchmarks after the visibility optimizations merged. The improvements are still reproducible, and overall frame times are better thanks to the optimizations on main.

image

@alice-i-cecile alice-i-cecile added S-Needs-Review Needs reviewer attention (from anyone!) to move forward and removed S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged labels Jan 3, 2026
Labels

A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times S-Needs-Review Needs reviewer attention (from anyone!) to move forward

4 participants