Add LTO visibility attribute for wait_context_vertex #1644

Draft
wants to merge 2 commits into
base: master

kboyarinov
Contributor

Description

Fixes # - issue number(s) if exists

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add a respective label(s) to PR if you have permissions

  • bug fix - change that fixes an issue
  • new feature - change that adds functionality
  • tests - change in tests
  • infrastructure - change in infrastructure and CI
  • documentation - documentation update

Tests

  • added - required for new features and some bug fixes
  • not needed

Documentation

  • updated in # - add PR number
  • needs to be updated
  • not needed

Breaks backward compatibility

  • Yes
  • No
  • Unknown

Notify the following users

List users with @ to send notifications

Other information

solbjorn added a commit to solbjorn/reaper-engine that referenced this pull request Feb 22, 2025
Because of an issue in oneTBB (not really a bug, but with
-fwhole-program-vtables task_group::wait() was exiting without waiting
for anything), I previously kept TTAPI for async tasks. Now that
there's a PR fixing the issue

uxlfoundation/oneTBB#1644

I managed to get xrEngine working without TTAPI, using oneTBB alone
for all parallel processing.
xr_task_group_get() is a simple wrapper which allocates a task_group
from oneTBB's TLS small pool. This pool is fast, uses recycling, and
returns cacheline-aligned buffers, just what we need here. I think
this feature, which lives in tbb::detail, is not meant to be used
outside of oneTBB, but hey, it works :>

These task groups are used for:

1. HOM processing. Previously it was queued in seqParallel, which is not
   a good idea since seqParallel is processed in parallel with the main
   rendering. Now it's just a task fired after the camera pos is
   updated, same as in OXR.
2. Details processing. Similarly to HOM, this is fired once the camera
   pos is updated, which should happen earlier than seqFrame from the
   actor update function.
3. Sun cascades processing. This is now queued from the environment
   update, i.e. from GamePersistent update, as we want to have the
   actual weather params. It's a bit later than OnCameraUpdated() in
   normal cases and right after Render::OnFrame(), so the impact is
   negligible.
4. Rain processing. Same as above.

Bonus:

5. Parallel load of input, sound, and light anims during the
   initialization thanks to OXR (with minor adjustments). Saves
   1+ second between the splash screen and the main menu.

Tracing shows that tg.wait() in each of the above flows takes only
several microseconds, which means the queue/sync calls are balanced.

`-max-threads` is still supported, now via oneTBB global control.

Remove all non-MT counterparts to simplify code. Even on low-core
machines this won't hurt. Remove software details processing since
it's explicitly unsupported anyway (the engine won't run).
Initialize/clear occRasterizer (78 KB) using 32-byte AVX2 writes.
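
A sketch of what filling a buffer with 32-byte writes can look like.
fill_u32() and fill_u32_avx2() are hypothetical helpers, not the
occRasterizer code; the sketch assumes GCC or Clang on x86 for the AVX2
path and falls back to a scalar loop elsewhere or when the CPU lacks
AVX2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_HAVE_AVX2_PATH 1
#else
#define SKETCH_HAVE_AVX2_PATH 0
#endif

#if SKETCH_HAVE_AVX2_PATH
// Compiled with AVX2 enabled for this function only, so the rest of the
// translation unit does not need -mavx2.
__attribute__((target("avx2")))
static void fill_u32_avx2(std::uint32_t* dst, std::uint32_t value,
                          std::size_t count) {
    const __m256i v = _mm256_set1_epi32(static_cast<int>(value));
    std::size_t i = 0;
    // Eight 32-bit elements (32 bytes) per unaligned store.
    for (; i + 8 <= count; i += 8)
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(dst + i), v);
    // Scalar tail for the remaining 0..7 elements.
    for (; i < count; ++i)
        dst[i] = value;
}
#endif

// Fills count 32-bit elements with value, using the AVX2 path when the
// CPU supports it at run time.
inline void fill_u32(std::uint32_t* dst, std::uint32_t value,
                     std::size_t count) {
    assert(dst != nullptr || count == 0);
#if SKETCH_HAVE_AVX2_PATH
    if (__builtin_cpu_supports("avx2")) {
        fill_u32_avx2(dst, value, count);
        return;
    }
#endif
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = value;
}
```

With a cacheline-aligned buffer whose size is a multiple of 32 bytes,
the aligned store variant could be used instead and the tail loop would
never run.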

Signed-off-by: Alexander Lobakin <alobakin@mailbox.org>