Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][UR] Use Windows proxy loader for UR #15262

Merged
merged 25 commits into from
Sep 6, 2024

Conversation

callumfare
Copy link
Contributor

@callumfare callumfare commented Sep 3, 2024

The issues with DLLs and teardown of global objects on Windows is well documented, and was the reason for the use of the pi_win_proxy_loader library in SYCL-RT. When we ported from PI to UR, we ported this library (it's now called ur_win_proxy_loader), but it was not actually fully utilized. SYCL-RT still linked with ur_loader.dll and still experienced issues with race conditions in the teardown of SYCL-RT and Unified Runtime. See #14768.

This PR reintroduces the proxy loader as it was previously used with PI. The UR loader (ur_loader.dll) is loaded via LoadLibraryEx at initialization, and is therefore not cleaned up too early for normal teardown to occur.

This necessitates changing the signature of Plugin->call to look like it did with PI, taking an enum template argument to specify which UR entry point to call.

On Windows, when each plugin (which is a wrapper over a UR adapter) is loaded, it populates a table of function pointers to each API entry point in the UR loader. When UR entry points are called, the function pointer is retrieved from the table. This is more or less equivalent to the previous PI implementation.

On Linux, the UR loader is dynamically linked as before. The Plugin->call methods just use the regular UR functions rather than programmatically looking up the symbols.

For the unittest executables, the UR loader is still dynamically linked as before to avoid having to introduce noisy changes to the tests, and since we aren't concerned about teardown issues there.

The implementation of these changes in the runtime should avoid as much overhead as possible (and be no worse than PI), but suggestions on how to improve and tidy things are more than welcome.

Associated UR change: oneapi-src/unified-runtime#2045

@callumfare callumfare changed the title WIP: Use Windows proxy loader for UR [SYCL][UR] Use Windows proxy loader for UR Sep 3, 2024
@callumfare callumfare marked this pull request as ready for review September 3, 2024 16:27
@callumfare callumfare requested review from a team as code owners September 3, 2024 16:27
Copy link
Contributor

@cppchedy cppchedy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - detail/bindless_images.cpp

@callumfare
Copy link
Contributor Author

The E2E failure was just in the cleanup, the tests actually passed. However, the Windows E2E run took over an hour which is a lot slower than normal. This PR is expected to introduce a very small overhead but nothing like a 50% slowdown. I'll investigate, but this shouldn't be merged in the meantime.

On further investigation it seems like this was a change we made to shutdown_win causing hangs. I've reverted that part of the PR, and also not re-enabled any new E2E tests to be conservative. Assuming CI is ok I think this is ok to merge.

Copy link
Contributor

@pbalcer pbalcer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just one nit.

sycl/ur_win_proxy_loader/ur_win_proxy_loader.cpp Outdated Show resolved Hide resolved
@callumfare
Copy link
Contributor Author

@intel/llvm-gatekeepers Please merge when possible

@martygrant martygrant merged commit 39cd4f3 into intel:sycl Sep 6, 2024
12 checks passed
callumfare added a commit to callumfare/llvm that referenced this pull request Sep 6, 2024
sarnex pushed a commit that referenced this pull request Sep 6, 2024
#15262 failed post-commit testing, this should fix it
@sarnex
Copy link
Contributor

sarnex commented Sep 6, 2024

@callumfare I think this PR also caused this error too:

 In file included from /__w/llvm/llvm/build/_deps/unified-runtime-src/source/adapters/opencl/adapter.cpp:11:
/__w/llvm/llvm/build/_deps/unified-runtime-src/source/adapters/opencl/common.hpp:317:29: error: 'cl_mutable_base_config_khr' does not name a type; did you mean 'cl_mutable_dispatch_config_khr'?
  317 |                       const cl_mutable_base_config_khr *mutable_config);

That's from my PR here and we also see it in Mac postcommit here.

Any ideas? Should we revert this? Build is broken for everyone using HEAD.

@callumfare
Copy link
Contributor Author

@sarnex That's unrelated to this change. It appears to be caused by this change to OpenCL-Headers - KhronosGroup/OpenCL-Headers@7258b9e - which just merged. We must be pulling the tip of the repo rather than a specific commit

@sarnex
Copy link
Contributor

sarnex commented Sep 6, 2024

@callumfare Nice find, I didn't understand how this PR could cause the problem either. I'll try to fix it.

FYI @EwanC

@sarnex
Copy link
Contributor

sarnex commented Sep 6, 2024

This PR should fix it but then I have to bump the version here.

oneapi-src/unified-runtime#2067

steffenlarsen pushed a commit that referenced this pull request Oct 7, 2024
These e2e tests started failing after PI removal and replacing it with
UR ([PR](#14145)) However, they were
not related to it.

On Windows, dlls unloading is inconsistent and if we try to release
these UR objects manually, inconsistent hangs happen due to a race
between unloading the UR adapters dlls (in addition to their dependency
dlls) and the releasing of these UR objects (The proxy loader have
solved this problem partially
[here](#15262)). So, we currently
shutdown without releasing them and windows should handle the memory
cleanup.

This behaviour is the same old behaviour as before removing PI as on
Investigations on this. This was only hidden before PI removal as it was
calling the PI entry-point but doesn't make it to UR entry-point and
Filecheck logs check for objects release would pass as it only check for
the call to the PI entry-point without check that the call was a
successful call. That was the PI call for `piContextRelease` before PI
removal:

```
---> piContextRelease(
        <unknown> : 0000023CC0EBA6C0
) ---> API Called After Plugin Teardown, Functon Call ignored.
```

Fixes #14768
Fixes #14950
Fixes #14968
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.