Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure UR will clear contexts before unloading on windows #2024

Conversation

omarahmed1111
Copy link
Contributor

@omarahmed1111 omarahmed1111 commented Aug 28, 2024

UR adapter is unloaded before clearing all intel/llvm side objects, this is because sycl-rt will only start the destruction on a call to destruct sycl.dll on DllMain here. windows unload the dependencies first before calling DllMain process unloading. So, UR adapter dll will be unloaded before even sycl DllMain destruction is called. So, if we tried at sycl-rt side to clean the default context, this will result in access violation problems.

The proposed solution here is to use DllMain in UR L0 adapter, cache any created contexts/kernels and let it clean any context/kernels left before it unloads. I choosed this context object as it holds the usm pools so it is important to clean that, and kernels needed to fix some tests so also need to be cleaned.

@omarahmed1111 omarahmed1111 force-pushed the small-fix-on-windows-ur_sink-issue branch from dced7c2 to bc14e90 Compare August 28, 2024 21:39
@github-actions github-actions bot added the level-zero L0 adapter specific issues label Aug 28, 2024
@omarahmed1111 omarahmed1111 force-pushed the small-fix-on-windows-ur_sink-issue branch from bc14e90 to 39ffb59 Compare August 29, 2024 15:21
@omarahmed1111 omarahmed1111 changed the title Delete ur context after l0 context Ensure UR will clear contexts before unloading on windows Aug 29, 2024
@omarahmed1111 omarahmed1111 force-pushed the small-fix-on-windows-ur_sink-issue branch 5 times, most recently from df47657 to 70d5e20 Compare August 30, 2024 15:47
@omarahmed1111 omarahmed1111 force-pushed the small-fix-on-windows-ur_sink-issue branch from 70d5e20 to 569d0ca Compare September 1, 2024 15:13
@omarahmed1111 omarahmed1111 force-pushed the small-fix-on-windows-ur_sink-issue branch from 569d0ca to 38450c4 Compare September 2, 2024 12:20
Copy link
Contributor

@pbalcer pbalcer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simpler workaround would be for the kernel object to hold a reference to the adapter. This way, the adapter won't delete state until the kernel object is deleted.
If the problem is that the kernel object is used after UR is destroyed with urTeardown, then this is unsolvable in the general case in UR, unless we want to hold a virtual reference to the UR with every object handle. Which is a potential solution, but a fairly expensive one in my opinion, since every object access is likely to go through a shared atomic counter (which is a false-sharing problem).

I wouldn't want us to now be making arduous workarounds like this one for application-level problems. Can't this be solved more easily in SYCL or the application itself?

@omarahmed1111
Copy link
Contributor Author

A simpler workaround would be for the kernel object to hold a reference to the adapter. This way, the adapter won't delete state until the kernel object is deleted. If the problem is that the kernel object is used after UR is destroyed with urTeardown, then this is unsolvable in the general case in UR, unless we want to hold a virtual reference to the UR with every object handle. Which is a potential solution, but a fairly expensive one in my opinion, since every object access is likely to go through a shared atomic counter (which is a false-sharing problem).

I wouldn't want us to now be making arduous workarounds like this one for application-level problems. Can't this be solved more easily in SYCL or the application itself?

The problem is purely that UR adapter is unloaded before clearing all intel/llvm side objects, this is because sycl-rt will only start the destruction on a call to destruct sycl.dll on DllMain here. windows unload the dependencies first before calling DllMain process unloading. So, UR adapter dll will be unloaded before even sycl DllMain destruction is called. So, if we tried at sycl-rt side to clean the default context, this will result in access violation problems. This ofc doesn't stop at sycl-rt but also UR dependencies is still same problem. umf for example had the same problem in UR but luckily umf is using urInit/urTearDown and uses a refcount and only unloads when it reaches 0. That is what have been fixed in this PR: #2007 . The problem was not consistent as it depends how fast windows have unloaded UR to how much it would call the default context destruction.

NOTE: the default context destruction is not currently happening on windows and it is let to be a memory leak. Destructing it manually is what leads to this.

The proposed solution here is to use DllMain in UR L0 adapter, cache any created contexts/kernels and let it clean any context/kernels left before it unloads. I choosed this context object as it holds the usm pools so it is important to clean that, and kernels needed to fix some tests so also need to be cleaned.

a perfect solution here would be that applications would control that it won't unload the dlls like UR before it clean all UR objects. but didn't find a way to do that on windows at the current moment thats why I made this workaround as a temporary solution for this.

@omarahmed1111
Copy link
Contributor Author

A simpler workaround would be for the kernel object to hold a reference to the adapter. This way, the adapter won't delete state until the kernel object is deleted.

I had to ensure urKernelRelease is called so tests could track it as normal.

@omarahmed1111 omarahmed1111 marked this pull request as ready for review September 2, 2024 16:59
@omarahmed1111 omarahmed1111 requested a review from a team as a code owner September 2, 2024 16:59
@pbalcer
Copy link
Contributor

pbalcer commented Sep 3, 2024

Is this a regression after PI removal? If so, how did this work before?

The problem is purely that UR adapter is unloaded before clearing all intel/llvm side objects, this is because sycl-rt will only start the destruction on a call to destruct sycl.dll on DllMain here.

From what I can tell, the flow here causes urLoaderTearDown to be called. This in turn will delete loader's context, which unloads the adapter libraries. Is something accessing UR handles after urLoaderTearDown is called?

Or you mean that urAdapterRelease is called, the adapter state is destroyed (but the library stays loaded), and this is when the invalid access is happening?

What I'm concerned about with your patch is that it is a specialized workaround for one specific problem. This would be a whack-a-mole game of creating similar workarounds for other functions when someone submits a bug for their app that just happens to trigger a similar problem but for a different adapter/function.

@omarahmed1111
Copy link
Contributor Author

From what I can tell, the flow here causes urLoaderTearDown to be called. This in turn will delete loader's context, which unloads the adapter libraries. Is something accessing UR handles after urLoaderTearDown is called?

Or you mean that urAdapterRelease is called, the adapter state is destroyed (but the library stays loaded), and this is when the invalid access is happening?

SYCL-RT starts the destruction flow on dllMain->PROCESS-DETACH call which will call urLoaderTeardown but windows does call them in this order on destruction:

  1. unloading the dll of adapters like ur_level_Zero_adapter.dll
  2. unloading the dll of the UR loader.
  3. Calling destructors in sycl-rt
  4. unloading the dll of sycl-rt -> calling dllMain->PROCESS-DETACH in global handler. which will call urLoaderTearDown at this point.

when we reach the forth point, the adapter is already unloaded so calling any UR function will give access violation as it is calling at this point a function pointer that is removed. but that is also not happening momentarily, as unloading the dll is taking time so it depends how fast the adapter dll is unloaded, this access violation happens. So at the point of calling urLoaderTearDown the adapter was already unloaded and calling ur entries at this point, either it returns UR_RESULT_ERROR_UNITIALIZED or it gives access violation, depending on the state of unloading the adapter dll is at another thread.

@omarahmed1111
Copy link
Contributor Author

Is this a regression after PI removal? If so, how did this work before?

I think this problem was one of two scenarios :

  1. sycl-rt was leaking memory by not clearing the default context on windows, and other tests that was destructing some objects at the end didn't go noticing this problem as they were checking for PI calls which was happening but the UR entries might have still returned unitialized.
  2. The proxy loader was helping in keeping the UR loader a little to destruct the objects.

@omarahmed1111 omarahmed1111 marked this pull request as draft September 3, 2024 11:04
@omarahmed1111
Copy link
Contributor Author

This was replaced by a more proper fix here: intel/llvm#15262

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
level-zero L0 adapter specific issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants