
When should we use __FORCE_MKL_FLUSH__ in the C interface? #401

Closed
amontoison opened this issue Mar 20, 2024 · 5 comments · Fixed by #403

Comments

@amontoison (Member) commented Mar 20, 2024

I don't understand why I get a segmentation fault when I call some C functions that contain __FORCE_MKL_FLUSH__:
https://github.com/JuliaGPU/oneAPI.jl/blob/master/deps/src/onemkl.cpp#L11-L12

The segmentation fault goes away when I remove __FORCE_MKL_FLUSH__, but it only affects a few routines (geqrf in LAPACK and set_csr_data in SPARSE).
Why don't we have the same behaviour with all routines?
I use __FORCE_MKL_FLUSH__ after the routines that return void.

[5119] signal (11.1): Segmentation fault
in expression starting at REPL[7]:1
_ZNK4sycl3_V15event11get_backendEv at /home/alexis/.julia/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/conda/lib/libsycl.so.7 (unknown line)
_ZN4sycl3_V110get_nativeILNS0_7backendE2ENS0_5eventEEENS0_14backend_traitsIXT_EE11return_typeIT0_EERKS7_ at /home/alexis/.julia/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/deps/lib/liboneapi_support.so (unknown line)
onemklSgeqrf at /home/alexis/.julia/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/deps/lib/liboneapi_support.so (unknown line)
onemklSgeqrf at /home/alexis/Bureau/git/oneAPI.jl/lib/support/liboneapi_support.jl:2473
unknown function (ip: 0x7fb6b8136699)
@pengtu (Contributor) commented Mar 26, 2024

FORCE_MKL_FLUSH is used to make sure that the MKL task submitted to the SYCL queue has actually been dispatched. The SYCL runtime can temporarily hold a SYCL kernel without submitting it to the GPU driver (the L0 driver in our case). The oneAPI.jl runtime works directly on the L0 queue to synchronize between MKL SYCL function calls and Julia statements. If an MKL kernel were still held by the SYCL runtime while the oneAPI.jl runtime calls zeQueueSynchronize() to wait for it to finish, the two would be out of order. Hence, we call FORCE_MKL_FLUSH to make sure the SYCL kernel has been submitted to the L0 queue.

FORCE_MKL_FLUSH(cmd) calls sycl::get_native<sycl::backend::ext_oneapi_level_zero>(cmd), which expects 'cmd' to be the SYCL event returned by the MKL function. If the MKL function doesn't return an event, it segfaults.
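Based on that description, the macro at the linked lines of onemkl.cpp presumably looks roughly like the sketch below (inferred, not the verbatim source; it requires SYCL headers to compile):

```cpp
// Sketch of __FORCE_MKL_FLUSH__, inferred from the description above.
// Mapping the SYCL event to its native Level Zero handle forces the SYCL
// runtime to submit the pending kernel to the L0 queue.
#define __FORCE_MKL_FLUSH__(cmd) \
    sycl::get_native<sycl::backend::ext_oneapi_level_zero>(cmd)
```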

@amontoison (Member, Author) commented Mar 26, 2024

Thanks @pengtu!
The question is then: how can we be sure that the "usm" version, and not the "buffer" version, of a routine is used in the C interface?
For example with geqrf, the two versions have the same parameters if we don't provide the events argument: documentation of geqrf.

Should we pass an empty list {} as the last parameter to the MKL routines to make sure the usm version is used, so that we can safely call FORCE_MKL_FLUSH?

@pengtu (Contributor) commented Mar 26, 2024

@amontoison: Indeed, the C wrapper might have been invoking the "buffer" version. Please try passing an empty list {} as the last argument to make sure the 'usm' version is invoked.

@amontoison (Member, Author) commented Mar 27, 2024

@pengtu Should we only wrap the "usm" version if both versions are available?

@pengtu (Contributor) commented Apr 10, 2024

> Should we only wrap the "usm" version if both versions are available?

Yes, we should always call the "usm" version of oneMKL, since Julia allocates device arrays directly without using the SYCL buffer interface.
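Following this advice, a wrapper might take roughly the shape below (a hypothetical sketch: the queue handling and exact parameter list are assumptions modeled on the discussion and the public oneapi::mkl::lapack::geqrf USM signature, not the actual onemkl.cpp code; it requires SYCL and oneMKL headers to compile):

```cpp
// Hypothetical shape of a C wrapper that always uses the USM overload.
extern "C" void onemklSgeqrf(sycl::queue* q, int64_t m, int64_t n,
                             float* a, int64_t lda, float* tau,
                             float* scratchpad, int64_t scratchpad_size) {
    auto ev = oneapi::mkl::lapack::geqrf(*q, m, n, a, lda, tau,
                                         scratchpad, scratchpad_size,
                                         {});  // empty deps -> USM overload
    __FORCE_MKL_FLUSH__(ev);  // safe: 'ev' is a sycl::event
}
```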
