Tool hangs with Fortran offloaded to the GPU with XL #76
After thinking about this a bit more, I have decided to delete my previous comment, as it could be completely wrong and misleading.
I believe you were right. The issue seems to be related to the channel functionality, and there is a deadlock. I say this because when I removed the two channels (host and device) and the related code from the tool, the hangs disappeared. Also, this is the only tool that hangs. Question: how can I get line information (like the one provided by the …)? I see that the injected function for this tool is the following. Is there a way to pass debug info (perhaps a string) to this device function, or to get it from the device function itself (perhaps using the opcode)?
What is a bit surprising is that the issue seems to depend on which compiler is used for the target application, not on how the NVBit tool is compiled. Initially I thought you were using xlf90 to compile the NVBit tool, and I assumed that caused a problem in the channel, but using xlf90 on the application should not create any problem. Maybe the way xlf90 initializes the GPU when using OpenMP interferes with the channel code, but it is all wild guessing at this point, since I have never seen this problem before. I will try to reproduce it on this side, but the chances are small. Regarding passing something to the injected function, you can pass immediates or pointers (cast to uint64_t, like pchannel_dev above). So if you want to pass a string, I would suggest allocating the string in GPU device memory during the instrumentation phase and passing a pointer to it to the instrumentation function.
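The suggestion above (store the string where the injected function can read it, then pass its address as a 64-bit immediate) can be sketched in plain host C++. This is a minimal sketch under stated assumptions: encode_labels and decode_label are hypothetical names, labels such as "kernel.f90:12" are made-up examples, and everything runs on the host so the pointer round trip can be checked in isolation. In the actual tool the strings would have to live in GPU device memory, and the decode step would run inside the __device__ injected function.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Instrumentation side (hypothetical helper): turn each label into the 64-bit
// immediate that would be handed to the injected function. The caller owns
// `labels` and must not modify or destroy it while the pointers are in use.
std::vector<std::uint64_t> encode_labels(const std::vector<std::string>& labels) {
    std::vector<std::uint64_t> args;
    args.reserve(labels.size());
    for (const std::string& label : labels) {
        // Pointer -> integer, the same kind of cast used for pchannel_dev.
        args.push_back(static_cast<std::uint64_t>(
            reinterpret_cast<std::uintptr_t>(label.c_str())));
    }
    return args;
}

// Injected-function side (hypothetical helper): recover the pointer from the
// immediate and read the NUL-terminated string it points to. In a real tool
// this would be device code reading device memory.
std::string decode_label(std::uint64_t plabel) {
    assert(plabel != 0 && "label pointer must not be null");
    return std::string(reinterpret_cast<const char*>(
        static_cast<std::uintptr_t>(plabel)));
}
```

Because the pointer is fixed at instrumentation time, each instrumented instruction can carry its own label for the cost of one extra 64-bit argument. The trade-off is that the label storage must stay allocated (on the device, in the real tool) for as long as instrumented kernels may run.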
One of the provided tools (record_reg_vals) hangs with Fortran code offloaded to the GPU via OpenMP with the XL compiler. Interestingly, the behavior is correct when the code is compiled with nvfortran. This is an IBM PPC platform with V100 GPUs. I suppose this is a problem with XL generating code that is incompatible with NVBit, and I'd like to report it to IBM, but I would appreciate your help in digging further into the issue.
Below is a simple Fortran program with an OpenMP parallel loop that is offloaded to the GPU. When I compile it with nvfortran and run the record_reg_vals tool, it works correctly. When I use the XL compiler, the program hangs. The program is:
This is how it is compiled:
I profiled it with nvcc to make sure the kernel is executed:
This is the output of the tool:
If it helps, I noticed that the void nvbit_at_ctx_init(CUcontext ctx) function is called at least twice with XL (I added a printf statement to confirm), but only once with nvfortran.
As I said, when I compile with nvfortran, the program terminates correctly and I can use the tool.
Here are my system specs:
System: ppc64le GNU/Linux
Thank you!!