Problems with reductions on GPU (not CPU). #14
Comments
Hi D-Dirk, thanks for submitting the issue. I will take a look today, see what I learn, and get back to you.
This compile error is also seen when compiling for nvptx.
I have not made much progress on this one yet; I will take another look tomorrow.
This also fails to compile with the latest build of clang/LLVM:
/work/llvm-work1/bldNinja-llvm/bin/clang++ d-dirk-14.cpp -c -fopenmp -target x86_64-pc-linux-gnu -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_70
clang-8: /work/llvm-work1/llvm/tools/clang/lib/CodeGen/CodeGenFunction.cpp:1814: CodeGenFunction::VlaSizePair clang::CodeGen::CodeGenFunction::getVLASize(const clang::VariableArrayType *): Assertion `vlaSize && "no size for VLA!"' failed.
We are releasing aomp_0.6-5 today or Monday; however, we are still working on this defect.
A small update: the error message looks like this (or will, in our upcoming 0.7 release):
When we change the test case to use 'const', it compiles; however, the executable then encounters a runtime error that needs looking into:
Memory access fault by GPU node-1 (Agent handle: 0x1471d60) on address 0x148f000. Reason: Page not present or supervisor privilege.
It would seem there is some sort of mapping issue with the A_reduction array, which the modified test case declares as:
+double A_reduction[nos];
We fixed the compiler in 0.7.x to generate a warning that variable-length arrays are not supported in a target region.
Dear Community,
the following OpenMP 4.5 reduction works well on Haswell but crashes the compiler when building for Vega20. I attached a minimal reproducible example below (no meaningful data is required to reproduce the problem). Is this type of reduction supposed to work in aomp 0.63 already? If not, which alternative would currently work for this example without slowing the computation down too much?

Unfortunately, I can move neither the reduction clause nor the teams clause to inner loop levels without either creating a data race or slowing the computation down considerably. Any help would be greatly appreciated!
#include <stdio.h>
#include <iostream>

int main () {
  int no = 25;
  int nc = 1000000;
  int nv = 30;
  int nos = (no-1)*(no-1);
  double *A_reduction = new double[nos]();
  double *PA = new double[nv*nos]();
  double *CA = new double[nc*nv]();

  #pragma omp target data map(to:CA[0:(nc*nv)],PA[0:(nv*nos)]) map(A_reduction[0:nos])
  #pragma omp target teams distribute parallel for reduction(+:A_reduction[:nos]) collapse(2)
  //#pragma omp parallel for reduction(+:A_reduction[:nos]) collapse(2) //works!
  for (int op = 0; op < no-1; ++op) {
    for (int of = 0; of < no-1; ++of) {
      for (int c = 0; c < nc; ++c) {
        double pc = 1;
        for (int v = 0; v < nv; ++v) {
          pc *= (CA[v+c*nv]<0)+CA[v+c*nv]*PA[v+of*nv+op*(no-1)*nv];
        }
        A_reduction[of+op*(no-1)] += pc;
      }
    }
  }
}
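For reference, the loop nest above computes the following for each pair (op, of). Here m = no − 1 and the Iverson bracket [·], which is 1 when its condition holds and 0 otherwise, are notation introduced only for readability:

```latex
$$
A\_reduction[\mathit{of} + \mathit{op}\,m] \mathrel{+}= \sum_{c=0}^{nc-1} \prod_{v=0}^{nv-1} \Big( \big[\, CA[v + c\,nv] < 0 \,\big] + CA[v + c\,nv] \cdot PA[v + \mathit{of}\,nv + \mathit{op}\,m\,nv] \Big)
$$
```

Because the left-hand index depends only on (op, of), every term of the sum over c lands in the element handled by a single iteration of the collapsed (op, of) loop.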
I used the following compilation flags for the GPU and the CPU versions:
/opt/rocm/aomp/bin/clang++ -target x86_64-pc-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 reductionproblem.cpp
Thank you very much for your attention!
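A minimal sketch of one possible alternative. It assumes the reproducer's data layout and is not a fix confirmed by the maintainers. With collapse(2), each (op, of) iteration owns exactly one element of A_reduction, and the c loop runs sequentially inside it. The array-section reduction can therefore be replaced by a private scalar accumulator, which presumably also avoids the runtime-sized private copy that the VLA warning concerns. The function name compute_without_array_reduction and its parameter list are hypothetical. For no = 25 this exposes only (no-1)*(no-1) = 576 parallel iterations, which may leave a Vega20 underutilized. It has not been checked against aomp 0.63.

```cpp
#include <cassert>

// Hypothetical helper (not part of aomp or the original report).
// Layout follows the reproducer: CA holds nc*nv values, PA holds
// nv*(no-1)*(no-1) values, and A_out holds (no-1)*(no-1) accumulators.
void compute_without_array_reduction(int no, int nc, int nv,
                                     const double *CA, const double *PA,
                                     double *A_out) {
  assert(no >= 1 && nc >= 0 && nv >= 0);
  const int m = no - 1;   // extent of the op and of loops
  const int nos = m * m;  // number of output elements
  // No reduction clause: each (op, of) iteration writes a distinct element.
  // Compiled without -fopenmp, the pragma is ignored; without an offload
  // device, the target region falls back to the host.
#pragma omp target teams distribute parallel for collapse(2) \
    map(to: CA[0:nc*nv], PA[0:nv*nos]) map(tofrom: A_out[0:nos])
  for (int op = 0; op < m; ++op) {
    for (int of = 0; of < m; ++of) {
      double sum = 0.0;  // private per-iteration accumulator
      for (int c = 0; c < nc; ++c) {
        double pc = 1.0;
        for (int v = 0; v < nv; ++v) {
          const double x = CA[v + c * nv];
          pc *= (x < 0) + x * PA[v + of * nv + op * m * nv];
        }
        sum += pc;
      }
      A_out[of + op * m] += sum;  // single writer per element, so no race
    }
  }
}
```

It would be called as compute_without_array_reduction(no, nc, nv, CA, PA, A_reduction) in place of the target data / target teams pair. Because A_reduction starts zero-initialized and the update uses +=, the result matches what the reduction would produce.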