
Problems with reductions on GPU (not CPU). #14

Closed
D-Dirk opened this issue Jun 12, 2019 · 8 comments

Comments


D-Dirk commented Jun 12, 2019

Dear Community,

The following OpenMP 4.5 array reduction works well on Haswell, but the compiler crashes when building it for Vega20. I attached a minimal reproducible example below (no meaningful data is required to reproduce the problem). Is this type of reduction supposed to work in aomp 0.63 already? If not, which alternative would currently work for this example without slowing the computation down too much? Unfortunately, I can move neither the reduction clause nor the teams clause to lower levels of the loop nest without either creating a data race or slowing the computation down considerably. Any help would be greatly appreciated!

#include <stdio.h>
#include <iostream>

int main () {
    int no = 25;
    int nc = 1000000;
    int nv = 30;
    int nos = (no-1)*(no-1);
    double *A_reduction = new double[nos]();
    double *PA = new double[nv*nos]();
    double *CA = new double[nc*nv]();

    #pragma omp target data map(to:CA[0:(nc*nv)],PA[0:(nv*nos)]) map(A_reduction[0:nos])
    #pragma omp target teams distribute parallel for reduction(+:A_reduction[:nos]) collapse(2)
    //#pragma omp parallel for reduction(+:A_reduction[:nos]) collapse(2) //works!
    for (int op = 0; op < no-1; ++op) {
        for (int of = 0; of < no-1; ++of) {
            for (int c = 0; c < nc; ++c) {
                double pc = 1;
                for (int v = 0; v < nv; ++v) {
                    pc *= (CA[v+c*nv]<0)+CA[v+c*nv]*PA[v+of*nv+op*(no-1)*nv];
                }
                A_reduction[of+op*(no-1)] += pc;
            }
        }
    }
}

I used the following compilation flags for the GPU and the CPU versions:

/opt/rocm/aomp/bin/clang++ -target x86_64-pc-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 reductionproblem.cpp

Thank you very much for your attention!
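On the question of an alternative: in the reproducer as posted, the output index of+op*(no-1) is different for every collapsed (op, of) iteration, so each element of A_reduction is written by exactly one iteration. That suggests one possible workaround, sketched below under that assumption: drop the array-section reduction entirely and accumulate into a private scalar per (op, of) pair. The function names accumulate_per_element, serial_reference and max_abs_diff are hypothetical and not from the thread. This only applies if, in the real code, each output element is owned by a single collapsed iteration (the original comment mentions races when restructuring, so that may not hold there), and it exposes only (no-1)^2 = 576 parallel iterations, which may underuse a GPU.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: same loop nest as the reproducer, but each collapsed
// (op, of) iteration sums into a private scalar and updates its own output
// element once. No two iterations touch the same element of A, so no
// reduction clause (and no array-section reduction) is needed.
void accumulate_per_element(int no, int nc, int nv,
                            const std::vector<double> &CA_v,
                            const std::vector<double> &PA_v,
                            std::vector<double> &A_v) {
    const int nos = (no - 1) * (no - 1);
    assert(static_cast<int>(CA_v.size()) == nc * nv);
    assert(static_cast<int>(PA_v.size()) == nv * nos);
    assert(static_cast<int>(A_v.size()) == nos);
    const double *CA = CA_v.data();
    const double *PA = PA_v.data();
    double *A = A_v.data();

    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: CA[0:nc*nv], PA[0:nv*nos]) map(tofrom: A[0:nos])
    for (int op = 0; op < no - 1; ++op) {
        for (int of = 0; of < no - 1; ++of) {
            double sum = 0.0;  // private to this (op, of) iteration
            for (int c = 0; c < nc; ++c) {
                double pc = 1.0;
                for (int v = 0; v < nv; ++v) {
                    pc *= (CA[v + c * nv] < 0) + CA[v + c * nv] * PA[v + of * nv + op * (no - 1) * nv];
                }
                sum += pc;
            }
            A[of + op * (no - 1)] += sum;  // index is unique per iteration
        }
    }
}

// Plain serial version of the reproducer's loop nest, for checking results.
std::vector<double> serial_reference(int no, int nc, int nv,
                                     const std::vector<double> &CA,
                                     const std::vector<double> &PA) {
    const int nos = (no - 1) * (no - 1);
    std::vector<double> A(nos, 0.0);
    for (int op = 0; op < no - 1; ++op)
        for (int of = 0; of < no - 1; ++of)
            for (int c = 0; c < nc; ++c) {
                double pc = 1.0;
                for (int v = 0; v < nv; ++v)
                    pc *= (CA[v + c * nv] < 0) + CA[v + c * nv] * PA[v + of * nv + op * (no - 1) * nv];
                A[of + op * (no - 1)] += pc;
            }
    return A;
}

// Largest element-wise absolute difference between two equally sized vectors.
double max_abs_diff(const std::vector<double> &a, const std::vector<double> &b) {
    double m = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) m = std::fmax(m, std::fabs(a[i] - b[i]));
    return m;
}
```

Because every inner sum over c is still computed sequentially within one iteration, the result should match the serial loop nest element by element; compiled without -fopenmp the pragma is simply ignored.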

Contributor

ronlieb commented Jun 12, 2019

Hi D-Dirk, thanks for submitting the issue. I will take a look today, see what I learn, and get back to you.

Contributor

estewart08 commented Jun 12, 2019

The same compile error also occurs when compiling for nvptx:
clang-8: /aomp/clang/lib/CodeGen/CGRecordLayoutBuilder.cpp:595: void {anonymous}::CGRecordLowering::clipTailPadding(): Assertion `Prior->Kind == MemberInfo::Field && !Prior->FD && "Only storage fields have tail padding!"' failed.

Contributor

ronlieb commented Jun 13, 2019

I have not made very much progress on this one yet, I will take another look tomorrow.

Contributor

ronlieb commented Jun 29, 2019

This also fails to compile with the latest build of clang/LLVM:

/work/llvm-work1/bldNinja-llvm/bin/clang++ d-dirk-14.cpp -c -fopenmp -target x86_64-pc-linux-gnu -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_70

clang-8: /work/llvm-work1/llvm/tools/clang/lib/CodeGen/CodeGenFunction.cpp:1814: CodeGenFunction::VlaSizePair clang::CodeGen::CodeGenFunction::getVLASize(const clang::VariableArrayType *): Assertion `vlaSize && "no size for VLA!"' failed.

Contributor

ronlieb commented Jun 29, 2019

We are releasing aomp_0.6-5 today or Monday; however, we are still working on this defect, so it is not yet resolved. Sorry for the delay.

Contributor

ronlieb commented Jul 12, 2019

A small update:
We reported your issue to the upstream OpenMP Clang community, since the latest version of LLVM produces a similar error. The upstream engineer (Alexy) added an error message that detects attempts to map VLAs into a target region.

The error message looks like this (or will, in our upcoming 0.7 release):
d-dirk-14.cpp:12:62: error: cannot generate code for reduction on array section, which requires a variable length array
#pragma omp target teams distribute parallel for reduction(+:A_reduction[:nos]) collapse(2)
^
d-dirk-14.cpp:12:62: note: variable length arrays are not supported for the current target

So when we change the test case to use 'const':
-int no = 25;
-int nc = 1000000;
-int nv = 30;
-int nos = (no-1)*(no-1);
+const int no = 25;
+const int nc = 1000000;
+const int nv = 30;
+const int nos = (no-1)*(no-1);

It then compiles; however, the executable encounters a runtime error that needs looking into:

Memory access fault by GPU node-1 (Agent handle: 0x1471d60) on address 0x148f000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)

Contributor

ronlieb commented Jul 12, 2019

There seems to be some sort of mapping issue with the A_reduction array: when I change it from a heap allocation to a local declaration, the abort goes away.

+double A_reduction[nos];
+//double *A_reduction = new double[nos]();
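For reference, the two changes described above (constant sizes and a local reduction array) can be combined into one self-contained sketch. The sizes kNo, kNc, kNv and the function name reduce_with_local_array are hypothetical, chosen much smaller than the reproducer's so it runs quickly anywhere; the pragmas follow the original reproducer. Because kNos is a constant expression, the length of the array section in the reduction clause is known at compile time, so the private reduction copies do not require a variable length array. The sketch does not establish whether this avoids the memory access fault on any particular AOMP release; compiled without offloading it runs on the host, and without -fopenmp the pragmas are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compile-time sizes, far smaller than the reproducer's
// (no = 25, nc = 1000000, nv = 30).
constexpr int kNo = 5;
constexpr int kNc = 200;
constexpr int kNv = 4;
constexpr int kNos = (kNo - 1) * (kNo - 1);

// Same loop nest and array-section reduction as the reproducer, but with
// constant sizes and a local (stack) array instead of new double[nos]().
std::vector<double> reduce_with_local_array(const std::vector<double> &CA_v,
                                            const std::vector<double> &PA_v) {
    assert(CA_v.size() == static_cast<std::size_t>(kNc * kNv));
    assert(PA_v.size() == static_cast<std::size_t>(kNv * kNos));
    const double *CA = CA_v.data();
    const double *PA = PA_v.data();
    double A_reduction[kNos] = {};  // zero-initialized local array

    #pragma omp target data map(to: CA[0:kNc*kNv], PA[0:kNv*kNos]) map(tofrom: A_reduction)
    #pragma omp target teams distribute parallel for reduction(+: A_reduction[:kNos]) collapse(2)
    for (int op = 0; op < kNo - 1; ++op) {
        for (int of = 0; of < kNo - 1; ++of) {
            for (int c = 0; c < kNc; ++c) {
                double pc = 1.0;
                for (int v = 0; v < kNv; ++v) {
                    pc *= (CA[v + c * kNv] < 0) + CA[v + c * kNv] * PA[v + of * kNv + op * (kNo - 1) * kNv];
                }
                A_reduction[of + op * (kNo - 1)] += pc;
            }
        }
    }
    // Copy the reduced values out of the local array.
    return std::vector<double>(A_reduction, A_reduction + kNos);
}
```

With CA filled with 2.0 and PA with 1.0, every factor is 2, so each element should come out as kNc * 2^kNv = 3200; that makes a quick sanity check independent of summation order.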

@gregrodgers
Copy link
Contributor

We fixed the compiler in 0.7.x to generate a warning that variable length arrays are not supported in a target region.

4 participants