Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Delayed privatization. #79862

Closed
wants to merge 2 commits into from

Conversation

ergawy
Copy link
Member

@ergawy ergawy commented Jan 29, 2024

This is a PoC for delayed privatization in OpenMP. Instead of directly
emitting privatization code in the frontend, we add a new op to outline
the privatization logic for a symbol and call-like mapping that maps
from the host symbol to an outlined function-like privatizer op.

Later, we would inline the delayed privatizer function-like op in the
OpenMP region to basically get the same code generated directly by the
fronend at the moment.

Note: This is still a work-in-progress but giving comments on the taken approach or any advice is more than welcome :).

This is my plan to productize this WIP (✅ means the item is already done, ⌛ means it is in progress):

  1. In this PR: add a feature flag to control the usage of delayed privatization. This will be disabled by default and enabled gradually as we add support for delayed privatization for more cases. ✅
  2. In this PR: handle the remaining comments so that you do not have to spend effort on them in later separate PRs.
  3. This PR will not be merged since it is getting quite large. Instead I will split it into a number of PRs as explained in following points.
  4. PR 1: a NFC PR to copy the refactoring of genOpWithBody introduced here. This will introduce the genRegionEntryCB callback and use it for worksharing loop ops. ([flang][OpenMP][NFC] Outline genOpWithBody & createBodyOfOp args #80817 [flang][OpenMP][NFC] Further refactoring for genOpWithBody & #80839) ✅
  5. PR 2: copy the omp.private op along with a roundtripping lit test to test parsing, printing, and verification. ([MLIR][OpenMP] Add omp.private op #80955) ✅
  6. PR 3: copy the extensions in omp.parallel op (i.e. adding the private clause) along with a roundtripping lit test as well. ( [MLIR][OpenMP] Add private clause to omp.parallel #81452) ✅
  7. PR 4: copy the changes introduced in the OpenMPToLLVM.cpp conversion pattern: RegionOpConversion. ([MLIR][OpenMP] Support llvm conversion for omp.private regions #81414) ✅
  8. PR 5: copy the changes introduced in OpenMPToLLVMIRTranslation.cpp: the PrivCB and prepareOmpParallel. ( [MLIR][OpenMP] Support basic materialization for omp.private ops #81715) ✅
  9. PR 6: copy the changes needed for creating the omp.private ops and modifying the creation of the omp.parallel op to use the new clause. These are most of the changes in DataSharingProcessor and genParallelOp. ( [flang][OpenMP][MLIR] Basic support for delayed privatization code-gen #81833) ✅
  10. PRs 7..N: Add delayed privatization support for more symbol types and more OMP ops (e.g. target). ([MLIR][OpenMP] Extend omp.private materialization support: firstprivate #83761 [flang][OpenMP] Only use HLFIR base in privatization logic #84123 [flang][OpenMP] Add %flang_fc1 RUN to delayed privatization tests #84296 [flang][MLIR][OpenMP] Extend delayed privatization for scalar allocatables and pointers #84740 [flang][MLIR][OpenMP] Extend delayed privatization for arrays and characters #85023) ⌛
  11. PR N+1: Remove the delayed privatization feature flag.

Please let me know if you disagree with anything.

@ergawy ergawy marked this pull request as draft January 29, 2024 16:55
@llvmbot llvmbot added mlir flang Flang issues not falling into any other category mlir:func mlir:openmp flang:fir-hlfir flang:openmp labels Jan 29, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 29, 2024

@llvm/pr-subscribers-flang-fir-hlfir
@llvm/pr-subscribers-flang-openmp
@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-func

Author: Kareem Ergawy (ergawy)

Changes

This is a PoC for delayed privatization in OpenMP. Instead of directly emitting privatization code in the frontend, we add a new op to outline the privatization logic for a symbol and call-like mapping that maps from the host symbol to a block argument in the OpenMP region.

Example:

!$omp target private(x)
!$end omp target

Would be code-generated by flang as:

  func.func @<!-- -->foo() {
    omp.target x.privatizer %x -&gt; %argx: !fir.ref&lt;i32&gt; {
    bb0(%argx: !fir.ref&lt;i32&gt;):
      // ... use %argx ....
    }
  }

  "omp.private"() &lt;{function_type = (!fir.ref&lt;i32&gt;) -&gt; !fir.ref&lt;i32&gt;, sym_name = "x.privatizer"}&gt; ({
  ^bb0(%arg0: !fir.ref&lt;i32&gt;):
    %0 = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFprivate_clause_allocatableEx"}
    %1 = fir.load %arg0 : !fir.ref&lt;i32&gt;
    fir.store %1 to %0 : !fir.ref&lt;i32&gt;
    omp.yield(%0 : !fir.ref&lt;i32&gt;)
  }) : () -&gt; ()

Later, we would inline the delayed privatizer function-like op in the OpenMP region to basically get the same code generated directly by the fronend at the moment.

So far this PoC implements the following:

  • Adds the delayed privatization op: omp.private.
  • For simple symbols, emits the op.

Still TODO:

  • Extend the omp.target op to somehow model the oulined privatization logic.
  • Inline the outlined privatizer before emitting LLVM IR.
  • Support more complex symbols like allocatables.

Patch is 22.41 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/79862.diff

8 Files Affected:

  • (modified) flang/include/flang/Lower/AbstractConverter.h (+13-5)
  • (modified) flang/include/flang/Lower/SymbolMap.h (+3)
  • (modified) flang/lib/Lower/Bridge.cpp (+32-22)
  • (modified) flang/lib/Lower/OpenMP.cpp (+67-15)
  • (added) flang/test/Lower/OpenMP/FIR/delayed_privatization.f90 (+42)
  • (modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td (+34-1)
  • (modified) mlir/lib/Dialect/Func/IR/FuncOps.cpp (+1-3)
  • (modified) mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp (+13)
diff --git a/flang/include/flang/Lower/AbstractConverter.h b/flang/include/flang/Lower/AbstractConverter.h
index c19dcbdcdb39022..b3ca804256ee093 100644
--- a/flang/include/flang/Lower/AbstractConverter.h
+++ b/flang/include/flang/Lower/AbstractConverter.h
@@ -16,6 +16,7 @@
 #include "flang/Common/Fortran.h"
 #include "flang/Lower/LoweringOptions.h"
 #include "flang/Lower/PFTDefs.h"
+#include "flang/Lower/SymbolMap.h"
 #include "flang/Optimizer/Builder/BoxValue.h"
 #include "flang/Semantics/symbol.h"
 #include "mlir/IR/Builders.h"
@@ -92,7 +93,8 @@ class AbstractConverter {
 
   /// Binds the symbol to an fir extended value. The symbol binding will be
   /// added or replaced at the inner-most level of the local symbol map.
-  virtual void bindSymbol(SymbolRef sym, const fir::ExtendedValue &exval) = 0;
+  virtual void bindSymbol(SymbolRef sym, const fir::ExtendedValue &exval,
+                          Fortran::lower::SymMap *symMap = nullptr) = 0;
 
   /// Override lowering of expression with pre-lowered values.
   /// Associate mlir::Value to evaluate::Expr. All subsequent call to
@@ -111,14 +113,16 @@ class AbstractConverter {
   /// For a given symbol which is host-associated, create a clone using
   /// parameters from the host-associated symbol.
   virtual bool
-  createHostAssociateVarClone(const Fortran::semantics::Symbol &sym) = 0;
+  createHostAssociateVarClone(const Fortran::semantics::Symbol &sym,
+                              Fortran::lower::SymMap *symMap = nullptr) = 0;
 
   virtual void
   createHostAssociateVarCloneDealloc(const Fortran::semantics::Symbol &sym) = 0;
 
-  virtual void copyHostAssociateVar(
-      const Fortran::semantics::Symbol &sym,
-      mlir::OpBuilder::InsertPoint *copyAssignIP = nullptr) = 0;
+  virtual void
+  copyHostAssociateVar(const Fortran::semantics::Symbol &sym,
+                       mlir::OpBuilder::InsertPoint *copyAssignIP = nullptr,
+                       Fortran::lower::SymMap *symMap = nullptr) = 0;
 
   /// For a given symbol, check if it is present in the inner-most
   /// level of the symbol map.
@@ -295,6 +299,10 @@ class AbstractConverter {
     return loweringOptions;
   }
 
+  virtual Fortran::lower::SymbolBox
+  lookupOneLevelUpSymbol(const Fortran::semantics::Symbol &sym,
+                         Fortran::lower::SymMap *symMap = nullptr) = 0;
+
 private:
   /// Options controlling lowering behavior.
   const Fortran::lower::LoweringOptions &loweringOptions;
diff --git a/flang/include/flang/Lower/SymbolMap.h b/flang/include/flang/Lower/SymbolMap.h
index a55e4b133fe0a8d..834ab747e1a4ad9 100644
--- a/flang/include/flang/Lower/SymbolMap.h
+++ b/flang/include/flang/Lower/SymbolMap.h
@@ -101,6 +101,9 @@ struct SymbolBox : public fir::details::matcher<SymbolBox> {
                  [](const fir::FortranVariableOpInterface &x) {
                    return fir::FortranVariableOpInterface(x).getBase();
                  },
+                 [](const fir::MutableBoxValue &x) {
+                   return x.getAddr();
+                 },
                  [](const auto &x) { return x.getAddr(); });
   }
 
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index b3aeb99fc5d57c5..38c9c76b6793fda 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -498,16 +498,18 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   /// Add the symbol binding to the inner-most level of the symbol map and
   /// return true if it is not already present. Otherwise, return false.
   bool bindIfNewSymbol(Fortran::lower::SymbolRef sym,
-                       const fir::ExtendedValue &exval) {
-    if (shallowLookupSymbol(sym))
+                       const fir::ExtendedValue &exval,
+                       Fortran::lower::SymMap *symMap = nullptr) {
+    if (shallowLookupSymbol(sym, symMap))
       return false;
-    bindSymbol(sym, exval);
+    bindSymbol(sym, exval, symMap);
     return true;
   }
 
   void bindSymbol(Fortran::lower::SymbolRef sym,
-                  const fir::ExtendedValue &exval) override final {
-    addSymbol(sym, exval, /*forced=*/true);
+                  const fir::ExtendedValue &exval,
+                  Fortran::lower::SymMap *symMap = nullptr) override final {
+    addSymbol(sym, exval, /*forced=*/true, symMap);
   }
 
   void
@@ -610,14 +612,15 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   }
 
   bool createHostAssociateVarClone(
-      const Fortran::semantics::Symbol &sym) override final {
+      const Fortran::semantics::Symbol &sym,
+      Fortran::lower::SymMap *symMap = nullptr) override final {
     mlir::Location loc = genLocation(sym.name());
     mlir::Type symType = genType(sym);
     const auto *details = sym.detailsIf<Fortran::semantics::HostAssocDetails>();
     assert(details && "No host-association found");
     const Fortran::semantics::Symbol &hsym = details->symbol();
     mlir::Type hSymType = genType(hsym);
-    Fortran::lower::SymbolBox hsb = lookupSymbol(hsym);
+    Fortran::lower::SymbolBox hsb = lookupSymbol(hsym, symMap);
 
     auto allocate = [&](llvm::ArrayRef<mlir::Value> shape,
                         llvm::ArrayRef<mlir::Value> typeParams) -> mlir::Value {
@@ -720,7 +723,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
           // Do nothing
         });
 
-    return bindIfNewSymbol(sym, exv);
+    return bindIfNewSymbol(sym, exv, symMap);
   }
 
   void createHostAssociateVarCloneDealloc(
@@ -745,16 +748,17 @@ class FirConverter : public Fortran::lower::AbstractConverter {
 
   void copyHostAssociateVar(
       const Fortran::semantics::Symbol &sym,
-      mlir::OpBuilder::InsertPoint *copyAssignIP = nullptr) override final {
+      mlir::OpBuilder::InsertPoint *copyAssignIP = nullptr,
+      Fortran::lower::SymMap *symMap = nullptr) override final {
     // 1) Fetch the original copy of the variable.
     assert(sym.has<Fortran::semantics::HostAssocDetails>() &&
            "No host-association found");
     const Fortran::semantics::Symbol &hsym = sym.GetUltimate();
-    Fortran::lower::SymbolBox hsb = lookupOneLevelUpSymbol(hsym);
+    Fortran::lower::SymbolBox hsb = lookupOneLevelUpSymbol(hsym, symMap);
     assert(hsb && "Host symbol box not found");
 
     // 2) Fetch the copied one that will mask the original.
-    Fortran::lower::SymbolBox sb = shallowLookupSymbol(sym);
+    Fortran::lower::SymbolBox sb = shallowLookupSymbol(sym, symMap);
     assert(sb && "Host-associated symbol box not found");
     assert(hsb.getAddr() != sb.getAddr() &&
            "Host and associated symbol boxes are the same");
@@ -763,8 +767,9 @@ class FirConverter : public Fortran::lower::AbstractConverter {
     mlir::OpBuilder::InsertPoint insPt = builder->saveInsertionPoint();
     if (copyAssignIP && copyAssignIP->isSet())
       builder->restoreInsertionPoint(*copyAssignIP);
-    else
+    else {
       builder->setInsertionPointAfter(sb.getAddr().getDefiningOp());
+    }
 
     Fortran::lower::SymbolBox *lhs_sb, *rhs_sb;
     if (copyAssignIP && copyAssignIP->isSet() &&
@@ -1060,8 +1065,10 @@ class FirConverter : public Fortran::lower::AbstractConverter {
 
   /// Find the symbol in the inner-most level of the local map or return null.
   Fortran::lower::SymbolBox
-  shallowLookupSymbol(const Fortran::semantics::Symbol &sym) {
-    if (Fortran::lower::SymbolBox v = localSymbols.shallowLookupSymbol(sym))
+  shallowLookupSymbol(const Fortran::semantics::Symbol &sym,
+                      Fortran::lower::SymMap *symMap = nullptr) {
+    auto &map = (symMap == nullptr ? localSymbols : *symMap);
+    if (Fortran::lower::SymbolBox v = map.shallowLookupSymbol(sym))
       return v;
     return {};
   }
@@ -1069,8 +1076,10 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   /// Find the symbol in one level up of symbol map such as for host-association
   /// in OpenMP code or return null.
   Fortran::lower::SymbolBox
-  lookupOneLevelUpSymbol(const Fortran::semantics::Symbol &sym) {
-    if (Fortran::lower::SymbolBox v = localSymbols.lookupOneLevelUpSymbol(sym))
+  lookupOneLevelUpSymbol(const Fortran::semantics::Symbol &sym,
+                         Fortran::lower::SymMap *symMap = nullptr) override {
+    auto &map = (symMap == nullptr ? localSymbols : *symMap);
+    if (Fortran::lower::SymbolBox v = map.lookupOneLevelUpSymbol(sym))
       return v;
     return {};
   }
@@ -1079,15 +1088,16 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   /// already in the map and \p forced is `false`, the map is not updated.
   /// Instead the value `false` is returned.
   bool addSymbol(const Fortran::semantics::SymbolRef sym,
-                 fir::ExtendedValue val, bool forced = false) {
-    if (!forced && lookupSymbol(sym))
+                 fir::ExtendedValue val, bool forced = false,
+                 Fortran::lower::SymMap *symMap = nullptr) {
+    auto &map = (symMap == nullptr ? localSymbols : *symMap);
+    if (!forced && lookupSymbol(sym, &map))
       return false;
     if (lowerToHighLevelFIR()) {
-      Fortran::lower::genDeclareSymbol(*this, localSymbols, sym, val,
-                                       fir::FortranVariableFlagsEnum::None,
-                                       forced);
+      Fortran::lower::genDeclareSymbol(
+          *this, map, sym, val, fir::FortranVariableFlagsEnum::None, forced);
     } else {
-      localSymbols.addSymbol(sym, val, forced);
+      map.addSymbol(sym, val, forced);
     }
     return true;
   }
diff --git a/flang/lib/Lower/OpenMP.cpp b/flang/lib/Lower/OpenMP.cpp
index be2117efbabc0a0..edf0e6b2c0d90cd 100644
--- a/flang/lib/Lower/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP.cpp
@@ -169,12 +169,15 @@ class DataSharingProcessor {
   void collectSymbolsForPrivatization();
   void insertBarrier();
   void collectDefaultSymbols();
-  void privatize();
+  void
+  privatize(llvm::SetVector<mlir::omp::PrivateClauseOp> *privateInitializers);
   void defaultPrivatize();
   void copyLastPrivatize(mlir::Operation *op);
   void insertLastPrivateCompare(mlir::Operation *op);
-  void cloneSymbol(const Fortran::semantics::Symbol *sym);
-  void copyFirstPrivateSymbol(const Fortran::semantics::Symbol *sym);
+  void cloneSymbol(const Fortran::semantics::Symbol *sym,
+                   Fortran::lower::SymMap *symMap = nullptr);
+  void copyFirstPrivateSymbol(const Fortran::semantics::Symbol *sym,
+                              Fortran::lower::SymMap *symMap);
   void copyLastPrivateSymbol(const Fortran::semantics::Symbol *sym,
                              mlir::OpBuilder::InsertPoint *lastPrivIP);
   void insertDeallocs();
@@ -197,7 +200,8 @@ class DataSharingProcessor {
   // Step2 performs the copying for lastprivates and requires knowledge of the
   // MLIR operation to insert the last private update. Step2 adds
   // dealocation code as well.
-  void processStep1();
+  void processStep1(llvm::SetVector<mlir::omp::PrivateClauseOp>
+                        *privateInitializers = nullptr);
   void processStep2(mlir::Operation *op, bool isLoop);
 
   void setLoopIV(mlir::Value iv) {
@@ -206,10 +210,11 @@ class DataSharingProcessor {
   }
 };
 
-void DataSharingProcessor::processStep1() {
+void DataSharingProcessor::processStep1(
+    llvm::SetVector<mlir::omp::PrivateClauseOp> *privateInitializers) {
   collectSymbolsForPrivatization();
   collectDefaultSymbols();
-  privatize();
+  privatize(privateInitializers);
   defaultPrivatize();
   insertBarrier();
 }
@@ -239,20 +244,23 @@ void DataSharingProcessor::insertDeallocs() {
     }
 }
 
-void DataSharingProcessor::cloneSymbol(const Fortran::semantics::Symbol *sym) {
+void DataSharingProcessor::cloneSymbol(const Fortran::semantics::Symbol *sym,
+                                       Fortran::lower::SymMap *symMap) {
   // Privatization for symbols which are pre-determined (like loop index
   // variables) happen separately, for everything else privatize here.
   if (sym->test(Fortran::semantics::Symbol::Flag::OmpPreDetermined))
     return;
-  bool success = converter.createHostAssociateVarClone(*sym);
+  bool success = converter.createHostAssociateVarClone(*sym, symMap);
   (void)success;
   assert(success && "Privatization failed due to existing binding");
 }
 
 void DataSharingProcessor::copyFirstPrivateSymbol(
-    const Fortran::semantics::Symbol *sym) {
-  if (sym->test(Fortran::semantics::Symbol::Flag::OmpFirstPrivate))
-    converter.copyHostAssociateVar(*sym);
+    const Fortran::semantics::Symbol *sym,
+    Fortran::lower::SymMap *symMap = nullptr) {
+  if (sym->test(Fortran::semantics::Symbol::Flag::OmpFirstPrivate)) {
+    converter.copyHostAssociateVar(*sym, nullptr, symMap);
+  }
 }
 
 void DataSharingProcessor::copyLastPrivateSymbol(
@@ -487,8 +495,11 @@ void DataSharingProcessor::collectDefaultSymbols() {
   }
 }
 
-void DataSharingProcessor::privatize() {
+void DataSharingProcessor::privatize(
+    llvm::SetVector<mlir::omp::PrivateClauseOp> *privateInitializers) {
+
   for (const Fortran::semantics::Symbol *sym : privatizedSymbols) {
+
     if (const auto *commonDet =
             sym->detailsIf<Fortran::semantics::CommonBlockDetails>()) {
       for (const auto &mem : commonDet->objects()) {
@@ -496,6 +507,41 @@ void DataSharingProcessor::privatize() {
         copyFirstPrivateSymbol(&*mem);
       }
     } else {
+      if (privateInitializers != nullptr) {
+        auto ip = firOpBuilder.saveInsertionPoint();
+
+        auto moduleOp = firOpBuilder.getInsertionBlock()
+                            ->getParentOp()
+                            ->getParentOfType<mlir::ModuleOp>();
+
+        firOpBuilder.setInsertionPoint(&moduleOp.getBodyRegion().front(),
+                                       moduleOp.getBodyRegion().front().end());
+
+        Fortran::lower::SymbolBox hsb = converter.lookupOneLevelUpSymbol(*sym);
+        assert(hsb && "Host symbol box not found");
+
+        auto clauseOp = firOpBuilder.create<mlir::omp::PrivateClauseOp>(
+            hsb.getAddr().getLoc(), hsb.getAddr().getType(),
+            sym->name().ToString());
+        firOpBuilder.setInsertionPointToEnd(&clauseOp.getBody().front());
+
+        Fortran::semantics::Symbol cp = *sym;
+        Fortran::lower::SymMap map;
+        map.addSymbol(cp, clauseOp.getArgument(0));
+        map.pushScope();
+
+        cloneSymbol(&cp, &map);
+        copyFirstPrivateSymbol(&cp, &map);
+
+        firOpBuilder.create<mlir::omp::YieldOp>(
+            hsb.getAddr().getLoc(), map.shallowLookupSymbol(cp).getAddr());
+
+        firOpBuilder.restoreInsertionPoint(ip);
+      }
+
+      // TODO: This will eventually be an else to the `if` above it. For now, I
+      // emit both the outlined privatizer AND directly emitted cloning and
+      // copying ops while I am testing.
       cloneSymbol(sym);
       copyFirstPrivateSymbol(sym);
     }
@@ -2272,6 +2318,7 @@ static void createBodyOfOp(
     llvm::SmallVector<mlir::Type> tiv(args.size(), loopVarType);
     llvm::SmallVector<mlir::Location> locs(args.size(), loc);
     firOpBuilder.createBlock(&op.getRegion(), {}, tiv, locs);
+
     // The argument is not currently in memory, so make a temporary for the
     // argument, and store it there, then bind that location to the argument.
     mlir::Operation *storeOp = nullptr;
@@ -2291,10 +2338,11 @@ static void createBodyOfOp(
 
   // If it is an unstructured region and is not the outer region of a combined
   // construct, create empty blocks for all evaluations.
-  if (eval.lowerAsUnstructured() && !outerCombined)
+  if (eval.lowerAsUnstructured() && !outerCombined) {
     Fortran::lower::createEmptyRegionBlocks<mlir::omp::TerminatorOp,
                                             mlir::omp::YieldOp>(
         firOpBuilder, eval.getNestedEvaluations());
+  }
 
   // Start with privatization, so that the lowering of the nested
   // code will use the right symbols.
@@ -2307,12 +2355,14 @@ static void createBodyOfOp(
   if (privatize) {
     if (!dsp) {
       tempDsp.emplace(converter, *clauses, eval);
-      tempDsp->processStep1();
+      llvm::SetVector<mlir::omp::PrivateClauseOp> privateInitializers;
+      tempDsp->processStep1(&privateInitializers);
     }
   }
 
   if constexpr (std::is_same_v<Op, mlir::omp::ParallelOp>) {
     threadPrivatizeVars(converter, eval);
+
     if (clauses) {
       firOpBuilder.setInsertionPoint(marker);
       ClauseProcessor(converter, *clauses).processCopyin();
@@ -2361,6 +2411,7 @@ static void createBodyOfOp(
     if (exits.size() == 1)
       return exits[0];
     mlir::Block *exit = firOpBuilder.createBlock(&region);
+
     for (mlir::Block *b : exits) {
       firOpBuilder.setInsertionPointToEnd(b);
       firOpBuilder.create<mlir::cf::BranchOp>(loc, exit);
@@ -2382,8 +2433,9 @@ static void createBodyOfOp(
         assert(tempDsp.has_value());
         tempDsp->processStep2(op, isLoop);
       } else {
-        if (isLoop && args.size() > 0)
+        if (isLoop && args.size() > 0) {
           dsp->setLoopIV(converter.getSymbolAddress(*args[0]));
+        }
         dsp->processStep2(op, isLoop);
       }
     }
diff --git a/flang/test/Lower/OpenMP/FIR/delayed_privatization.f90 b/flang/test/Lower/OpenMP/FIR/delayed_privatization.f90
new file mode 100644
index 000000000000000..7b4d9135c6e070c
--- /dev/null
+++ b/flang/test/Lower/OpenMP/FIR/delayed_privatization.f90
@@ -0,0 +1,42 @@
+subroutine private_clause_allocatable()
+        integer :: xxx
+        integer :: yyy
+
+!$OMP PARALLEL FIRSTPRIVATE(xxx, yyy)
+!$OMP END PARALLEL
+
+end subroutine
+
+! This is what flang emits with the PoC:
+! --------------------------------------
+!
+!func.func @_QPprivate_clause_allocatable() {
+!  %0 = fir.alloca i32 {bindc_name = "xxx", uniq_name = "_QFprivate_clause_allocatableExxx"}
+!  %1 = fir.alloca i32 {bindc_name = "yyy", uniq_name = "_QFprivate_clause_allocatableEyyy"}
+!  omp.parallel {
+!    %2 = fir.alloca i32 {bindc_name = "xxx", pinned, uniq_name = "_QFprivate_clause_allocatableExxx"}
+!    %3 = fir.load %0 : !fir.ref<i32>
+!    fir.store %3 to %2 : !fir.ref<i32>
+!    %4 = fir.alloca i32 {bindc_name = "yyy", pinned, uniq_name = "_QFprivate_clause_allocatableEyyy"}
+!    %5 = fir.load %1 : !fir.ref<i32>
+!    fir.store %5 to %4 : !fir.ref<i32>
+!    omp.terminator
+!  }
+!  return
+!}
+!
+!"omp.private"() <{function_type = (!fir.ref<i32>) -> !fir.ref<i32>, sym_name = "xxx.privatizer"}> ({
+!^bb0(%arg0: !fir.ref<i32>):
+!  %0 = fir.alloca i32 {bindc_name = "xxx", pinned, uniq_name = "_QFprivate_clause_allocatableExxx"}
+!  %1 = fir.load %arg0 : !fir.ref<i32>
+!  fir.store %1 to %0 : !fir.ref<i32>
+!  omp.yield(%0 : !fir.ref<i32>)
+!}) : () -> ()
+!
+!"omp.private"() <{function_type = (!fir.ref<i32>) -> !fir.ref<i32>, sym_name = "yyy.privatizer"}> ({
+!^bb0(%arg0: !fir.ref<i32>):
+!  %0 = fir.alloca i32 {bindc_name = "yyy", pinned, uniq_name = "_QFprivate_clause_allocatableEyyy"}
+!  %1 = fir.load %arg0 : !fir.ref<i32>
+!  fir.store %1 to %0 : !fir.ref<i32>
+!  omp.yield(%0 : !fir.ref<i32>)
+!}) : () -> ()
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index 96c15e775a3024b..cc1bc97faaebd4d 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -16,6 +16,7 @@
 
 include "mlir/IR/EnumAttr.td"
 include "mlir/IR/OpBase.td"
+include "mlir/Interfaces/FunctionInterfaces.td"
 include "mlir/Interfaces/SideEffectInterfaces.td"
 include "mlir/Interfaces/ControlFlowInterfaces.td"
 include "mlir/IR/SymbolInterfaces.td"
@@ -621,7 +622,7 @@ def SimdLoopOp : OpenMP_Op<"simdloop", [AttrSizedOperandSegments,
 def YieldOp : OpenMP_Op<"yield",
     [Pure, ReturnLike, Terminator,
      ParentOneOf<["WsLoopOp", "ReductionDeclareOp",
-     "AtomicUpdateOp", "SimdLoopOp"]>]> {
+     "AtomicUpdateOp", "SimdLoopOp", "PrivateClauseOp"]>]> {
   let summary = "loop yield and termination operation";
   let description = [{
     "omp.yield" yields SSA values from the OpenMP dialect op region and
@@ -1478,6 +1479,38 @@ def Target_UpdateDataOp: OpenMP_Op<"target_update_data",
 //===----------------------------------------------------------------------===//
 // 2.14.5 target construct
 //===----------------------------------------------------------------------===//
+def PrivateClauseOp : OpenMP_Op<"private", [
+    IsolatedFromAbove, FunctionOpInterface
+  ]> {
+  let summary = "xxxx";
+ ...
[truncated]

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 29, 2024

@llvm/pr-subscribers-mlir-openmp

Author: Kareem Ergawy (ergawy)

Changes

This is a PoC for delayed privatization in OpenMP. Instead of directly emitting privatization code in the frontend, we add a new op to outline the privatization logic for a symbol and call-like mapping that maps from the host symbol to a block argument in the OpenMP region.

Example:

!$omp target private(x)
!$end omp target

Would be code-generated by flang as:

  func.func @<!-- -->foo() {
    omp.target x.privatizer %x -&gt; %argx: !fir.ref&lt;i32&gt; {
    bb0(%argx: !fir.ref&lt;i32&gt;):
      // ... use %argx ....
    }
  }

  "omp.private"() &lt;{function_type = (!fir.ref&lt;i32&gt;) -&gt; !fir.ref&lt;i32&gt;, sym_name = "x.privatizer"}&gt; ({
  ^bb0(%arg0: !fir.ref&lt;i32&gt;):
    %0 = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFprivate_clause_allocatableEx"}
    %1 = fir.load %arg0 : !fir.ref&lt;i32&gt;
    fir.store %1 to %0 : !fir.ref&lt;i32&gt;
    omp.yield(%0 : !fir.ref&lt;i32&gt;)
  }) : () -&gt; ()

Later, we would inline the delayed privatizer function-like op in the OpenMP region to basically get the same code generated directly by the fronend at the moment.

So far this PoC implements the following:

  • Adds the delayed privatization op: omp.private.
  • For simple symbols, emits the op.

Still TODO:

  • Extend the omp.target op to somehow model the oulined privatization logic.
  • Inline the outlined privatizer before emitting LLVM IR.
  • Support more complex symbols like allocatables.

Patch is 22.41 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/79862.diff

8 Files Affected:

  • (modified) flang/include/flang/Lower/AbstractConverter.h (+13-5)
  • (modified) flang/include/flang/Lower/SymbolMap.h (+3)
  • (modified) flang/lib/Lower/Bridge.cpp (+32-22)
  • (modified) flang/lib/Lower/OpenMP.cpp (+67-15)
  • (added) flang/test/Lower/OpenMP/FIR/delayed_privatization.f90 (+42)
  • (modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td (+34-1)
  • (modified) mlir/lib/Dialect/Func/IR/FuncOps.cpp (+1-3)
  • (modified) mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp (+13)
diff --git a/flang/include/flang/Lower/AbstractConverter.h b/flang/include/flang/Lower/AbstractConverter.h
index c19dcbdcdb39022..b3ca804256ee093 100644
--- a/flang/include/flang/Lower/AbstractConverter.h
+++ b/flang/include/flang/Lower/AbstractConverter.h
@@ -16,6 +16,7 @@
 #include "flang/Common/Fortran.h"
 #include "flang/Lower/LoweringOptions.h"
 #include "flang/Lower/PFTDefs.h"
+#include "flang/Lower/SymbolMap.h"
 #include "flang/Optimizer/Builder/BoxValue.h"
 #include "flang/Semantics/symbol.h"
 #include "mlir/IR/Builders.h"
@@ -92,7 +93,8 @@ class AbstractConverter {
 
   /// Binds the symbol to an fir extended value. The symbol binding will be
   /// added or replaced at the inner-most level of the local symbol map.
-  virtual void bindSymbol(SymbolRef sym, const fir::ExtendedValue &exval) = 0;
+  virtual void bindSymbol(SymbolRef sym, const fir::ExtendedValue &exval,
+                          Fortran::lower::SymMap *symMap = nullptr) = 0;
 
   /// Override lowering of expression with pre-lowered values.
   /// Associate mlir::Value to evaluate::Expr. All subsequent call to
@@ -111,14 +113,16 @@ class AbstractConverter {
   /// For a given symbol which is host-associated, create a clone using
   /// parameters from the host-associated symbol.
   virtual bool
-  createHostAssociateVarClone(const Fortran::semantics::Symbol &sym) = 0;
+  createHostAssociateVarClone(const Fortran::semantics::Symbol &sym,
+                              Fortran::lower::SymMap *symMap = nullptr) = 0;
 
   virtual void
   createHostAssociateVarCloneDealloc(const Fortran::semantics::Symbol &sym) = 0;
 
-  virtual void copyHostAssociateVar(
-      const Fortran::semantics::Symbol &sym,
-      mlir::OpBuilder::InsertPoint *copyAssignIP = nullptr) = 0;
+  virtual void
+  copyHostAssociateVar(const Fortran::semantics::Symbol &sym,
+                       mlir::OpBuilder::InsertPoint *copyAssignIP = nullptr,
+                       Fortran::lower::SymMap *symMap = nullptr) = 0;
 
   /// For a given symbol, check if it is present in the inner-most
   /// level of the symbol map.
@@ -295,6 +299,10 @@ class AbstractConverter {
     return loweringOptions;
   }
 
+  virtual Fortran::lower::SymbolBox
+  lookupOneLevelUpSymbol(const Fortran::semantics::Symbol &sym,
+                         Fortran::lower::SymMap *symMap = nullptr) = 0;
+
 private:
   /// Options controlling lowering behavior.
   const Fortran::lower::LoweringOptions &loweringOptions;
diff --git a/flang/include/flang/Lower/SymbolMap.h b/flang/include/flang/Lower/SymbolMap.h
index a55e4b133fe0a8d..834ab747e1a4ad9 100644
--- a/flang/include/flang/Lower/SymbolMap.h
+++ b/flang/include/flang/Lower/SymbolMap.h
@@ -101,6 +101,9 @@ struct SymbolBox : public fir::details::matcher<SymbolBox> {
                  [](const fir::FortranVariableOpInterface &x) {
                    return fir::FortranVariableOpInterface(x).getBase();
                  },
+                 [](const fir::MutableBoxValue &x) {
+                   return x.getAddr();
+                 },
                  [](const auto &x) { return x.getAddr(); });
   }
 
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index b3aeb99fc5d57c5..38c9c76b6793fda 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -498,16 +498,18 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   /// Add the symbol binding to the inner-most level of the symbol map and
   /// return true if it is not already present. Otherwise, return false.
   bool bindIfNewSymbol(Fortran::lower::SymbolRef sym,
-                       const fir::ExtendedValue &exval) {
-    if (shallowLookupSymbol(sym))
+                       const fir::ExtendedValue &exval,
+                       Fortran::lower::SymMap *symMap = nullptr) {
+    if (shallowLookupSymbol(sym, symMap))
       return false;
-    bindSymbol(sym, exval);
+    bindSymbol(sym, exval, symMap);
     return true;
   }
 
   void bindSymbol(Fortran::lower::SymbolRef sym,
-                  const fir::ExtendedValue &exval) override final {
-    addSymbol(sym, exval, /*forced=*/true);
+                  const fir::ExtendedValue &exval,
+                  Fortran::lower::SymMap *symMap = nullptr) override final {
+    addSymbol(sym, exval, /*forced=*/true, symMap);
   }
 
   void
@@ -610,14 +612,15 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   }
 
   bool createHostAssociateVarClone(
-      const Fortran::semantics::Symbol &sym) override final {
+      const Fortran::semantics::Symbol &sym,
+      Fortran::lower::SymMap *symMap = nullptr) override final {
     mlir::Location loc = genLocation(sym.name());
     mlir::Type symType = genType(sym);
     const auto *details = sym.detailsIf<Fortran::semantics::HostAssocDetails>();
     assert(details && "No host-association found");
     const Fortran::semantics::Symbol &hsym = details->symbol();
     mlir::Type hSymType = genType(hsym);
-    Fortran::lower::SymbolBox hsb = lookupSymbol(hsym);
+    Fortran::lower::SymbolBox hsb = lookupSymbol(hsym, symMap);
 
     auto allocate = [&](llvm::ArrayRef<mlir::Value> shape,
                         llvm::ArrayRef<mlir::Value> typeParams) -> mlir::Value {
@@ -720,7 +723,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
           // Do nothing
         });
 
-    return bindIfNewSymbol(sym, exv);
+    return bindIfNewSymbol(sym, exv, symMap);
   }
 
   void createHostAssociateVarCloneDealloc(
@@ -745,16 +748,17 @@ class FirConverter : public Fortran::lower::AbstractConverter {
 
   void copyHostAssociateVar(
       const Fortran::semantics::Symbol &sym,
-      mlir::OpBuilder::InsertPoint *copyAssignIP = nullptr) override final {
+      mlir::OpBuilder::InsertPoint *copyAssignIP = nullptr,
+      Fortran::lower::SymMap *symMap = nullptr) override final {
     // 1) Fetch the original copy of the variable.
     assert(sym.has<Fortran::semantics::HostAssocDetails>() &&
            "No host-association found");
     const Fortran::semantics::Symbol &hsym = sym.GetUltimate();
-    Fortran::lower::SymbolBox hsb = lookupOneLevelUpSymbol(hsym);
+    Fortran::lower::SymbolBox hsb = lookupOneLevelUpSymbol(hsym, symMap);
     assert(hsb && "Host symbol box not found");
 
     // 2) Fetch the copied one that will mask the original.
-    Fortran::lower::SymbolBox sb = shallowLookupSymbol(sym);
+    Fortran::lower::SymbolBox sb = shallowLookupSymbol(sym, symMap);
     assert(sb && "Host-associated symbol box not found");
     assert(hsb.getAddr() != sb.getAddr() &&
            "Host and associated symbol boxes are the same");
@@ -763,8 +767,9 @@ class FirConverter : public Fortran::lower::AbstractConverter {
     mlir::OpBuilder::InsertPoint insPt = builder->saveInsertionPoint();
     if (copyAssignIP && copyAssignIP->isSet())
       builder->restoreInsertionPoint(*copyAssignIP);
-    else
+    else {
       builder->setInsertionPointAfter(sb.getAddr().getDefiningOp());
+    }
 
     Fortran::lower::SymbolBox *lhs_sb, *rhs_sb;
     if (copyAssignIP && copyAssignIP->isSet() &&
@@ -1060,8 +1065,10 @@ class FirConverter : public Fortran::lower::AbstractConverter {
 
   /// Find the symbol in the inner-most level of the local map or return null.
   Fortran::lower::SymbolBox
-  shallowLookupSymbol(const Fortran::semantics::Symbol &sym) {
-    if (Fortran::lower::SymbolBox v = localSymbols.shallowLookupSymbol(sym))
+  shallowLookupSymbol(const Fortran::semantics::Symbol &sym,
+                      Fortran::lower::SymMap *symMap = nullptr) {
+    auto &map = (symMap == nullptr ? localSymbols : *symMap);
+    if (Fortran::lower::SymbolBox v = map.shallowLookupSymbol(sym))
       return v;
     return {};
   }
@@ -1069,8 +1076,10 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   /// Find the symbol in one level up of symbol map such as for host-association
   /// in OpenMP code or return null.
   Fortran::lower::SymbolBox
-  lookupOneLevelUpSymbol(const Fortran::semantics::Symbol &sym) {
-    if (Fortran::lower::SymbolBox v = localSymbols.lookupOneLevelUpSymbol(sym))
+  lookupOneLevelUpSymbol(const Fortran::semantics::Symbol &sym,
+                         Fortran::lower::SymMap *symMap = nullptr) override {
+    auto &map = (symMap == nullptr ? localSymbols : *symMap);
+    if (Fortran::lower::SymbolBox v = map.lookupOneLevelUpSymbol(sym))
       return v;
     return {};
   }
@@ -1079,15 +1088,16 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   /// already in the map and \p forced is `false`, the map is not updated.
   /// Instead the value `false` is returned.
   bool addSymbol(const Fortran::semantics::SymbolRef sym,
-                 fir::ExtendedValue val, bool forced = false) {
-    if (!forced && lookupSymbol(sym))
+                 fir::ExtendedValue val, bool forced = false,
+                 Fortran::lower::SymMap *symMap = nullptr) {
+    auto &map = (symMap == nullptr ? localSymbols : *symMap);
+    if (!forced && lookupSymbol(sym, &map))
       return false;
     if (lowerToHighLevelFIR()) {
-      Fortran::lower::genDeclareSymbol(*this, localSymbols, sym, val,
-                                       fir::FortranVariableFlagsEnum::None,
-                                       forced);
+      Fortran::lower::genDeclareSymbol(
+          *this, map, sym, val, fir::FortranVariableFlagsEnum::None, forced);
     } else {
-      localSymbols.addSymbol(sym, val, forced);
+      map.addSymbol(sym, val, forced);
     }
     return true;
   }
diff --git a/flang/lib/Lower/OpenMP.cpp b/flang/lib/Lower/OpenMP.cpp
index be2117efbabc0a0..edf0e6b2c0d90cd 100644
--- a/flang/lib/Lower/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP.cpp
@@ -169,12 +169,15 @@ class DataSharingProcessor {
   void collectSymbolsForPrivatization();
   void insertBarrier();
   void collectDefaultSymbols();
-  void privatize();
+  void
+  privatize(llvm::SetVector<mlir::omp::PrivateClauseOp> *privateInitializers);
   void defaultPrivatize();
   void copyLastPrivatize(mlir::Operation *op);
   void insertLastPrivateCompare(mlir::Operation *op);
-  void cloneSymbol(const Fortran::semantics::Symbol *sym);
-  void copyFirstPrivateSymbol(const Fortran::semantics::Symbol *sym);
+  void cloneSymbol(const Fortran::semantics::Symbol *sym,
+                   Fortran::lower::SymMap *symMap = nullptr);
+  void copyFirstPrivateSymbol(const Fortran::semantics::Symbol *sym,
+                              Fortran::lower::SymMap *symMap);
   void copyLastPrivateSymbol(const Fortran::semantics::Symbol *sym,
                              mlir::OpBuilder::InsertPoint *lastPrivIP);
   void insertDeallocs();
@@ -197,7 +200,8 @@ class DataSharingProcessor {
   // Step2 performs the copying for lastprivates and requires knowledge of the
   // MLIR operation to insert the last private update. Step2 adds
   // dealocation code as well.
-  void processStep1();
+  void processStep1(llvm::SetVector<mlir::omp::PrivateClauseOp>
+                        *privateInitializers = nullptr);
   void processStep2(mlir::Operation *op, bool isLoop);
 
   void setLoopIV(mlir::Value iv) {
@@ -206,10 +210,11 @@ class DataSharingProcessor {
   }
 };
 
-void DataSharingProcessor::processStep1() {
+void DataSharingProcessor::processStep1(
+    llvm::SetVector<mlir::omp::PrivateClauseOp> *privateInitializers) {
   collectSymbolsForPrivatization();
   collectDefaultSymbols();
-  privatize();
+  privatize(privateInitializers);
   defaultPrivatize();
   insertBarrier();
 }
@@ -239,20 +244,23 @@ void DataSharingProcessor::insertDeallocs() {
     }
 }
 
-void DataSharingProcessor::cloneSymbol(const Fortran::semantics::Symbol *sym) {
+void DataSharingProcessor::cloneSymbol(const Fortran::semantics::Symbol *sym,
+                                       Fortran::lower::SymMap *symMap) {
   // Privatization for symbols which are pre-determined (like loop index
   // variables) happen separately, for everything else privatize here.
   if (sym->test(Fortran::semantics::Symbol::Flag::OmpPreDetermined))
     return;
-  bool success = converter.createHostAssociateVarClone(*sym);
+  bool success = converter.createHostAssociateVarClone(*sym, symMap);
   (void)success;
   assert(success && "Privatization failed due to existing binding");
 }
 
 void DataSharingProcessor::copyFirstPrivateSymbol(
-    const Fortran::semantics::Symbol *sym) {
-  if (sym->test(Fortran::semantics::Symbol::Flag::OmpFirstPrivate))
-    converter.copyHostAssociateVar(*sym);
+    const Fortran::semantics::Symbol *sym,
+    Fortran::lower::SymMap *symMap = nullptr) {
+  if (sym->test(Fortran::semantics::Symbol::Flag::OmpFirstPrivate)) {
+    converter.copyHostAssociateVar(*sym, nullptr, symMap);
+  }
 }
 
 void DataSharingProcessor::copyLastPrivateSymbol(
@@ -487,8 +495,11 @@ void DataSharingProcessor::collectDefaultSymbols() {
   }
 }
 
-void DataSharingProcessor::privatize() {
+void DataSharingProcessor::privatize(
+    llvm::SetVector<mlir::omp::PrivateClauseOp> *privateInitializers) {
+
   for (const Fortran::semantics::Symbol *sym : privatizedSymbols) {
+
     if (const auto *commonDet =
             sym->detailsIf<Fortran::semantics::CommonBlockDetails>()) {
       for (const auto &mem : commonDet->objects()) {
@@ -496,6 +507,41 @@ void DataSharingProcessor::privatize() {
         copyFirstPrivateSymbol(&*mem);
       }
     } else {
+      if (privateInitializers != nullptr) {
+        auto ip = firOpBuilder.saveInsertionPoint();
+
+        auto moduleOp = firOpBuilder.getInsertionBlock()
+                            ->getParentOp()
+                            ->getParentOfType<mlir::ModuleOp>();
+
+        firOpBuilder.setInsertionPoint(&moduleOp.getBodyRegion().front(),
+                                       moduleOp.getBodyRegion().front().end());
+
+        Fortran::lower::SymbolBox hsb = converter.lookupOneLevelUpSymbol(*sym);
+        assert(hsb && "Host symbol box not found");
+
+        auto clauseOp = firOpBuilder.create<mlir::omp::PrivateClauseOp>(
+            hsb.getAddr().getLoc(), hsb.getAddr().getType(),
+            sym->name().ToString());
+        firOpBuilder.setInsertionPointToEnd(&clauseOp.getBody().front());
+
+        Fortran::semantics::Symbol cp = *sym;
+        Fortran::lower::SymMap map;
+        map.addSymbol(cp, clauseOp.getArgument(0));
+        map.pushScope();
+
+        cloneSymbol(&cp, &map);
+        copyFirstPrivateSymbol(&cp, &map);
+
+        firOpBuilder.create<mlir::omp::YieldOp>(
+            hsb.getAddr().getLoc(), map.shallowLookupSymbol(cp).getAddr());
+
+        firOpBuilder.restoreInsertionPoint(ip);
+      }
+
+      // TODO: This will eventually be an else to the `if` above it. For now, I
+      // emit both the outlined privatizer AND directly emitted cloning and
+      // copying ops while I am testing.
       cloneSymbol(sym);
       copyFirstPrivateSymbol(sym);
     }
@@ -2272,6 +2318,7 @@ static void createBodyOfOp(
     llvm::SmallVector<mlir::Type> tiv(args.size(), loopVarType);
     llvm::SmallVector<mlir::Location> locs(args.size(), loc);
     firOpBuilder.createBlock(&op.getRegion(), {}, tiv, locs);
+
     // The argument is not currently in memory, so make a temporary for the
     // argument, and store it there, then bind that location to the argument.
     mlir::Operation *storeOp = nullptr;
@@ -2291,10 +2338,11 @@ static void createBodyOfOp(
 
   // If it is an unstructured region and is not the outer region of a combined
   // construct, create empty blocks for all evaluations.
-  if (eval.lowerAsUnstructured() && !outerCombined)
+  if (eval.lowerAsUnstructured() && !outerCombined) {
     Fortran::lower::createEmptyRegionBlocks<mlir::omp::TerminatorOp,
                                             mlir::omp::YieldOp>(
         firOpBuilder, eval.getNestedEvaluations());
+  }
 
   // Start with privatization, so that the lowering of the nested
   // code will use the right symbols.
@@ -2307,12 +2355,14 @@ static void createBodyOfOp(
   if (privatize) {
     if (!dsp) {
       tempDsp.emplace(converter, *clauses, eval);
-      tempDsp->processStep1();
+      llvm::SetVector<mlir::omp::PrivateClauseOp> privateInitializers;
+      tempDsp->processStep1(&privateInitializers);
     }
   }
 
   if constexpr (std::is_same_v<Op, mlir::omp::ParallelOp>) {
     threadPrivatizeVars(converter, eval);
+
     if (clauses) {
       firOpBuilder.setInsertionPoint(marker);
       ClauseProcessor(converter, *clauses).processCopyin();
@@ -2361,6 +2411,7 @@ static void createBodyOfOp(
     if (exits.size() == 1)
       return exits[0];
     mlir::Block *exit = firOpBuilder.createBlock(&region);
+
     for (mlir::Block *b : exits) {
       firOpBuilder.setInsertionPointToEnd(b);
       firOpBuilder.create<mlir::cf::BranchOp>(loc, exit);
@@ -2382,8 +2433,9 @@ static void createBodyOfOp(
         assert(tempDsp.has_value());
         tempDsp->processStep2(op, isLoop);
       } else {
-        if (isLoop && args.size() > 0)
+        if (isLoop && args.size() > 0) {
           dsp->setLoopIV(converter.getSymbolAddress(*args[0]));
+        }
         dsp->processStep2(op, isLoop);
       }
     }
diff --git a/flang/test/Lower/OpenMP/FIR/delayed_privatization.f90 b/flang/test/Lower/OpenMP/FIR/delayed_privatization.f90
new file mode 100644
index 000000000000000..7b4d9135c6e070c
--- /dev/null
+++ b/flang/test/Lower/OpenMP/FIR/delayed_privatization.f90
@@ -0,0 +1,42 @@
+subroutine private_clause_allocatable()
+        integer :: xxx
+        integer :: yyy
+
+!$OMP PARALLEL FIRSTPRIVATE(xxx, yyy)
+!$OMP END PARALLEL
+
+end subroutine
+
+! This is what flang emits with the PoC:
+! --------------------------------------
+!
+!func.func @_QPprivate_clause_allocatable() {
+!  %0 = fir.alloca i32 {bindc_name = "xxx", uniq_name = "_QFprivate_clause_allocatableExxx"}
+!  %1 = fir.alloca i32 {bindc_name = "yyy", uniq_name = "_QFprivate_clause_allocatableEyyy"}
+!  omp.parallel {
+!    %2 = fir.alloca i32 {bindc_name = "xxx", pinned, uniq_name = "_QFprivate_clause_allocatableExxx"}
+!    %3 = fir.load %0 : !fir.ref<i32>
+!    fir.store %3 to %2 : !fir.ref<i32>
+!    %4 = fir.alloca i32 {bindc_name = "yyy", pinned, uniq_name = "_QFprivate_clause_allocatableEyyy"}
+!    %5 = fir.load %1 : !fir.ref<i32>
+!    fir.store %5 to %4 : !fir.ref<i32>
+!    omp.terminator
+!  }
+!  return
+!}
+!
+!"omp.private"() <{function_type = (!fir.ref<i32>) -> !fir.ref<i32>, sym_name = "xxx.privatizer"}> ({
+!^bb0(%arg0: !fir.ref<i32>):
+!  %0 = fir.alloca i32 {bindc_name = "xxx", pinned, uniq_name = "_QFprivate_clause_allocatableExxx"}
+!  %1 = fir.load %arg0 : !fir.ref<i32>
+!  fir.store %1 to %0 : !fir.ref<i32>
+!  omp.yield(%0 : !fir.ref<i32>)
+!}) : () -> ()
+!
+!"omp.private"() <{function_type = (!fir.ref<i32>) -> !fir.ref<i32>, sym_name = "yyy.privatizer"}> ({
+!^bb0(%arg0: !fir.ref<i32>):
+!  %0 = fir.alloca i32 {bindc_name = "yyy", pinned, uniq_name = "_QFprivate_clause_allocatableEyyy"}
+!  %1 = fir.load %arg0 : !fir.ref<i32>
+!  fir.store %1 to %0 : !fir.ref<i32>
+!  omp.yield(%0 : !fir.ref<i32>)
+!}) : () -> ()
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index 96c15e775a3024b..cc1bc97faaebd4d 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -16,6 +16,7 @@
 
 include "mlir/IR/EnumAttr.td"
 include "mlir/IR/OpBase.td"
+include "mlir/Interfaces/FunctionInterfaces.td"
 include "mlir/Interfaces/SideEffectInterfaces.td"
 include "mlir/Interfaces/ControlFlowInterfaces.td"
 include "mlir/IR/SymbolInterfaces.td"
@@ -621,7 +622,7 @@ def SimdLoopOp : OpenMP_Op<"simdloop", [AttrSizedOperandSegments,
 def YieldOp : OpenMP_Op<"yield",
     [Pure, ReturnLike, Terminator,
      ParentOneOf<["WsLoopOp", "ReductionDeclareOp",
-     "AtomicUpdateOp", "SimdLoopOp"]>]> {
+     "AtomicUpdateOp", "SimdLoopOp", "PrivateClauseOp"]>]> {
   let summary = "loop yield and termination operation";
   let description = [{
     "omp.yield" yields SSA values from the OpenMP dialect op region and
@@ -1478,6 +1479,38 @@ def Target_UpdateDataOp: OpenMP_Op<"target_update_data",
 //===----------------------------------------------------------------------===//
 // 2.14.5 target construct
 //===----------------------------------------------------------------------===//
+def PrivateClauseOp : OpenMP_Op<"private", [
+    IsolatedFromAbove, FunctionOpInterface
+  ]> {
+  let summary = "xxxx";
+ ...
[truncated]

Copy link

github-actions bot commented Jan 29, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

flang/lib/Lower/OpenMP.cpp Outdated Show resolved Hide resolved
@ergawy ergawy force-pushed the support_target_private_funclike_op branch 4 times, most recently from 852ff4e to 113dd3b Compare January 30, 2024 11:18
@ergawy ergawy force-pushed the support_target_private_funclike_op branch from 113dd3b to 0b423d3 Compare January 30, 2024 13:55
@ergawy ergawy force-pushed the support_target_private_funclike_op branch from 0b423d3 to ef847ab Compare January 31, 2024 11:06
@ergawy
Copy link
Member Author

ergawy commented Jan 31, 2024

@clementval @kiranchandramohan thanks a lot for taking a look. I think this WIP is getting closer and closer to having a complete end-to-end implementation. If I not missing anything, what remains is to actually inline the the privatizer in the op before converting to LLVM IR which I am looking into now.

Of course, I still need to support more complex symbols (e.g. allocatables) but I am focusing on simple variables for now to get something working end-to-end.

@kiranchandramohan
Copy link
Contributor

@clementval @kiranchandramohan thanks a lot for taking a look. I think this WIP is getting closer and closer to having a complete end-to-end implementation. If I not missing anything, what remains is to actually inline the the privatizer in the op before converting to LLVM IR which I am looking into now.

Of course, I still need to support more complex symbols (e.g. allocatables) but I am focusing on simple variables for now to get something working end-to-end.

Thanks for the work here. Forgot to mention that the OpenMPIRBuilder has a privatization call back that is designed to implement privatization. You can consider using that.
(

PrivCB(InnerAllocaIP, Builder.saveIP(), V, *Inner, ReplacementValue));
)

Copy link
Contributor

@luporl luporl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, your approach to implement delayed privatization looks good to me,
requiring only a minor modification to AbstractConverter/Bridge.

I just have a couple comments.

flang/lib/Lower/OpenMP.cpp Outdated Show resolved Hide resolved
flang/lib/Lower/OpenMP.cpp Outdated Show resolved Hide resolved
flang/lib/Lower/OpenMP.cpp Outdated Show resolved Hide resolved
mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td Outdated Show resolved Hide resolved
mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td Outdated Show resolved Hide resolved
flang/test/Lower/OpenMP/FIR/delayed_privatization.f90 Outdated Show resolved Hide resolved
mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td Outdated Show resolved Hide resolved
flang/test/Lower/OpenMP/FIR/delayed_privatization.f90 Outdated Show resolved Hide resolved
@ergawy ergawy force-pushed the support_target_private_funclike_op branch from ef847ab to 0d05d4a Compare February 2, 2024 08:14
@ergawy ergawy force-pushed the support_target_private_funclike_op branch from 8e5d75f to b1b1bc1 Compare February 4, 2024 12:45
@ergawy
Copy link
Member Author

ergawy commented Feb 5, 2024

@kiranchandramohan @clementval @luporl can you 🙏 have a look at the proposed plan above and let me know if you agree with it? Apologies for the noise but I just want to make sure we are aligned since this touches a lot of places in the pipeline already.

I am specially interested in your opinion about the "feature flag" thing to enable/disable delayed privatization both in bbcand flang-new at well (and having it disabled by default). Let me know if this is something you agree with or have other suggestions so that we can roll out support for delayed privatization in a practical and gradual manner.

@ergawy ergawy force-pushed the support_target_private_funclike_op branch from b1b1bc1 to 0f72db6 Compare February 5, 2024 11:45
@kiranchandramohan
Copy link
Contributor

@kiranchandramohan @clementval @luporl can you 🙏 have a look at the proposed plan above and let me know if you agree with it? Apologies for the noise but I just want to make sure we are aligned since this touches a lot of places in the pipeline already.

I am specially interested in your opinion about the "feature flag" thing to enable/disable delayed privatization both in bbcand flang-new at well (and having it disabled by default). Let me know if this is something you agree with or have other suggestions so that we can roll out support for delayed privatization in a practical and gradual manner.

Thanks @ergawy for pushing this forward. This will add more capabilities to the dialect and make lowering easier. Your approach of gradually rolling out the feature makes a lot of sense. We should also ensure that applications that are enabled now continues to work (spec-2017 speed, subset of specomp2012, subset of NPB, SNAP) and that there is no regression in the gfortran testsuite.

When introducing the changes to OpenMP dialect, could you add a reply to https://discourse.llvm.org/t/rfc-privatisation-in-openmp-dialect/3526 saying that we are adding support for privatisation clauses in the dialect. We see benefits for this in:
-> lowering from frontend to MLIR
-> better support for task and target constructs. Tasks have delayed execution, so the private/firstprivate variables have to be inserted in the task structure.
-> better lowering (to LLVM) for index variables of loops. #75836
-> better support for lastprivate, where the runtime can support detection of the last iteration.

I think it is fine to add a single flag to the file that can enable delayed privatization if it is accessible as flang-new -mmlir <flag-name> like for e.g (-fdebug-dump-pre-fir) and directly with bbc.

@ergawy ergawy force-pushed the support_target_private_funclike_op branch from 0f72db6 to 89ab3cf Compare February 6, 2024 07:39
ergawy added a commit to ergawy/llvm-project that referenced this pull request Feb 6, 2024
This PR outlines the arguments of the open CodeGen functions into 2
separate structs. This was, in part, motivated by the delayed
privatization
WIP llvm#79862 where we had to extend the signatures of both functions
containing quite a bit of default values (`nullptr`, `false`). This PR
does not add any new arguments yet though, just outlines the existing
ones.
ergawy added a commit to ergawy/llvm-project that referenced this pull request Feb 6, 2024
This PR outlines the arguments of the open CodeGen functions into 2
separate structs. This was, in part, motivated by the delayed
privatization
WIP llvm#79862 where we had to extend the signatures of both functions
containing quite a bit of default values (`nullptr`, `false`). This PR
does not add any new arguments yet though, just outlines the existing
ones.
ergawy added a commit that referenced this pull request Feb 6, 2024
…80817)

This PR outlines the arguments of the open CodeGen functions into 2
separate structs. This was, in part, motivated by the delayed
privatization WIP #79862 where we had to extend the signatures of both
functions containing quite a bit of default values (`nullptr`, `false`).
This PR does not add any new arguments yet though, just outlines the
existing ones.
ergawy added a commit to ergawy/llvm-project that referenced this pull request Feb 7, 2024
This PR adds a new op to the OpenMP dialect: `PrivateClauseOp`. This op
will be later used to model `[first]private` clauses for differnt OpenMP
directives.

This is part of productizing the "delayed privatization" PoC wich can be
found in llvm#79862.
This is a PoC for delayed privatization in OpenMP. Instead of directly
emitting privatization code in the frontend, we add a new op to outline
the privatization logic for a symbol and call-like mapping that maps
from the host symbol to an outlined function-like privatizer op.

Later, we would inline the delayed privatizer function-like op in the
OpenMP region to basically get the same code generated directly by the
fronend at the moment.
ergawy added a commit to ergawy/llvm-project that referenced this pull request Feb 7, 2024
This PR adds a new op to the OpenMP dialect: `PrivateClauseOp`. This op
will be later used to model `[first]private` clauses for differnt OpenMP
directives.

This is part of productizing the "delayed privatization" PoC wich can be
found in llvm#79862.
@ergawy ergawy force-pushed the support_target_private_funclike_op branch from d6c453f to 717b57a Compare February 7, 2024 14:25
@@ -3505,6 +3505,18 @@ struct ZeroOpConversion : public FIROpConversion<fir::ZeroOp> {
}
};

class DeclareOpConversion : public FIROpConversion<fir::DeclareOp> {
Copy link
Member Author

@ergawy ergawy Feb 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While experimenting with delayed privation for hlfir (so far I have been experimenting with fir), I came across a small issue which is that fir.declare is not support when converting from fir to mlir dialect. That's why I had to copy this conversion pattern from PreCGRewrite.cpp. Since DeclareOpConversion does not convert to op in the fir::cg dialect, I think it would be nice to move it here to make conversion from fir to llvm dialect more complete.

Do you folks have any input on this?

If you agree to share the pass, I will open a separate PR with proper testing of course.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised by this. In a quick experiment flang-new -mmlir -mlir-pass-statistics -c [...] gives me the same output both with and without -fopenmp. Why isn't CodeGenRewrite doing the job for you? This pass should run earlier than FIRToLLVMLowering (CodeGen.cpp).

In @jeanPerier's commit adding fir::DeclareOp to PreCGRewrite.cpp, he mentions that it was placed here to avoid adding support for fir.shape to codegen. See c852174

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is fir::createOpenMPFIRPassPipeline, so I expect that I should be seeing some extra passes for OpenMP. It is weird that I didn't.

Anyway that runs immediately after lowering so I don't think it should effect codegen for fir.declare.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tblah for taking a look. I was referring to lowering using fir-opt --fir-to-llvm-ir for testing purposes. I expected that the fir-opt --fir-to-llvm-ir should provide a more-or-less complete conversion pipeline from fir to llvm. This is mostly to make testing and experimenting easier.

I am a bit of newbie to the project so I might not be using the different tools and their flags properly yet.

Copy link
Contributor

@tblah tblah Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No fir-oppt --fir-to-llvm-ir is not sufficient. The FIR pass pipeline is quite long (see https://github.com/llvm/llvm-project/blob/main/flang/test/Driver/mlir-pass-pipeline.f90). A few of these are optimizations (e.g. SimplifyIntrinsics, CSE, AliasTags), but several different passes are needed to convert FIR into LLVM (even more for HLFIR).

Roughly, to get from FIR to LLVM you need the passes in createDefaultFIRCodeGenPassPipeline:

inline void createDefaultFIRCodeGenPassPipeline(

From the command line you can pipe FIR into tco and it will output LLVM IR. fir-opt is more for running individual passes. The help text is a bit hard to follow, you can find tco specific options at the top of this file https://github.com/llvm/llvm-project/blob/main/flang/tools/tco/tco.cpp#L41

For example, bbc -emit-hlfir -o - file.f90 | tco is a bit like running flang-new -fc1 -O2 -emit-llvm -o - file.f90

ergawy added a commit to ergawy/llvm-project that referenced this pull request Feb 8, 2024
This PR adds a new op to the OpenMP dialect: `PrivateClauseOp`. This op
will be later used to model `[first]private` clauses for differnt OpenMP
directives.

This is part of productizing the "delayed privatization" PoC wich can be
found in llvm#79862.
ergawy added a commit to ergawy/llvm-project that referenced this pull request Feb 8, 2024
This PR adds a new op to the OpenMP dialect: `PrivateClauseOp`. This op
will be later used to model `[first]private` clauses for differnt OpenMP
directives.

This is part of productizing the "delayed privatization" PoC wich can be
found in llvm#79862.
ergawy added a commit to ergawy/llvm-project that referenced this pull request Feb 11, 2024
This PR adds a new op to the OpenMP dialect: `PrivateClauseOp`. This op
will be later used to model `[first]private` clauses for differnt OpenMP
directives.

This is part of productizing the "delayed privatization" PoC wich can be
found in llvm#79862.
ergawy added a commit to ergawy/llvm-project that referenced this pull request Feb 11, 2024
This PR adds a new op to the OpenMP dialect: `PrivateClauseOp`. This op
will be later used to model `[first]private` clauses for differnt OpenMP
directives.

This is part of productizing the "delayed privatization" PoC wich can be
found in llvm#79862.
ergawy added a commit to ergawy/llvm-project that referenced this pull request Feb 12, 2024
This PR adds a new op to the OpenMP dialect: `PrivateClauseOp`. This op
will be later used to model `[first]private` clauses for differnt OpenMP
directives.

This is part of productizing the "delayed privatization" PoC wich can be
found in llvm#79862.
ergawy added a commit to ergawy/llvm-project that referenced this pull request Feb 12, 2024
This PR adds a new op to the OpenMP dialect: `PrivateClauseOp`. This op
will be later used to model `[first]private` clauses for differnt OpenMP
directives.

This is part of productizing the "delayed privatization" PoC wich can be
found in llvm#79862.
ergawy added a commit that referenced this pull request Feb 13, 2024
This PR adds a new op to the OpenMP dialect: `PrivateClauseOp`. This op
will be later used to model `[first]private` clauses for differnt OpenMP
directives.

This is part of productizing the "delayed privatization" PoC which can
be found in #79862.
@ergawy
Copy link
Member Author

ergawy commented Feb 15, 2024

Abandoning since I am working on productizing the whole thing in separate PRs (see description of this PR for a detailed plan and tracking of that plan). I will keep updating the PR description to keep notes of the progress.

@ergawy ergawy closed this Feb 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants