Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrrectly Master-workshare nesting #857

Closed
dmishura opened this issue Oct 14, 2024 · 6 comments
Closed

Incorrrectly Master-workshare nesting #857

dmishura opened this issue Oct 14, 2024 · 6 comments
Assignees

Comments

@dmishura
Copy link

Found in code evaluation with CP2K. Code hung with latest ifx compiler. Caused by nested OpenMP regions that doesn't allow by standard. Code path contain multiple source files, so only short snippets here.

  1. Parallel section creation dbcsr_finalize in dbscr_work_operations.F
!$OMP           PARALLEL DEFAULT (NONE) &
!$OMP                    SHARED (matrix, old_row_p, old_col_i, old_blk_p,&
!$OMP                            start_offset, sort_data)
            CALL dbcsr_merge_all(matrix, &
                                 old_row_p, old_col_i, old_blk_p, &
                                 sort_data=sort_data)
!$OMP           END PARALLEL
  1. Create Master section dbcsr_merge_all in dbscr_work_operations.F
!$OMP     MASTER
      CALL dbcsr_clearfrom_index_array(matrix, dbcsr_slot_row_p)
      CALL dbcsr_clearfrom_index_array(matrix, dbcsr_slot_col_i)
      CALL dbcsr_clearfrom_index_array(matrix, dbcsr_slot_blk_p)
      CALL dbcsr_addto_index_array(matrix, dbcsr_slot_row_p, &
                                   DATA=new_row_p(1:nrows + 1), extra=new_nblks*2)
      CALL dbcsr_addto_index_array(matrix, dbcsr_slot_col_i, &
                                   DATA=new_col_i(1:new_nblks))
      IF (sort_data) THEN
         CALL dbcsr_addto_index_array(matrix, dbcsr_slot_blk_p, &
                                      DATA=new_blk_p_sorted)
      ELSE
         CALL dbcsr_addto_index_array(matrix, dbcsr_slot_blk_p, &
                                      DATA=new_blk_p)
      END IF
      matrix%nblks = new_nblks
      matrix%nze = new_nze
      matrix%index(dbcsr_slot_nblks) = matrix%nblks
      matrix%index(dbcsr_slot_nze) = matrix%nze
      CALL dbcsr_repoint_index(matrix)
!$OMP     END MASTER

  1. Transit dbcsr_addto_index_array in dbcsr_index_operations.F
         CALL ensure_array_size(matrix%index, lb=1, ub=ub_new, &
                                factor=default_resize_factor, &
                                nocopy=.FALSE., &
                                memory_type=matrix%index_memory_type)

  1. Transit ensure_array_size in dbcsr_ptr_util.F
            IF (ub_orig - lb_orig + 1 .gt. 0) THEN
               !newarray(lb_orig:ub_orig) = array(lb_orig:ub_orig)
               CALL mem_copy_${nametype1}$ (newarray(lb_orig:ub_orig), &
                                            array(lb_orig:ub_orig), ub_orig - lb_orig + 1)
            END IF

  1. Create workshare section mem_copy_${nametype1}$ in dbcsr_ptr_util.F
      SUBROUTINE mem_copy_${nametype1}$ (dst, src, n)
     !! Copies memory area

         INTEGER, INTENT(IN) :: n
        !! length of copy
         ${type1}$, DIMENSION(1:n), INTENT(OUT) :: dst
        !! destination memory
         ${type1}$, DIMENSION(1:n), INTENT(IN) :: src
        !! source memory
#if !defined(__DBCSR_DISABLE_WORKSHARE)
!$OMP     PARALLEL WORKSHARE DEFAULT(none) SHARED(dst,src)
#endif
         dst(:) = src(:)
#if !defined(__DBCSR_DISABLE_WORKSHARE)
!$OMP     END PARALLEL WORKSHARE
#endif
      END SUBROUTINE mem_copy_${nametype1}$

Situation when __DBCSR_DISABLE_WORKSHARE macros is not defined violated OpenMP standard that says:

" DO, SECTIONS, SINGLE, and WORKSHARE directives are not permitted in the
dynamic extent of CRITICAL, ORDERED, and MASTER directives"

Also please note that MASTER construct is deprecated in latest spec.

Fortran pretty old spec:
https://www.openmp.org/wp-content/uploads/fspec20.pdf

@hfp hfp self-assigned this Oct 15, 2024
@hfp
Copy link
Member

hfp commented Oct 15, 2024

@dmishura Awesome - thank you, Dmitry!

hfp added a commit to hfp/xconfigure that referenced this issue Oct 15, 2024
hfp added a commit to hfp/dbcsr that referenced this issue Oct 16, 2024
hfp added a commit to hfp/dbcsr that referenced this issue Oct 16, 2024
- Cleanup USE OMP_LIB directives.
@hfp
Copy link
Member

hfp commented Oct 16, 2024

I explored an approach to use CP2K's convention checker and essentially ban nested parallelism or more specifically a parallel region inside of a master section. However, the GFortran AST cannot be combined with a call graph query unless much additional work is invested. Alternative approach is to rely on an assertion.

hfp added a commit to hfp/dbcsr that referenced this issue Oct 16, 2024
- Avoid IF-condition as part of the directive.
- Cleanup USE OMP_LIB directives.
hfp added a commit to hfp/dbcsr that referenced this issue Oct 16, 2024
- Avoid IF-condition as part of the directive.
@hfp
Copy link
Member

hfp commented Oct 16, 2024

Also please note that MASTER construct is deprecated in latest spec.

This questions the whole MPI-like programming model in OpenMP based on get_get_thread_num. This is about "OpenMP tasks everywhere" to enable composable parallelism. Nice dream.

( That's too much to ask for ;-)

hfp added a commit to hfp/dbcsr that referenced this issue Oct 16, 2024
- Avoid IF-condition as part of the directive.
hfp added a commit to hfp/dbcsr that referenced this issue Oct 22, 2024
- Avoid IF-condition as part of the directive.
@hfp hfp mentioned this issue Oct 25, 2024
hfp added a commit to hfp/dbcsr that referenced this issue Oct 29, 2024
- Avoid IF-condition as part of the directive.
hfp added a commit to hfp/dbcsr that referenced this issue Nov 4, 2024
- Avoid IF-condition as part of the directive.
hfp added a commit to hfp/dbcsr that referenced this issue Nov 6, 2024
- Avoid IF-condition as part of the directive.
hfp added a commit to hfp/dbcsr that referenced this issue Nov 7, 2024
- Avoid IF-condition as part of the directive.
hfp added a commit to hfp/dbcsr that referenced this issue Nov 11, 2024
- Avoid IF-condition as part of the directive.
hfp added a commit to hfp/dbcsr that referenced this issue Nov 20, 2024
- Avoid nested parallelism (dbcsr_acc_set_active_device).
- Avoid IF-condition as part of the WORKSHARE-directive.
- Keep WORKSHARE if not in parallel region.
hfp added a commit to hfp/dbcsr that referenced this issue Nov 20, 2024
- Avoid nested parallelism (dbcsr_acc_set_active_device).
- Avoid IF-condition as part of the WORKSHARE-directive.
- Keep WORKSHARE if not in parallel region.
hfp added a commit to hfp/dbcsr that referenced this issue Nov 21, 2024
- Avoid nested parallelism (dbcsr_acc_set_active_device).
- Rely on omp_get_level (instead of omp_in_parallel),
  omp_in_parallel only accounts for active regions.
- Avoid IF-condition as part of the WORKSHARE-directive.
- Removed WORKSHARE construct from dbcsr_ptr_util,
- otherwise keep WORKSHARE if not in parallel region.
- There is potentially invalid nesting (parallel+X),
  e.g., nested in master or sections constructs.
@hfp
Copy link
Member

hfp commented Dec 2, 2024

The current thesis is, nested parallelism like PARALLEL WORKSHARE even inside inside of a MASTER section (which in turn is inside of another PARALLEL region) should never hang. GNU Fortran as well as IFORT seems to confirm this at runtime. However, GNU, IFORT, and IFX produce an error message at compile-time if the nest is not "firewalled" by a subroutine but literally nested (second code-path when writing #if 0):

      PROGRAM NESTED
        IMPLICIT NONE
        INTEGER :: dst(24), src(24), i

        DO i = 1, SIZE(src)
          src(i) = i - 1
        END DO

!$OMP   PARALLEL
!$OMP   MASTER
#if 1
        CALL workshare(dst, src)
#else
!$OMP   PARALLEL WORKSHARE
        dst(:) = src(:)
!$OMP   END PARALLEL WORKSHARE
#endif
!$OMP   END MASTER
!$OMP   END PARALLEL

        DO i = 1, SIZE(dst)
          WRITE(*,*) dst(i)
        END DO

      CONTAINS
        SUBROUTINE workshare(dst, src)
          INTEGER, INTENT(OUT) :: dst(:)
          INTEGER, INTENT(IN)  :: src(:)
!$OMP     PARALLEL WORKSHARE
          dst(:) = src(:)
!$OMP     END PARALLEL WORKSHARE
        END SUBROUTINE
      END PROGRAM

The confusion in compiler's implementation seems to stem from PARALLEL WORKSHARE vs. just WORKSHARE, with the latter indeed incorrect inside of a master/single section.

@alazzaro
Copy link
Member

alazzaro commented Dec 2, 2024

Nice investigation @hfp !
I would say we can remove all WORKSHARE section, we have memory-boud anyway...
I will work on DBCSR starting this week...

hfp added a commit to hfp/dbcsr that referenced this issue Dec 6, 2024
- Avoid nested parallelism (dbcsr_acc_set_active_device).
- Rely on omp_get_level (instead of omp_in_parallel),
  omp_in_parallel only accounts for active regions.
- Avoid IF-condition as part of the WORKSHARE-directive.
- Removed (nested) PARALLEL WORKSHARE constructs.
hfp added a commit to hfp/dbcsr that referenced this issue Dec 6, 2024
- Avoid nested parallelism (dbcsr_acc_set_active_device).
- Rely on omp_get_level (instead of omp_in_parallel),
  omp_in_parallel only accounts for active regions.
- Avoid IF-condition as part of the WORKSHARE-directive.
- Removed (nested) PARALLEL WORKSHARE constructs.
alazzaro pushed a commit that referenced this issue Dec 9, 2024
- Avoid nested parallelism (dbcsr_acc_set_active_device).
- Rely on omp_get_level (instead of omp_in_parallel),
  omp_in_parallel only accounts for active regions.
- Avoid IF-condition as part of the WORKSHARE-directive.
- Removed (nested) PARALLEL WORKSHARE constructs.
@alazzaro
Copy link
Member

alazzaro commented Dec 9, 2024

Also please note that MASTER construct is deprecated in latest spec.

Fortran pretty old spec: https://www.openmp.org/wp-content/uploads/fspec20.pdf

Thanks for pointing this out.
I would not change MASTER to the new MASKED construct, at least not with the next release...
I will open a specific issue as a reminder.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants