-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove UNI_OMP guards surrounding #pragma omp...
#526
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## dev-master #526 +/- ##
==============================================
+ Coverage 16.54% 17.58% +1.04%
==============================================
Files 211 211
Lines 48829 44733 -4096
Branches 18900 14941 -3959
==============================================
- Hits 8080 7868 -212
+ Misses 36603 32751 -3852
+ Partials 4146 4114 -32 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good for now. But in #517 suggests to remove #pragma omp parallel for schedule(dynamic)
everywhere, so the issue still needs some work.
Yes, all of the Perhaps try some test case with 3 benchmarks:
I tried (1) and (3), using
It gets worse the more threads you add because they are all fighting each other for the same L1 cache entries. |
Get the correct number of CPU cores when OpenMP is not used.
The compilers don't stop compiling if they cannot find the token following `#pragma`, so we don't have to use the compile definition to guard `#pragma omp...` statements. Remove `find_package(GTest)` from the root CMake file becuase we have used `include(GoogleTest)`, which is the new mechanism, to discover the tests.
We don't guard `#pragam omp...` statements now, so they take effects when `-fopenmp` compiler flag is set. `set()` function only affect the targets defined in the same scope (the same directory) and the subdirectories after the line of `set()`. Setting `-fopenmp` flag when the user only enable `USE_MKL` but not enable `USE_OMP` makes the user suprised. By default, MKL links to their own openmp (not gomp) for threading. The build process has no problem with no setting `-fopenmp` compiler flag when using MKL as the provider of BLAS and LAPACK. The above claim is true for the MKL version later then 2023.2.0. I am not sure if it is also true for the older version of MKL.
6a70dd3
to
672b7e9
Compare
Yes, the issue is still there. Because I cannot confirm that all |
This code is very suspicious: Cytnx/src/backend/linalg_internal_cpu/MaxMin_internal.cpp Lines 15 to 46 in 384bafa
It looks like an attempt to do a parallel reduction. Constructing a uint64_t global_max = std::numeric_limits<int64_t>::min();
#pragma omp parallel
{
// Private variable for each thread's local maximum
uint64_t local_max = std::numeric_limits<uint64_t>::min();
// Each thread iterates over part of the array
#pragma omp for
for (std::size_t n = 0; n < Nelem; ++n) {
if (_ten[n] > local_max) {
local_max = _ten[n];
}
}
// Reduce the local maxima into the global maximum
#pragma omp critical
{
if (local_max > global_max) {
global_max = local_max;
}
}
}
_out[0] = global_max; The But this itself is a long-winded way of writing a reduction in OpenMP, which is simply (untested) uint64_t global_max = std::numeric_limits<uint64_t>::min();
// Parallel reduction to compute the maximum value
#pragma omp parallel for reduction(max:global_max)
for (std::size_t n = 0; n < Nelem; ++n) {
if (_ten[n] > global_max) {
global_max = _ten[n];
}
}
_out[0] = global_max; But for sure the OpenMP is not going to give much benefit here anyway, so best option is to simply replace it with something like: uint64_t global_max = std::numeric_limits<int64_t>::min();
std::size_t len = Nelem;
for (std::size_t i = 0; i < len; ++i) {
if (_ten[n] > global_max) global_max = _ten[n];
}
_out[0] = global_max; This is slightly different to the existing non-OpenMP code: Cytnx/src/backend/linalg_internal_cpu/MaxMin_internal.cpp Lines 42 to 45 in 384bafa
In particular, the existing non-OpenMP code is unlikely to be optimizable by the compiler, for reasons discussed in #511 : rather than writing to a local variable, which would almost certainly be kept in a CPU register, it writes to Compare https://godbolt.org/z/scheP5ePb which produces identical code with Rather than re-factor every kernel here separately, I suggest replacing all of the functions in this file with a template, and also consider what the |
For non-complex, maybe just use |
And for the complex case too, just need to decide what the comparison function is. |
Remove UNI_OMP guards surrounding
#pragma omp...
The compilers don't stop compiling if they cannot find the token
following
#pragma
, so we don't have to use the compile definition toguard
#pragma omp...
statements.This is a change for #517.
Remove
find_package(GTest)
from the root CMake fileWe have used
include(GoogleTest)
, which is the new mechanism, to discover thetests.
Remove
-fopenmp
flag when only enabling MKLWe don't guard
#pragam omp...
statements now, so they take effectswhen
-fopenmp
compiler flag is set.set()
function only affect thetargets defined in the same scope (the same directory) and the
subdirectories after the line of
set()
. Setting-fopenmp
flag whenthe user only enable
USE_MKL
but not enableUSE_OMP
makes the usersuprised.
By default, MKL links to their own openmp (not gomp) for threading. The
build process has no problem with no setting
-fopenmp
compiler flagwhen using MKL as the provider of BLAS and LAPACK. The above claim is
true for the MKL version later then 2023.2.0. I am not sure if it is
also true for the older version of MKL.