intrinsics module with alternative implementations #915
base: master
One philosophical question: should the `fsum` interface be renamed to

Regarding the Kahan versions: given that the accuracy gains of the pure chunked version and of the Kahan one are close, I'm wondering what level of support should be provided for switching between them.
IMHO shorter names are better, and I don't see a problem if they overlap with the intrinsics. First, because one can always pick the right version: `use stdlib_intrinsics, only: dot_product` versus `intrinsic :: dot_product` (to force using the intrinsic). Second, because they can be augmented with more or different arguments:

- `c = dot_product(a,b)` (intrinsic)
- `c = dot_product(a,b,mode='kahan')` (stdlib)
- `c = dot_product(a,b,mode='blocked')` (stdlib)
- ...

I find this more elegant and definitely not confusing.
Thank you @jalvesz. LGTM. It seems close to being ready for merging.
#### Description

The `stdlib_sum` function can replace the intrinsic `sum` for `real` or `complex` arrays. It follows a chunked implementation that maximizes vectorization potential and reduces round-off error. This procedure is recommended when summing large arrays; for repeated summation of smaller arrays, consider the classical `sum`.
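The chunked approach described above can be illustrated with a minimal sketch. It assumes `real(real64)` input only, a block length of 64, and hypothetical names (`chunked_sum_sketch`, `chunked_sum`); it is not stdlib's actual implementation. Each of the 64 lanes accumulates only about `n/64` terms, which is where both the vectorization-friendly inner array operation and the smaller error growth come from.

```fortran
module chunked_sum_sketch
  ! Hypothetical sketch of a chunked sum; names and chunk size are assumptions.
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  private
  public :: chunked_sum

  integer, parameter :: chunk = 64  ! assumed block length

contains

  pure function chunked_sum(x) result(s)
    real(dp), intent(in) :: x(:)
    real(dp) :: s
    real(dp) :: acc(chunk)   ! one partial sum per lane
    integer :: n, i, r

    n = size(x)
    r = mod(n, chunk)
    acc = 0.0_dp
    ! Add full blocks lane by lane; this array operation is easy to vectorize
    do i = 1, n - r, chunk
      acc = acc + x(i:i+chunk-1)
    end do
    ! Add the remaining tail elements to the first r lanes
    acc(1:r) = acc(1:r) + x(n-r+1:n)
    ! Reduce the partial sums
    s = sum(acc)
  end function chunked_sum

end module chunked_sum_sketch
```

A caller would simply write `s = chunked_sum(x)`; for short arrays the overhead of the 64-lane accumulator is why the classical `sum` remains preferable.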
Why is it not available for `integer` arrays?
#### Description

The `stdlib_dot_product_kahan` function can replace the intrinsic `dot_product` for 1D `real` or `complex` arrays. It follows a chunked implementation that maximizes vectorization potential, complemented by the same `elemental` kernel based on [Kahan summation](https://en.wikipedia.org/wiki/Kahan_summation_algorithm) used for `stdlib_sum` to reduce round-off error.
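The combination of chunking with an `elemental` Kahan kernel can be sketched as follows. This is an illustration under assumptions: `real(real64)` only, a block length of 64, equal-sized inputs, and hypothetical names (`kahan_dot_sketch`, `kahan_kernel`, `kahan_dot`). Applying the final compensation as `sum(s) - sum(c)` is a choice of this sketch, not necessarily what stdlib does.

```fortran
module kahan_dot_sketch
  ! Hypothetical sketch: chunked dot product with an elemental Kahan kernel.
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  private
  public :: kahan_dot

  integer, parameter :: chunk = 64  ! assumed block length

contains

  ! One Kahan step per lane: s is the running sum, c the running error term
  elemental subroutine kahan_kernel(v, s, c)
    real(dp), intent(in) :: v
    real(dp), intent(inout) :: s, c
    real(dp) :: y, t
    y = v - c          ! correct the incoming term by the previous error
    t = s + y          ! low-order bits of y may be lost here
    c = (t - s) - y    ! (what was actually added) - (what was intended)
    s = t
  end subroutine kahan_kernel

  ! a and b are assumed to have the same size
  pure function kahan_dot(a, b) result(p)
    real(dp), intent(in) :: a(:), b(:)
    real(dp) :: p
    real(dp) :: s(chunk), c(chunk)
    integer :: n, i, r

    n = size(a)
    r = mod(n, chunk)
    s = 0.0_dp
    c = 0.0_dp
    ! Full blocks: the elemental kernel runs once per lane
    do i = 1, n - r, chunk
      call kahan_kernel(a(i:i+chunk-1)*b(i:i+chunk-1), s, c)
    end do
    ! Tail: only the first r lanes are updated
    call kahan_kernel(a(n-r+1:n)*b(n-r+1:n), s(1:r), c(1:r))
    ! c holds the excess added per lane, so subtract it
    p = sum(s) - sum(c)
  end function kahan_dot

end module kahan_dot_sketch
```

Note that value-unsafe optimizations that allow reassociation (for example fast-math style flags) can algebraically eliminate the `c` term and silently turn this back into a plain chunked sum.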
Is Wikipedia's license compatible with the MIT license of stdlib?
## Introduction

The `stdlib_intrinsics` module provides replacements for some well-known intrinsic functions found in Fortran compilers, for which a faster and/or more accurate implementation exists and has proven of interest to the Fortran community.
Question: why are these functions not implemented by the compilers if they are faster and more accurate? Is it a standard limitation?
For the cases where it is less accurate, I think there should be a warning in the specs.
#:set RANKS = range(2, MAXRANK + 1)

module stdlib_intrinsics
!!Replacement for certain Fortran intrinsic functions offering either faster and/or more accurate implementations.
Suggested change, replacing
`!!Replacement for certain Fortran intrinsic functions offering either faster and/or more accurate implementations.`
with
`!!Replacement of some Fortran intrinsic functions offering either faster and/or more accurate implementations.`
!! This interface provides a standard-conforming call for the sum of elements of any rank.
!! The 1-D base implementation follows a chunked approach for optimizing performance and increasing accuracy.
!! The `N-D` interfaces call upon the `(N-1)-D` implementation.
!! Supported data types include `real` and `complex`.
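The statement that the `N-D` interfaces call upon the `(N-1)-D` implementation can be sketched for rank 2: the rank-2 specific loops over the last dimension and hands each contiguous column to the rank-1 chunked kernel. Everything here is hypothetical (`rank_sum_sketch`, `my_sum`, `real(real64)` only, block length 64); stdlib generates its specifics with fypp for all supported kinds and ranks.

```fortran
module rank_sum_sketch
  ! Hypothetical sketch: a rank-2 sum that reuses a rank-1 kernel per column.
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  private
  public :: my_sum

  interface my_sum
    module procedure my_sum_1d, my_sum_2d
  end interface my_sum

  integer, parameter :: chunk = 64  ! assumed block length

contains

  ! Rank-1 base case: chunked accumulation into 64 lanes
  pure function my_sum_1d(x) result(s)
    real(dp), intent(in) :: x(:)
    real(dp) :: s
    real(dp) :: acc(chunk)
    integer :: i, n, r
    n = size(x)
    r = mod(n, chunk)
    acc = 0.0_dp
    do i = 1, n - r, chunk
      acc = acc + x(i:i+chunk-1)
    end do
    acc(1:r) = acc(1:r) + x(n-r+1:n)
    s = sum(acc)
  end function my_sum_1d

  ! Rank-2 case: delegate each column (contiguous in memory) to rank 1
  pure function my_sum_2d(x) result(s)
    real(dp), intent(in) :: x(:,:)
    real(dp) :: s
    integer :: j
    s = 0.0_dp
    do j = 1, size(x, 2)
      s = s + my_sum_1d(x(:, j))
    end do
  end function my_sum_2d

end module rank_sum_sketch
```

Looping over the last dimension keeps each rank-1 call on contiguous data, which is what preserves the vectorization benefit of the base kernel in column-major Fortran.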
Why are integers not supported?
x(i) = 8*atan(1._${k1}$)*(real(i,kind=${k1}$)-0.5_${k1}$)/real(n,kind=${k1}$)**2
end do
allocate(mask(n),source=.false.); mask(1:n:2) = .true.
allocate(nmask(n)); nmask = .not.mask
Suggested change, replacing
`allocate(nmask(n)); nmask = .not.mask`
with
`allocate(nmask, source = .not.mask)`
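The test setup quoted above, including the suggested sourced allocation, can be packaged as a small helper to show why it makes a good check. Since `8*atan(1)` is 2π and the sum of `(i - 0.5)` for `i = 1..n` is `n**2/2`, the exact sum of `x` is π, and `mask`/`nmask` split the array into two disjoint halves. The module and subroutine names are hypothetical, and only `real(real64)` is covered (the original uses the fypp kind `${k1}$`).

```fortran
module mask_data_sketch
  ! Hypothetical helper mirroring the quoted test setup; names are assumptions.
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  private
  public :: build_test_data

contains

  subroutine build_test_data(n, x, mask, nmask)
    integer, intent(in) :: n
    real(dp), allocatable, intent(out) :: x(:)
    logical, allocatable, intent(out) :: mask(:), nmask(:)
    integer :: i

    allocate(x(n))
    do i = 1, n
      ! 8*atan(1) = 2*pi, so the exact sum of x over i = 1..n is pi
      x(i) = 8*atan(1._dp)*(real(i, kind=dp) - 0.5_dp)/real(n, kind=dp)**2
    end do
    ! Every other element is selected by mask
    allocate(mask(n), source=.false.); mask(1:n:2) = .true.
    ! Sourced allocation: nmask takes both its shape and values from .not.mask
    allocate(nmask, source = .not.mask)
  end subroutine build_test_data

end module mask_data_sketch
```

With this data, `sum(x, mask) + sum(x, nmask)` and `sum(x)` should both agree with π up to round-off, which gives the masked and unmasked code paths a common reference value.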
end do

allocate(mask(n),source=.false.); mask(1:n:2) = .true.
allocate(nmask(n)); nmask = .not.mask
Suggested change, replacing
`allocate(nmask(n)); nmask = .not.mask`
with
`allocate(nmask, source = .not.mask)`
I prefer to keep `use stdlib_intrinsics, only: dot_product => stdlib_dot_product`. With this approach, the user will not inadvertently use the stdlib implementation.
This approach would break backward compatibility with the intrinsics. IMO the previous approach is preferable (either an overlap or a name with a prefix).
Add an intrinsics module containing replacements for intrinsic functions where some feature is of interest: a faster implementation, better accuracy, or both.

This PR follows the discussion on Discourse https://fortran-lang.discourse.group/t/lfortran-now-supports-all-intrinsic-functions/8844/41 and is based on https://github.com/jalvesz/fast_math
- `stdlib_sum` and `stdlib_sum_kahan`
- `stdlib_dot_product` and `stdlib_dot_product_kahan`

cc: @fortran-lang/stdlib @perazz @certik @jvdp1