intrinsics module with alternative implementations #915

Open
wants to merge 30 commits into
base: master

jalvesz
Contributor

@jalvesz jalvesz commented Jan 3, 2025

Add an intrinsics module containing replacements for intrinsic functions where an alternative offers something of interest: a faster implementation, better accuracy, or both.

This PR follows the discussion on Discourse at https://fortran-lang.discourse.group/t/lfortran-now-supports-all-intrinsic-functions/8844/41 and is based on https://github.com/jalvesz/fast_math

  • sum: 2 options (stdlib_sum and stdlib_sum_kahan)
  • dot_product: 2 options (stdlib_dot_product and stdlib_dot_product_kahan)

cc: @fortran-lang/stdlib @perazz @certik @jvdp1

@jalvesz jalvesz changed the title from "feate: intrinsics module with alternative implementations" to "intrinsics module with alternative implementations" Jan 4, 2025
@jalvesz
Contributor Author

jalvesz commented Jan 7, 2025

One philosophical question: should the fsum interface be renamed to sum to enable direct replacement of the intrinsic? Should it keep its current name? Or should it become something like stdlib_sum? (The same question applies to fprod -> dot_product.)

Regarding the Kahan versions: since the accuracy gains of the pure chunked version and the Kahan one are close, I wonder what level of support should be provided for switching between them.

@jalvesz jalvesz marked this pull request as ready for review January 12, 2025 10:32
@perazz
Member

perazz commented Jan 30, 2025

IMHO shorter names are better, and I don't see a problem if they overlap with the intrinsics. First, because one can always pick the right version:

use stdlib_intrinsics, only: dot_product

vs.

! Force using intrinsic
intrinsic :: dot_product

And then because they can be augmented by more/different arguments

c = dot_product(a,b) ! intrinsic
c = dot_product(a,b,mode='kahan') ! stdlib
c = dot_product(a,b,mode='blocked') ! stdlib
...

I find this more elegant and definitely not confusing.
This PR also reminds me that it would be worthwhile to augment the matmul intrinsic via calls to the gemm backend.

Member

@jvdp1 jvdp1 left a comment

Thank you @jalvesz. LGTM. It seems close to being ready for merging.


#### Description

The `stdlib_sum` function can replace the intrinsic `sum` for `real` or `complex` arrays. It follows a chunked implementation which maximizes vectorization potential while also reducing the round-off error. This procedure is recommended when summing large arrays; for repetitive summation of smaller arrays, consider the classical `sum`.
Member

Why is it not for integer?
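
For readers following the thread, the chunked approach mentioned in the quoted spec text can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the module name `chunked_sum_sketch`, the function name `chunked_sum`, and the chunk width of 64 are all hypothetical. The idea is to keep a small array of independent accumulators so that the loop body is an element-wise array addition, then reduce those accumulators at the end.

```fortran
module chunked_sum_sketch
    ! Hypothetical sketch; names and chunk width are assumptions, not stdlib's.
    use, intrinsic :: iso_fortran_env, only: sp => real32
    implicit none
    private
    public :: chunked_sum

    integer, parameter :: chunk = 64  ! assumed width; must be a power of two here

contains

    pure function chunked_sum(a) result(s)
        real(sp), intent(in) :: a(:)
        real(sp) :: s
        real(sp) :: lanes(chunk)
        integer :: i, n, rem, w

        n = size(a)
        rem = mod(n, chunk)

        ! Seed the accumulators with the leftover elements, zero elsewhere.
        lanes = 0.0_sp
        lanes(1:rem) = a(1:rem)

        ! The remaining n - rem elements form whole chunks. Each pass is an
        ! element-wise addition over independent accumulators.
        do i = rem + 1, n, chunk
            lanes = lanes + a(i:i+chunk-1)
        end do

        ! Pairwise (tree) reduction of the accumulators.
        w = chunk
        do while (w > 1)
            w = w/2
            lanes(1:w) = lanes(1:w) + lanes(w+1:2*w)
        end do
        s = lanes(1)
    end function chunked_sum

end module chunked_sum_sketch

program chunked_sum_demo
    use, intrinsic :: iso_fortran_env, only: sp => real32
    use chunked_sum_sketch, only: chunked_sum
    implicit none
    integer, parameter :: n = 1000   ! deliberately not a multiple of 64
    integer :: i, k(n)
    real(sp) :: x(n)

    ! Small integer values are exactly representable, so both sums are exact
    ! and can be compared for equality.
    k = [(mod(i, 7), i = 1, n)]
    x = real(k, sp)

    print *, chunked_sum(x), sum(k)
    if (chunked_sum(x) /= real(sum(k), sp)) error stop "mismatch"
end program chunked_sum_demo
```

Because each accumulator only sees a fraction of the elements, its running total stays smaller than a single sequential total would, which is one reason this layout tends to lose less to rounding. The demo only checks correctness on exactly representable data; it makes no claim about speed.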


#### Description

The `stdlib_dot_product_kahan` function can replace the intrinsic `dot_product` for 1D `real` or `complex` arrays. It follows a chunked implementation which maximizes vectorization potential, complemented by the same `elemental` kernel based on the [Kahan summation](https://en.wikipedia.org/wiki/Kahan_summation_algorithm) used for `stdlib_sum` to reduce the round-off error.
Member

Is the Wikipedia license compatible with the MIT license of stdlib?
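
As background on the algorithm referenced in the quoted spec text, here is a minimal sketch of textbook Kahan (compensated) summation in single precision. It is not the kernel from this PR: the function name `kahan_sum` is hypothetical, and it uses a plain sequential loop rather than the chunked, `elemental` formulation the spec describes. It also assumes the compiler is not run with value-unsafe floating-point options (for example fast-math style flags), which can optimize the compensation term away.

```fortran
program kahan_sketch
    use, intrinsic :: iso_fortran_env, only: sp => real32, dp => real64
    implicit none
    integer, parameter :: n = 1000000
    real(sp), allocatable :: x(:)
    real(sp) :: s_naive, s_kahan
    real(dp) :: ref
    integer :: i

    allocate(x(n), source=0.1_sp)

    ! Reference value computed in double precision from the stored value.
    ref = real(n, dp) * real(0.1_sp, dp)

    ! Plain sequential accumulation, for comparison.
    s_naive = 0.0_sp
    do i = 1, n
        s_naive = s_naive + x(i)
    end do

    s_kahan = kahan_sum(x)

    print *, "reference:", ref
    print *, "naive    :", s_naive
    print *, "kahan    :", s_kahan

    ! The Kahan error bound for this input is far below 0.1.
    if (abs(real(s_kahan, dp) - ref) > 0.1_dp) error stop "unexpected error"

contains

    ! Hypothetical helper, not the stdlib procedure.
    pure function kahan_sum(a) result(s)
        real(sp), intent(in) :: a(:)
        real(sp) :: s, c, y, t
        integer :: j

        s = 0.0_sp
        c = 0.0_sp               ! running compensation for lost low-order bits
        do j = 1, size(a)
            y = a(j) - c         ! apply the correction carried from the last step
            t = s + y            ! low-order bits of y may be lost here
            c = (t - s) - y      ! recover what was lost (algebraically zero)
            s = t
        end do
    end function kahan_sum

end program kahan_sketch
```

The compensation costs three extra floating-point operations per element, which is the trade-off behind offering it as a separate `_kahan` variant rather than as the default.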


## Introduction

The `stdlib_intrinsics` module provides replacements for some well-known Fortran intrinsic functions for which a faster and/or more accurate implementation exists and has proven of interest to the Fortran community.
Member

Question: why are these functions not implemented by the compilers if they are faster and more accurate? Is it a standard limitation?

For the cases where it is less accurate, I think there should be a warning in the specs.

#:set RANKS = range(2, MAXRANK + 1)

module stdlib_intrinsics
!!Replacement for certain Fortran intrinsic functions offering either faster and/or more accurate implementations.
Member

Suggested change
- !!Replacement for certain Fortran intrinsic functions offering either faster and/or more accurate implementations.
+ !!Replacement of some Fortran intrinsic functions offering either faster and/or more accurate implementations.

!! This interface provides a standard-conforming call for the sum of elements of arrays of any rank.
!! The 1-D base implementation follows a chunked approach for optimizing performance and increasing accuracy.
!! The `N-D` interfaces call upon the `(N-1)-D` implementation.
!! Supported data types include `real` and `complex`.
Member

Why are integers not supported?

x(i) = 8*atan(1._${k1}$)*(real(i,kind=${k1}$)-0.5_${k1}$)/real(n,kind=${k1}$)**2
end do
allocate(mask(n),source=.false.); mask(1:n:2) = .true.
allocate(nmask(n)); nmask = .not.mask
Member

Suggested change
- allocate(nmask(n)); nmask = .not.mask
+ allocate(nmask, source = .not.mask)

end do

allocate(mask(n),source=.false.); mask(1:n:2) = .true.
allocate(nmask(n)); nmask = .not.mask
Member

Suggested change
- allocate(nmask(n)); nmask = .not.mask
+ allocate(nmask, source = .not.mask)

@jvdp1
Member

jvdp1 commented Feb 3, 2025

> IMHO shorter names are better, and don't see a problem if they overlap with the intrinsics. First, because one can always pick the right version:
>
> use stdlib_intrinsics, only: dot_product
>
> vs.
>
> ! Force using intrinsic
> intrinsic :: dot_product

I prefer to keep stdlib_sum and stdlib_dot_product as they currently are in this PR. This allows the user to use both implementations, and to opt in to the stdlib implementation if desired, as follows:

use stdlib_intrinsics, only: dot_product => stdlib_dot_product

With this approach, the user will not inadvertently use the stdlib implementation.

> And then because they can be augmented by more/different arguments
>
> c = dot_product(a,b) ! intrinsic
> c = dot_product(a,b,mode='kahan') ! stdlib
> c = dot_product(a,b,mode='blocked') ! stdlib
> ...

This approach would break backward compatibility with the intrinsics. I prefer the previous approach (either an overlap, or a name with the prefix stdlib_).
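
The opt-in renaming discussed in this comment can be illustrated with a self-contained toy. Everything here is hypothetical: `toy_intrinsics` stands in for `stdlib_intrinsics`, its `stdlib_sum` is a trivial loop rather than the PR's implementation, and the `calls` counter exists only so the demo can tell which procedure was invoked.

```fortran
module toy_intrinsics
    ! Hypothetical stand-in for stdlib_intrinsics, for illustration only.
    implicit none
    private
    public :: stdlib_sum, calls

    integer :: calls = 0   ! counts invocations of the toy stdlib_sum

contains

    function stdlib_sum(a) result(s)
        real, intent(in) :: a(:)
        real :: s
        integer :: i
        calls = calls + 1
        s = 0.0
        do i = 1, size(a)
            s = s + a(i)
        end do
    end function stdlib_sum

end module toy_intrinsics

module legacy_code
    ! No use statement here, so "sum" is the compiler's intrinsic.
    implicit none
contains
    function total(a) result(s)
        real, intent(in) :: a(:)
        real :: s
        s = sum(a)
    end function total
end module legacy_code

program renaming_demo
    ! Opt in explicitly: in this scope only, "sum" means the toy stdlib_sum.
    use toy_intrinsics, only: sum => stdlib_sum, calls
    use legacy_code, only: total
    implicit none
    real :: x(4) = [1.0, 2.0, 3.0, 4.0]
    real :: s1, s2

    s1 = sum(x)     ! resolves to the renamed toy procedure
    s2 = total(x)   ! legacy_code still uses the intrinsic

    print *, s1, s2, calls
    if (calls /= 1) error stop "renamed sum was not called exactly once"
    if (s1 /= 10.0 .or. s2 /= 10.0) error stop "unexpected result"
end program renaming_demo
```

The rename is scoped to the program unit that contains the `use` statement, so code elsewhere keeps calling the intrinsic unless it opts in the same way.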

@perazz
Member

perazz commented Feb 4, 2025

> I prefer to keep stdlib_sum and stdlib_dot_product

LGTM @jvdp1 @jalvesz!


3 participants