intrinsics module with alternative implementations #915

jalvesz · 2025-01-03T20:54:23Z

Add an intrinsics module containing replacements for intrinsic functions where an alternative offers something of interest: a faster implementation, better accuracy, or both.

This PR follows the discussion on Discourse https://fortran-lang.discourse.group/t/lfortran-now-supports-all-intrinsic-functions/8844/41 and is based on https://github.com/jalvesz/fast_math

sum: 2 options (stdlib_sum and stdlib_sum_kahan)
dot_product: 2 options (stdlib_dot_product and stdlib_dot_product_kahan)
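For readers unfamiliar with the chunked approach that the specs in this PR attribute to `stdlib_sum`, the idea can be sketched as below. This is a minimal illustration and not the code in this PR: the function name `chunked_sum`, the chunk size of 64 and the single-precision test data are all assumptions made for the example.

```fortran
program chunked_sum_demo
    use, intrinsic :: iso_fortran_env, only: sp => real32, dp => real64
    implicit none
    integer, parameter :: n = 1000000
    real(sp), allocatable :: x(:)
    real(dp) :: reference

    allocate(x(n))
    call random_number(x)
    reference = sum(real(x, dp))   ! double-precision reference value

    print *, 'intrinsic sum error:', abs(real(sum(x), dp) - reference)
    print *, 'chunked sum error:  ', abs(real(chunked_sum(x), dp) - reference)

contains

    ! Hypothetical helper, not part of stdlib.
    pure function chunked_sum(a) result(s)
        real(sp), intent(in) :: a(:)
        real(sp) :: s
        integer, parameter :: chunk = 64   ! assumed chunk size
        real(sp) :: partial(chunk)
        integer :: i, r

        partial = 0.0_sp
        r = mod(size(a), chunk)
        ! Leading remainder that does not fill a whole chunk.
        partial(1:r) = a(1:r)
        ! Each lane of `partial` accumulates every chunk-th element, so the
        ! loop body is an element-wise vector addition.
        do i = r + 1, size(a), chunk
            partial = partial + a(i:i+chunk-1)
        end do
        ! Final reduction over the short array of partial sums.
        s = sum(partial)
    end function chunked_sum

end program chunked_sum_demo
```

Because each partial sum stays smaller than a single running total would, less is typically lost to rounding at each addition, and the independent lanes give the compiler a straightforward loop to vectorize. The actual gain depends on the compiler, flags and data.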

cc: @fortran-lang/stdlib @perazz @certik @jvdp1

jalvesz · 2025-01-07T19:35:12Z

One philosophical question: should the fsum interface be renamed to sum to enable direct replacement of the intrinsic? Should it keep this name? Or should it become something like stdlib_sum? (The same applies to fprod -> dot_product.)

Regarding the Kahan versions: given that the accuracy gains of the pure chunked version and the Kahan one are close, I'm wondering what level of support should be provided for switching between them.
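For context on what the Kahan variant adds, here is a minimal sketch of plain compensated (Kahan) summation over a 1-D array. It is not the chunked `elemental` kernel from this PR. The name `kahan_sum` and the test data are assumptions for illustration only.

```fortran
program kahan_sum_demo
    use, intrinsic :: iso_fortran_env, only: sp => real32, dp => real64
    implicit none
    integer, parameter :: n = 1000000
    real(sp), allocatable :: x(:)
    real(dp) :: reference

    allocate(x(n))
    call random_number(x)
    reference = sum(real(x, dp))   ! double-precision reference value

    print *, 'intrinsic sum error:', abs(real(sum(x), dp) - reference)
    print *, 'Kahan sum error:    ', abs(real(kahan_sum(x), dp) - reference)

contains

    ! Hypothetical helper, not part of stdlib.
    pure function kahan_sum(a) result(s)
        real(sp), intent(in) :: a(:)
        real(sp) :: s
        real(sp) :: c, y, t
        integer :: i

        s = 0.0_sp
        c = 0.0_sp            ! running compensation for lost low-order bits
        do i = 1, size(a)
            y = a(i) - c      ! apply the correction carried from the last step
            t = s + y         ! low-order bits of y may be lost here
            c = (t - s) - y   ! recover what was lost (algebraically zero)
            s = t
        end do
    end function kahan_sum

end program kahan_sum_demo
```

The compensation term is algebraically zero, so value-unsafe optimizations such as `-ffast-math` may remove it. The sketch should be compiled without such flags for the comparison to be meaningful.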

perazz · 2025-01-30T17:04:39Z

IMHO shorter names are better, and I don't see a problem if they overlap with the intrinsics. First, because one can always pick the right version:

use stdlib_intrinsics, only: dot_product

vs.

! Force using intrinsic
intrinsic :: dot_product

And second, because they can be augmented by more/different arguments:

c = dot_product(a,b) ! intrinsic
c = dot_product(a,b,mode='kahan') ! stdlib
c = dot_product(a,b,mode='blocked') ! stdlib
...

I find this more elegant and definitely not confusing.
This PR also reminds me that it would be worthwhile to augment the matmul intrinsic as well, via calls to the gemm backend.

jvdp1

Thank you @jalvesz. LGTM. It seems close to being ready for merging.

jvdp1 · 2025-02-02T14:25:26Z

doc/specs/stdlib_intrinsics.md

+
+#### Description
+
+The `stdlib_sum` function can replace the intrinsic `sum` for `real` or `complex` arrays. It follows a chunked implementation which maximizes vectorization potential as well as reducing the round-off error. This procedure is recommended when summing large arrays, for repetitive summation of smaller arrays consider the classical `sum`.


Why is it not for integer?

jvdp1 · 2025-02-02T20:28:10Z

doc/specs/stdlib_intrinsics.md

+
+#### Description
+
+The `stdlib_dot_product_kahan` function can replace the intrinsic `dot_product` for 1D `real` or `complex` arrays. It follows a chunked implementation which maximizes vectorization potential , complemented by the same `elemental` kernel based on the [kahan summation](https://en.wikipedia.org/wiki/Kahan_summation_algorithm) used for `stdlib_sum` to reduce the round-off error.


Is the license of Wikipedia compatible with the MIT license of stdlib?

jvdp1 · 2025-02-03T20:34:29Z

doc/specs/stdlib_intrinsics.md

+
+## Introduction
+
+The `stdlib_intrinsics` module provides replacements for some of the well known intrinsic functions found in Fortran compilers for which either a faster and/or more accurate implementation is found which has also proven of interest to the Fortran community.


Question: why are these functions not implemented by the compilers if they are faster and more accurate? Is it a limitation of the standard?

For the cases where it is less accurate, I think there should be a warning in the specs.

jvdp1 · 2025-02-03T20:35:41Z

src/stdlib_intrinsics.fypp

+#:set RANKS = range(2, MAXRANK + 1)
+
+module stdlib_intrinsics
+    !!Replacement for certain Fortran intrinsic functions offering either faster and/or more accurate implementations.


Suggested change

-    !!Replacement for certain Fortran intrinsic functions offering either faster and/or more accurate implementations.
+    !!Replacement of some Fortran intrinsic functions offering either faster and/or more accurate implementations.

jvdp1 · 2025-02-03T20:46:36Z

src/stdlib_intrinsics.fypp

+        !! This interface provides standard conforming call for sum of elements of any rank.
+        !! The 1-D base implementation follows a chunked approach for optimizing performance and increasing accuracy.
+        !! The `N-D` interfaces calls upon the `(N-1)-D` implementation. 
+        !! Supported data types include `real` and `complex`.


Why are integers not supported?

jvdp1 · 2025-02-03T20:50:42Z

test/intrinsics/test_intrinsics.fypp

+            x(i) = 8*atan(1._${k1}$)*(real(i,kind=${k1}$)-0.5_${k1}$)/real(n,kind=${k1}$)**2
+        end do
+        allocate(mask(n),source=.false.); mask(1:n:2) = .true.
+        allocate(nmask(n)); nmask = .not.mask


Suggested change

-        allocate(nmask(n)); nmask = .not.mask
+        allocate(nmask, source = .not.mask)
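The suggested `allocate(nmask, source = .not.mask)` relies on sourced allocation, where the array takes both its shape and its values from the source expression (omitting the bounds this way is a Fortran 2008 feature). A small standalone sketch follows. The program name and the array size are made up for the example, and the variable names mirror the test.

```fortran
program sourced_allocation_demo
    implicit none
    integer, parameter :: n = 8   ! assumed size, for illustration only
    logical, allocatable :: mask(:), nmask(:)

    ! Scalar source: every element is initialized to .false.
    allocate(mask(n), source=.false.)
    mask(1:n:2) = .true.

    ! Array source: shape and values both come from the expression.
    allocate(nmask, source=.not. mask)

    if (size(nmask) /= n) error stop 'unexpected size'
    if (any(nmask .eqv. mask)) error stop 'nmask is not the negation of mask'
    print *, nmask
end program sourced_allocation_demo
```

Compared with `allocate(nmask(n)); nmask = .not.mask`, this does the allocation and initialization in one statement and does not repeat the size.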

jvdp1 · 2025-02-03T20:51:56Z

test/intrinsics/test_intrinsics.fypp

+        end do
+
+        allocate(mask(n),source=.false.); mask(1:n:2) = .true.
+        allocate(nmask(n)); nmask = .not.mask


Suggested change

-        allocate(nmask(n)); nmask = .not.mask
+        allocate(nmask, source = .not.mask)

jvdp1 · 2025-02-03T21:00:19Z

> IMHO shorter names are better, and I don't see a problem if they overlap with the intrinsics. First, because one can always pick the right version:
> use stdlib_intrinsics, only: dot_product
> vs.
> ! Force using intrinsic
> intrinsic :: dot_product

I prefer to keep stdlib_sum and stdlib_dot_product as they currently are in this PR. This allows the user to use both implementations, and to opt in to the stdlib implementation if desired, as follows:

use stdlib_intrinsics, only: dot_product => stdlib_dot_product

With this approach, the user will not inadvertently use the stdlib implementation.

> And then because they can be augmented by more/different arguments
> c = dot_product(a,b) ! intrinsic
> c = dot_product(a,b,mode='kahan') ! stdlib
> c = dot_product(a,b,mode='blocked') ! stdlib
> ...

This approach would break backward compatibility with the intrinsics. I prefer the previous approach (either an overlap, or a name with a stdlib_ prefix).
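The opt-in renaming discussed in this comment can be tried without stdlib installed. In the sketch below, `demo_intrinsics` and `demo_dot_product` are hypothetical stand-ins for `stdlib_intrinsics` and `stdlib_dot_product`, written only so the example compiles on its own. They say nothing about how the stdlib procedure is implemented.

```fortran
module demo_intrinsics
    ! Hypothetical stand-in for stdlib_intrinsics, so the example compiles alone.
    implicit none
    private
    public :: demo_dot_product
contains
    pure function demo_dot_product(a, b) result(p)
        real, intent(in) :: a(:), b(:)
        real :: p
        integer :: i
        p = 0.0
        do i = 1, min(size(a), size(b))
            p = p + a(i)*b(i)
        end do
    end function demo_dot_product
end module demo_intrinsics

program rename_demo
    implicit none
    real :: a(3) = [1.0, 2.0, 3.0]
    real :: b(3) = [4.0, 5.0, 6.0]

    call with_intrinsic()
    call with_replacement()

contains

    subroutine with_intrinsic()
        ! No use statement: dot_product resolves to the intrinsic.
        print *, 'intrinsic:  ', dot_product(a, b)
    end subroutine with_intrinsic

    subroutine with_replacement()
        ! Explicit opt-in: within this scope dot_product is the module procedure.
        use demo_intrinsics, only: dot_product => demo_dot_product
        print *, 'replacement:', dot_product(a, b)
    end subroutine with_replacement

end program rename_demo
```

The rename is local to the scope that contains the `use` statement, so code elsewhere keeps calling the intrinsic unless it opts in the same way.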

perazz · 2025-02-04T07:11:24Z

> I prefer to keep stdlib_sum and stdlib_dot_product

LGTM @jvdp1 @jalvesz!

jalvesz and others added 17 commits December 24, 2024 13:12

intrinsics module with fast sums

08ec0aa

Merge branch 'fortran-lang:master' into intrinsics

c36251e

Merge branch 'fortran-lang:master' into intrinsics

2207f41

add fast dot_product and start tests

2bc7af9

Merge branch 'intrinsics' of https://github.com/jalvesz/stdlib into i…

4625205

…ntrinsics

add complex sum test

243ea6f

test masked sum

c38dcd6

add dot_product tests

bf1ce2f

start specs

cc9df61

Merge branch 'fortran-lang:master' into intrinsics

671fd61

split into submodules

75945f1

specs and examples

d05903f

Merge branch 'intrinsics' of https://github.com/jalvesz/stdlib into i…

c0d96e5

…ntrinsics

Merge branch 'fortran-lang:master' into intrinsics

4abd8d3

fix specs

7c6e8a4

fix test: complex initialization

7cea1fd

fix test: complex assignment caused accuracy loss

eaffa4a

jalvesz changed the title ~~feate: intrinsics module with alternative implementations~~ intrinsics module with alternative implementations Jan 4, 2025

jalvesz and others added 6 commits January 5, 2025 16:56

Merge branch 'fortran-lang:master' into intrinsics

ad64162

extend fsum support for ndarrays

a3d24e4

remove unnecessary definition

5a1fdcb

update specs, change name of kahan kernel

47396ac

small reorganization

ecb7050

Merge branch 'intrinsics' of https://github.com/jalvesz/stdlib into i…

87ef502

…ntrinsics

jalvesz added 2 commits January 11, 2025 23:53

change names to stdlib_*

14be974

add comments

aaa68bc

jalvesz marked this pull request as ready for review January 12, 2025 10:32

jalvesz and others added 2 commits January 17, 2025 17:26

Merge branch 'fortran-lang:master' into intrinsics

cc232e1

extend kahan sum for rank N arrays

6e36b6f

jalvesz and others added 3 commits January 17, 2025 19:56

Merge branch 'intrinsics' of https://github.com/jalvesz/stdlib into i…

65175d7

…ntrinsics

Merge branch 'fortran-lang:master' into intrinsics

8a35f38

Merge branch 'fortran-lang:master' into intrinsics

16a0e96

jvdp1 reviewed Feb 3, 2025
