
Improve precision of TP exponentiation functions. #99705

Closed

Conversation

@eiriktsarpalis (Member) commented Mar 13, 2024

Updating the implementations to use FMA where applicable.

NB: tests might require some calibration, pending test results from platforms without FMA support.

Contributor

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

@stephentoub (Member)

Are there corresponding changes to be made to VectorMath in corelib?

@eiriktsarpalis (Member, Author)

> Are there corresponding changes to be made to VectorMath in corelib?

Not sure. Do these routines have precision issues as well?

@stephentoub (Member)

> Are there corresponding changes to be made to VectorMath in corelib?
>
> Not sure. Do these routines have precision issues as well?

It's a copy of the same code, e.g.:

public static TVectorDouble ExpDouble<TVectorDouble, TVectorInt64, TVectorUInt64>(TVectorDouble x)

The code being changed here for Exp, for example, only applies to .NET 8; on .NET 9 the code from corelib ends up being used.

@eiriktsarpalis
Copy link
Member Author

I was only running the .NET 9 tests when working on this, so clearly something's off here.

Comment on lines +207 to +214
Vector128<double> a0 = MultiplyAddEstimateOperator<double>.Invoke(Vector128.Create(C12), r, Vector128.Create(C11));
Vector128<double> a1 = MultiplyAddEstimateOperator<double>.Invoke(Vector128.Create(C10), r, Vector128.Create(C9));
Vector128<double> a2 = MultiplyAddEstimateOperator<double>.Invoke(Vector128.Create(C8), r, Vector128.Create(C7));
Vector128<double> a3 = MultiplyAddEstimateOperator<double>.Invoke(Vector128.Create(C6), r, Vector128.Create(C5));
Vector128<double> a4 = MultiplyAddEstimateOperator<double>.Invoke(Vector128.Create(C4), r, Vector128.Create(C3));
Vector128<double> a5 = MultiplyAddEstimateOperator<double>.Invoke(a0, r2, a1);
Vector128<double> a6 = MultiplyAddEstimateOperator<double>.Invoke(a2, r2, a3);
Vector128<double> a7 = MultiplyAddEstimateOperator<double>.Invoke(a4, r2, r + Vector128<double>.One);
Member

nit: This many locals isn't really "good" for the JIT and can hinder some optimizations.

It can also make it a little bit harder to follow the intended computation ordering, which can itself limit other optimizations or reorderings the JIT would be allowed to do.

We could reduce the local count significantly if we broke it up like this; it's then also equivalent in codegen to having no locals at all:

Vector128<double> t1 = MultiplyAddEstimate(
    MultiplyAddEstimate(Vector128.Create(C12), r, Vector128.Create(C11)),
    r2,
    MultiplyAddEstimate(Vector128.Create(C10), r, Vector128.Create(C9))
);

Vector128<double> t2 = MultiplyAddEstimate(
    MultiplyAddEstimate(Vector128.Create(C8), r, Vector128.Create(C7)),
    r2,
    MultiplyAddEstimate(Vector128.Create(C6), r, Vector128.Create(C5))
);

t1 = MultiplyAddEstimate(t1, r8, t2);
t2 = MultiplyAddEstimate(Vector128.Create(C4), r, Vector128.Create(C3));
t2 = MultiplyAddEstimate(t2, r2, r + Vector128<double>.One);

return MultiplyAddEstimate(t1, r4, t2);

You can see it results in smaller codegen, lets the initial multiplies be dispatched in parallel up front, keeps the total register usage lower, etc.: https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABABgAJiBGAOgCUBXAOwwEt8YaBJFqVp3KzC4A3AFgAUGUq1GLdpx4Y+AobhoANABxJxUgMyUATOQDC5AN6TyN8gAc+AN2wYY5SAIzkAJhAbAAGzdTQwBeclIaAFZSWLj4uKotPVt7Jxc3D1wvX38gsxRycMiqJDLyivKqAHYU2wdWZ1d3CE8fP0DgqKKImlIUUsrylABOI2ra63r05qycjvzTJB7I2K19Dc3NqKix6qM6mwamzNbs9rzg6pW+0ip9LUenkaiqIyMUfsO0xoyWttynTMWhuiRGWgGRiQIxQ+ihE1GUW+xz+cwuQNMIx6RhoKC0dzxSCiRnu1RQeP0AFE4KQkVMjjNTgCFsEqBRwjjqjtqkh3lE4VFYaQjFSabp6T8Tv9zoDFlQqNiaDy4UYRnDwUYYi8UNTSJMJKkUbMzvNLmY3orRjD9P19FQXlootUqLrkhKJcRDAA1GBgDDQN5aAA8spgAD5yABZAAUPr9AaMwdDEagAEpLBLUnH/VBAyGWSmTOEoOQAFTkKDfLO+nN55MVwrFkzlqAHTO2bMJpMFisg4uFFsoKsdmtd/OXCPYdlRhgBNh2AIATwAgt5vJTsuwMrHR7nEzRTLBt6Y3qm0BXz5291oD0fXNGT1RU6nhzYr3We9gFeFI7P50vV3XTd8G3d890PGBjzZM8L3IMCbwg48RmfV84N3D8J3IbAixnOdWAXFc1w3NgQPveDb0g+9TC0GCoEvdDwLvGAH2qFD2zfBju0w7Awlw/9CKAkjQM4ijjyQWj6Pja9RKoqI2INVIRykjDOknRs+PwgCiOA4TlMYyjmNMFAJLQvSEKYh99HkxSlNrRNx1UrDuh/P9NIE4it3vKdz1bc8vxfdjTLsrjHOwZYXLwgjAI80jmOwnyjD8qzUPghygkna4Iv46KdK89AK0SityAAaiCsdkxoAB5JgYAChSbMCgB6Rq7AgJcemwboWxBUqwrLBsSqw/UbI4sy0vDew2sXHpf0irTBM8uKkB8/LsFY1DiGuWbsu0oSvKiHytHPVqlzq1IAF93XqyhvU48aIwAOR3Mb6zTDNrurF6e1bHoSxbFK7te9SfpbNsPts8rvr7AbBwBr7MIwb8NKi3bFujQLUm2tycr25jyMQqjTx8yTgpkwz5WfNAMdsXzqZsLGUYW2LntJgnydIEz8YszFn0CurAtS+sMBwhn5pi7c6dF9zcrxkS2YfGjz1grmDJYym6dp8HFKlnG0ZVsSYOVuXubk1M+b0QLEZm1zGfF+9EcO89hbOkbhetubpdxlmuzJh9jKVuiyuk+WQms12RZtsWZejYWEp8wbBYLKqav5rXKC2yPPbRh2GydowXZsS7rolI03ETzCddR5ny8coIADMMBJyHML4ABzAALRug5U9KsLXGAmG8M3rqsNPWDr8towAMRA7hcAAZQYOxWqgVwh8C0eRsUzbyBn7AaEr7xo3rru287vz+8HgvFKLrfAp34+YAb/qz4wdNesv7xviL86gA=

@github-actions github-actions bot locked and limited conversation to collaborators Apr 13, 2024