Currently, `MulAddMul` is implemented as a callable structure that computes a short-circuited version of `x * a + y * b`, depending on whether `a` is 1 or `b` is 0. The short-circuit is implemented with "value types" on a pair of Booleans `(ais1, bis0)`. This may lead to a performance regression on master, possibly due to constant propagation; details are at JuliaLang/LinearAlgebra.jl#684.

This is an alternative way to fix JuliaLang/LinearAlgebra.jl#684 by implementing the short-circuit callable structure `MulAddMul` differently: the short-circuiting is done in the function body rather than relying on the dispatch system for the value type `(ais1, bis0)`. The 3-argument `mul!` here might be a bit slower than on 1.3.1 (~7-10 ns in my tests). I am not sure how to improve that, but this might be less brittle than relying on constant propagation?

Test script:
on 1.3.1:
Julia Version 1.3.1
Commit 2d57411* (2019-12-30 21:36 UTC)
on master:
Julia Version 1.5.0-DEV.71
Commit 15d693b (2020-01-15 18:13 UTC)
this PR:
Julia Version 1.5.0-DEV.72
Commit 20f2720 (2020-01-15 20:16 UTC)
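
For readers unfamiliar with the idea, the approach described above (branching inside the function body instead of dispatching on `(ais1, bis0)`) can be sketched roughly as follows. This is only an illustration under stated assumptions: the name `BranchingMulAddMul` and its field layout are hypothetical and are not the definition used in LinearAlgebra or in this PR.

```julia
# Hypothetical sketch: `BranchingMulAddMul` is an illustrative name, not the
# type defined in LinearAlgebra or in this PR.
struct BranchingMulAddMul{TA,TB}
    alpha::TA
    beta::TB
end

# One-argument form: scale a freshly computed value `x` by alpha,
# skipping the multiplication when alpha is one.
(p::BranchingMulAddMul)(x) = isone(p.alpha) ? x : x * p.alpha

# Two-argument form: combine a new value `x` with an existing value `y`.
function (p::BranchingMulAddMul)(x, y)
    ax = isone(p.alpha) ? x : x * p.alpha
    # When beta is zero, `y` is never read, so a NaN or Inf stored in `y`
    # cannot propagate into the result.
    return iszero(p.beta) ? ax : ax + y * p.beta
end

p = BranchingMulAddMul(2.0, 0.0)
p(3.0, NaN)   # 6.0, the NaN in `y` is ignored because beta == 0

q = BranchingMulAddMul(1.0, 0.5)
q(3.0, 4.0)   # 5.0, the multiplication by alpha is skipped
```

The branches here are ordinary runtime checks on the stored values, so the result does not depend on the compiler propagating `alpha` and `beta` as constants. One caveat of this sketch: if `x` and `x * p.alpha` have different types (for example an integer `x` with a floating-point `alpha`), the ternary returns a small `Union`, which a real implementation would probably want to avoid.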