Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

P1673: Corrections #464

Closed
mhoemmen opened this issue Jul 16, 2024 · 8 comments
Closed

P1673: Corrections #464

mhoemmen opened this issue Jul 16, 2024 · 8 comments

Comments

@mhoemmen
Copy link
Contributor

mhoemmen commented Jul 16, 2024

P1673: Corrections

BLAS 3

[linalg.algs.blas3.rankk]

C is the symmetric or Hermitian matrix, not A, so A doesn't have to be square. (C does need to be square, though.)

  • Replace (2.2) with "compatible-static-extents<decltype(C), decltype(C)>(0, 1) is true"

  • Replace (2.3) and (2.4) with possibly-multipliable<decltype(A), decltype(transposed(A)), C>

  • Replace (3.1) with "C.extent(0) equals C.extent(1)"

  • Replace (3.2) and (3.3) with "multipliable(A, transposed(A), C)"

[linalg.algs.blas3.rank2k]

C is the symmetric or Hermitian matrix, not A or B, so A (or B) doesn't have to be square. (C does need to be square, though.) A, B, and C must have the same number of rows. A and B must have the same number of columns (they must both be N x K for some K not necessarily equal to N).

  • Replace (2.2) with "possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)> is true and possibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)> is true"

  • Replace (2.3) with "compatible-static-extents<decltype(C), decltype(C)>(0, 1) is true"

  • Replace (3.1) with "multipliable(A, transposed(B), C) is true and multipliable(transposed(B), A, C) is true"

  • Replace (3.2) with "C.extent(0) equals C.extent(1)"

@dalg24
Copy link
Member

dalg24 commented Jul 16, 2024

We should probably make that correction against the current draft instead of P1673.
https://eel.is/c++draft/linalg#algs.blas3.rankk
The numbers are off.

@mhoemmen
Copy link
Contributor Author

@dalg24 kokkos/stdBLAS#274 appears to look at the current draft. I'll revise my comments there.

@mhoemmen
Copy link
Contributor Author

mhoemmen commented Jul 18, 2024

P1673: Corrections (updated)

BLAS 2

[linalg.algs.blas2.gemv]

  • In 3.2, replace "possibly-addable<decltype(x), decltype(y), decltype(z)>" with "possibly-addable<decltype(y), decltype(y), decltype(z)>"

  • In 4.2, replace "addable(x, y, z)" with "addable(y, y, z)"

  • In 5, replace "$O($ x.extent(0) × A.extent(1) $)$" with "$O($ A.extent(0) × x.extent(0) $)$"

[linalg.algs.blas2.symv]

  • In 3.4, replace "possibly-addable<decltype(x), decltype(y), decltype(z)>" with "possibly-addable<decltype(y), decltype(y), decltype(z)>"

  • In 4.3, replace "addable(x, y, z)" with "addable(y, y, z)"

  • In 5, replace "$O($ x.extent(0) × A.extent(1) $)$" with "$O($ A.extent(0) × x.extent(0) $)$" (the original is technically correct, since A is symmetric and therefore square, but changing this would improve consistency with [linalg.algs.blas2.gemv])

[linalg.algs.blas2.hemv]

  • In 3.4, replace "possibly-addable<decltype(x), decltype(y), decltype(z)>" with "possibly-addable<decltype(y), decltype(y), decltype(z)>"

  • In 4.3, replace "addable(x, y, z)" with "addable(y, y, z)"

  • In 5, replace "$O($ x.extent(0) × A.extent(1) $)$" with "$O($ A.extent(0) × x.extent(0) $)$" (the original is technically correct, since A is Hermitian and therefore square, but changing this would improve consistency with [linalg.algs.blas2.gemv])

[linalg.algs.blas2.trmv]

(The extents compatibility conditions are expressed differently than in the above matrix-vector multiply sections, perhaps more for consistency with the TRSV section below. They look correct here.)

  • In 7, replace "$O($ x.extent(0) × A.extent(1) $)$" with "$O($ A.extent(0) × x.extent(0) $)$" (the original is technically correct, since A is square, but changing this would improve consistency with [linalg.algs.blas2.gemv])

  • In 10, replace "$O($ y.extent(0) × A.extent(1) $)$" with "$O($ A.extent(0) × y.extent(0) $)$" (the original is technically correct, since A is square, but changing this would improve consistency with [linalg.algs.blas2.gemv])

  • In 13, replace "$O($ x.extent(0) × A.extent(1) $)$" with "$O($ A.extent(0) × x.extent(0) $)$" (the original is technically correct, since A is square, but changing this would improve consistency with [linalg.algs.blas2.gemv])

Rest of BLAS 2 (is fine)

BLAS 3

OK: [linalg.algs.blas3.xxmm], [linalg.algs.blas3.trmm]

[linalg.algs.blas3.rankk]

(C is the symmetric or Hermitian matrix, not A. C needs to be square, not necessarily A.)

  • Remove 3.2, which currently says "compatible-static-extents<decltype(A), decltype(A)>(0, 1) is true"

  • Remove 4.1, which currently says "A.extent(0) equals A.extent(1)"

  • In the Complexity clause 5, replace "$O( A.extent(0) × A.extent(1) x C.extent(0) $)$" with "$O( C.extent(0) × C.extent(1) x C.extent(0) $)$" (the current clause is correct, but less consistent with GEMM)

[linalg.algs.blas3.rank2k]

(C is the symmetric or Hermitian matrix, not A or B. C needs to be square, not necessarily A or B. A, B, and C must have the same number of rows. A and B must have the same number of columns (they must both be N x K for some K not necessarily equal to N).)

  • In 3.2, replace "possibly-addable<decltype(A), decltype(B), decltype(C)>() is true" with "possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)> is true and possibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)> is true"

  • In 3.3, replace "compatible-static-extents<decltype(A), decltype(A)>(0, 1)" with "compatible-static-extents<decltype(C), decltype(C)>(0, 1)"

  • In 4.1, replace "addable(A, B, C) is true" with "multipliable(A, transposed(B), C) is true and multipliable(transposed(B), A, C) is true"

  • In 4.2, replace "A.extent(0) equals A.extent(1)" with "C.extent(0) equals C.extent(1)"

  • In the Complexity clause 5, replace "$O($ A.extent(0) × A.extent(1) × C.extent(0) $)$" with "$O($ A.extent(0) × A.extent(1) × B.extent(1) $)$"

[linalg.algs.blas3.trsm]

(Nothing is wrong here, but it's nice to make the complexity clauses depend only on input if possible.)

  • In the Complexity clause 6 (for the left solve), replace "$O($ A.extent(0) × X.extent(1) × X.extent(1) $)$" with "$O($ A.extent(0) × B.extent(0) × B.extent(1) $)$"

(Complexity clause 14, for the right solve, is fine.)

[linalg.algs.blas3.inplacetrsm]

  • In the Complexity clause 13 (for the right solve), replace $O($ A.extent(0) × A.extent(1) × B.extent(1) $)$ with "$O($ B.extent(0) × A.extent(0) × A.extent(1) $)$"

@rasolca
Copy link

rasolca commented Jul 19, 2024

@mhoemmen a few comments:

[linalg.algs.blas3.rankk]

...

  • In the Complexity clause 5, replace "$O( A.extent(0) × A.extent(1) x C.extent(0) )" with "$O( C.extent(0) × C.extent(1) x C.extent(0) )" (the current clause is correct, but less consistent with GEMM)

The change you propose is wrong as it converts to O(n^3) if we assume A(n,k), C(n,n). I suppose you mean "O(A.extent(0)×A.extent(1)xA.extent(0))" to be consistent with GEMM.

[linalg.algs.blas3.rank2k]

...

  • In 3.2, replace "possibly-addable<decltype(A), decltype(B), decltype(C)>() is true" with "possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)> is true and possibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)> is true"

  • In 4.1, replace "addable(A, B, C) is true" with "multipliable(A, transposed(B), C) is true and multipliable(transposed(B), A, C) is true"

The change proposed for 4.1 is wrong. It should read:
In 4.1, replace "addable(A, B, C) is true" with "multipliable(A, transposed(B), C) is true and multipliable(transposed(B), A, C) multipliable(B, transposed(A), C) is true"

However, the second part of the proposed statements for 3.2 and 4.1 is superfluous as:
multipliable(A, transposed(B), C) && C is a square matrix => multipliable(B, transposed(A), C)

  • In the Complexity clause 5, replace "O(A.extent(0)×A.extent(1)×C.extent(0))" with "O(A.extent(0)×A.extent(1)×B.extent(1))"

This proposed change is wrong as it would convert to O(n k^2) if we assume A(n,k), B(n,k) and C(n,n).
The old statement is correct, however, it can be made consistent with GEMM using
"O(A.extent(0)×A.extent(1)×B.extent(0))"

[linalg.algs.blas3.trsm]

(Nothing is wrong here, but it's nice to make the complexity clauses depend only on input if possible.)

  • In the Complexity clause 6 (for the left solve), replace "O(A.extent(0)×X.extent(1)×X.extent(1))" with "O(A.extent(0)×B.extent(0)×B.extent(1))"

The suggested change is correct. But note that the original statement is wrong as X is not a square matrix and has the same size as B.

Finally one of the errors has not been reported here:

[linalg.algs.blas3.trmm]

  • In the complexity clause 5, replace O(A.extent(0)×A.extent(1)×C.extent(0)) with O(A.extent(0)×A.extent(1)×C.extent(1))
  • Complexity clause 9 is correct, but can be made more consistent with GEMM replacing O(A.extent(0)×A.extent(1)×C.extent(0)) with O(C.extent(0)×C.extent(1)×A.extent(1))

@mhoemmen
Copy link
Contributor Author

Oh, I made some typos there! Thanks for reviewing!

@mhoemmen
Copy link
Contributor Author

mhoemmen commented Jul 24, 2024

More bugs, found by my colleague Ilya Burylov

layout_transpose

Private members nested-mapping and extents_ should be in mapping and not in the layout.

conjugated_accessor

Constructor from NestedAccessor appears in the wording (para 3) but is missing from the class synopsis.

[linalg.conj.conjugatedaccessor] explicit(!is_convertible_v<OtherNestedAccessor, NestedAccessor>>) has an extra closing >.

The following function should return NestedAccessor instead.

constexpr const Accessor& nested_accessor() const noexcept;

scaled_accessor

The nested-accessor member of scaled_accessor does not have a trailing underscore. This is inconsistent with other classes in [linalg].

@mhoemmen
Copy link
Contributor Author

mhoemmen commented Aug 5, 2024

More bugs, found by my colleague Ilya Burylov

Specify what happens if hermitian_* algorithms get a diagonal element with nonzero imaginary part

Define that for hermitian_* algorithms, the imaginary elements on the diagonal aren't even accessed (which is what the BLAS Standard says). The reference BLAS implementation of ZHEMV uses dble(A(j,j)), which means that it just takes the real part. We should also do that.

  1. Extend [linalg.general] with diagonal access logic. Something like this: "For accesses of diagonal elements m[i, i], F will use the value real-if-needed(m[i, i]) if the name of F starts with hermitian."

  2. "These functions perform an updating Hermitian matrix-vector product, taking into account the Triangle parameter that applies to the Hermitian matrix A [linalg.general]."

@mhoemmen
Copy link
Contributor Author

These issues have been submitted as the following LWG / wording issues. Thanks all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants