
Implement packed format for symmetric and triangular matrices #1025

Open · opened Sep 22, 2023 by emmyb-bu · 4 comments
Labels: performance (Must go faster)

Comments

@emmyb-bu commented Sep 22, 2023

When working with symmetric, upper-triangular, or lower-triangular matrices in LinearAlgebra, one must allocate $N^2$ numbers even though there are at most $N(N+1)/2$ distinct entries. This is in contrast to common Fortran linear algebra packages, which offer packed storage for such matrices.

Consider the following code:

using LinearAlgebra
N = 5
x = zeros(N, N)
z = 0.0
for j in 1:N
    for i in 1:j
        global z += 1
        x[i, j] = z
    end
end

xut = UpperTriangular(x)
xsym = Symmetric(x)
x === xut.data   # returns true
x === xsym.data  # returns true

For reference, x looks like:

5×5 Matrix{Float64}:
 1.0  2.0  4.0   7.0  11.0
 0.0  3.0  5.0   8.0  12.0
 0.0  0.0  6.0   9.0  13.0
 0.0  0.0  0.0  10.0  14.0
 0.0  0.0  0.0   0.0  15.0

One can see that UpperTriangular and Symmetric just wrap the memory allocated in x, which means that if you don't actually care about the lower-triangular part of x, you're unnecessarily storing $N(N-1)/2 \approx N^2/2$ zeros! The purpose of these wrappers is dispatch: when some routine from LinearAlgebra is called, the method optimized for the matrix type is selected. BLAS, LAPACK, and cuBLAS all provide equivalent routines that support a "packed" format, which stores only the necessary entries as a vector in column-major-like order; in this packed format, xut.data would be stored as [1.0, 2.0, 3.0, ..., 15.0]. These routines and the storage scheme are described in the BLAS, LAPACK, and cuBLAS documentation.
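To make the layout concrete, here is a small sketch of the column-major packed index mapping, reusing x from above (packed_index is a hypothetical helper, not an existing LinearAlgebra function):

# Column-major packed storage of the upper triangle: entry (i, j) with
# i <= j lands at 1-based linear index j*(j-1)/2 + i.
packed_index(i, j) = (j * (j - 1)) ÷ 2 + i

# Pack the upper triangle of x into a vector of length N*(N+1)/2.
ap = zeros(N * (N + 1) ÷ 2)
for j in 1:N, i in 1:j
    ap[packed_index(i, j)] = x[i, j]
end
ap  # == [1.0, 2.0, 3.0, ..., 15.0]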

I think exposing these routines in LinearAlgebra and CUDA.jl would be useful and would roughly halve memory usage for these matrix types. I imagine there might be some new types, like LowerTriangularPacked, UpperTriangularPacked, and SymmetricPacked, whose data fields would hold the packed format used by these linear algebra routines. Besides access to the BLAS, LAPACK, and cuBLAS routines, some implementations that might be nice are (a rough sketch follows the list):

  • A constructor that takes a Matrix-typed variable like x and converts it to a packed-type variable by extracting and flattening the appropriate entries
  • A constructor that directly takes a Vector-typed variable already in packed format
  • Appropriate indexing: given a row-column index like xut[i,j], access the corresponding linear index in the packed vector
  • Efficient iteration over the relevant indices
  • A method to convert these packed types back to dense matrices
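Here is a minimal sketch of what such a wrapper might look like, assuming upper-triangle storage (all names here are hypothetical; none of this exists in LinearAlgebra today):

using LinearAlgebra

# Hypothetical wrapper: stores only the upper triangle, column by column.
struct SymmetricPacked{T} <: AbstractMatrix{T}
    data::Vector{T}  # length n*(n+1)/2, packed column-major
    n::Int
end

# Constructor that extracts and flattens the upper triangle of a dense matrix.
function SymmetricPacked(A::AbstractMatrix{T}) where {T}
    n = LinearAlgebra.checksquare(A)
    data = [A[i, j] for j in 1:n for i in 1:j]
    return SymmetricPacked{T}(data, n)
end

Base.size(S::SymmetricPacked) = (S.n, S.n)

# Map a (row, column) pair to the linear index in the packed vector,
# serving both (i, j) and (j, i) from the stored triangle by symmetry.
function Base.getindex(S::SymmetricPacked, i::Int, j::Int)
    i > j && ((i, j) = (j, i))
    return S.data[(j * (j - 1)) ÷ 2 + i]
end

With x from the example above, SymmetricPacked(x)[4, 2] would return 8.0, read from the packed vector via the stored (2, 4) entry.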

I would greatly appreciate it if this were implemented, and I believe it would improve Julia as a whole. Thanks!

@jishnub (Collaborator) commented Sep 23, 2023

Does https://github.com/JuliaLinearAlgebra/RectangularFullPacked.jl implement what you want?

@brenhinkeller added the performance (Must go faster) label Sep 24, 2023
@dkarrasch (Member) commented
I believe @bjarthur was working on vector-based packed format linear algebra; I found https://github.com/JaneliaSciComp/SymmetricFormats.jl. I remember that we do have some LAPACK functions available in LinearAlgebra (#34211, JuliaLang/julia#34320), but we don't have wrapper types.
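For what it's worth, at least the packed symmetric matrix-vector product is already reachable through the existing BLAS wrappers. Here is a quick sanity check against the dense path, reusing x from the first comment (this sketch assumes BLAS.spmv with signature (uplo, alpha, AP, x), which I believe is what the current wrapper provides):

using LinearAlgebra

# Upper triangle of x, packed column by column, as BLAS expects for 'U'.
ap = [x[i, j] for j in 1:5 for i in 1:j]  # == [1.0, 2.0, ..., 15.0]
v = rand(5)

# Symmetric matrix-vector product computed directly on the packed vector.
y_packed = LinearAlgebra.BLAS.spmv('U', 1.0, ap, v)
y_dense = Symmetric(x) * v
y_packed ≈ y_dense  # true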

@bjarthur (Contributor) commented
I was indeed, and yes, you found it! Please file an issue there if you have problems or suggestions.

@emmyb-bu (Author) commented
This is nice, thank you so much for sharing. I initially became interested in this because I am memory-limited on a GPU for a project, so ideally this would work on the GPU with CUDA.jl. I would also enjoy seeing this implemented in the default LinearAlgebra, with appropriate methods and pretty-printing for the packed format, but I understand that this might not happen.

Thanks again!

@KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024