Releases: eth-cscs/DLA-Future
DLA-Future 0.7.3
DLA-Future 0.7.1
DLA-Future 0.7.0
Changes
- Added (generalized) eigensolver which computes only a part of the eigenspectrum. (#1194)
- Norm is now fully asynchronous. (#1221)
Performance improvements
- Refactored communication to use pika's transform_mpi and polling support. (#1125)
- Use custom coalescing heuristic for memory pools. (#1183)
- Added configuration option for number of CUDA streams and cuBLAS/SOLVER handles. (#1222, #1182)
- Some algorithmic clean-ups and improvements. (#1213, #1219, #1232)
Bug fixes
DLA-Future 0.6.0
Changes
- Renamed ScaLAPACK-like generalized eigensolvers pXsygvx/pXhegvx to pXsygvd/pXhegvd. (#1168)
- Introduced generalized eigensolver where the matrix B is already factorized. (#1167)
Performance improvements
- Local eigenvector permutations in the distributed tridiagonal eigensolver are executed directly in GPU memory. (#1118)
Bug fixes
- Fixed ScaLAPACK detection in CMake for specific uenv cases. (#1159)
DLA-Future 0.5.0
Changes
- Introduced an option (*) for forcing contiguous GPU communication buffers. (#1096)
- Introduced an option (*) for enabling GPU aware MPI communication. (#1102)
- Removed special handling of Intel MKL, as it could lead to broken installations. (#1149)
  - Spack installations: spack will set the correct variables.
  - Manual installations: the user is responsible for setting the variables correctly (see BUILD.md).
(*) These options are available as spack variants.
Performance improvements
- Don't communicate in algorithms when using single rank communicators. (#1097)
- Fixed slow performance of local version of bt_band_to_tridiagonal (#1144)
Bug fixes
DLA-Future 0.4.1
Bug fixes
- Update project version and export it in CMake. (#1121)
DLA-Future 0.4.0
Changes:
- Modified CommunicatorGrid to avoid blocking calls to MPI_Comm_dup. It now returns communicator pipelines. (#993)
- Added support for Intel oneMKL and the intel-oneapi-mkl spack package. (#1073) (*)
Performance improvements:
- Reduced the size of the matrix-matrix multiplications in the tridiagonal eigensolver to cover only the non-deflated part of the eigenvectors. (#951, #967, #996, #997, #998)
- Introduced stackless threads where appropriate. (#1037)
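The idea behind restricting the multiplications to the non-deflated part of the eigenvectors can be sketched in a few lines. This is a minimal pure-Python illustration, not DLA-Future code. It assumes the deflated columns have already been permuted to the end, so the update matrix has the form blockdiag(u_small, I). The names matmul, update_eigenvectors, u_small and k are hypothetical.

```python
def matmul(a, b):
    """Plain dense product of two row-major list-of-lists matrices."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(inner)) for j in range(cols)]
            for row in a]


def update_eigenvectors(q, u_small, k):
    """Return Q @ U for U = blockdiag(u_small, I), with u_small of size k x k.

    Only the first k (non-deflated) columns of Q take part in the product.
    The remaining (deflated) columns are copied unchanged, which is exactly
    what multiplying by the identity block would do.
    """
    if k == 0:
        # Everything is deflated: nothing to multiply.
        return [row[:] for row in q]
    q_active = [row[:k] for row in q]      # n x k slice of Q
    updated = matmul(q_active, u_small)    # n x k product instead of n x n
    return [new_row + row[k:] for new_row, row in zip(updated, q)]
```

With n eigenvectors of which k are non-deflated, the product costs on the order of n·k² operations instead of n³, so the saving grows with the amount of deflation.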
Bug fixes:
- Use drop_operation_state to avoid stack overflows. (#1004)
Notes:
(*) At the time of the release, the spack spec blaspp~openmp ^intel-oneapi-mkl threads=openmp doesn't build. If you rely on multithreaded BLAS, we suggest using blaspp+openmp ^intel-oneapi-mkl threads=openmp until spack/spack#42087 gets merged.
DLA-Future 0.3.1
Bugfix:
- Fixed compilation with gcc 9.3
- Fixed compilation with CUDA 11.2
- Improved eigensolver tests
DLA-Future 0.3.0
DLA-Future 0.2.1
Bugfix:
- Fixed a problem in reduction_to_band that could have produced results filled with NaNs for certain corner cases (e.g. an input matrix with all off-band elements set to 0).
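The release note does not describe the cause or the fix. As a generic illustration only, one way a zero off-band block can lead to NaNs in a band reduction is a Householder reflector built from a vector whose tail is all zeros, where an unguarded formula divides by zero. The sketch below assumes the LAPACK-style convention (v[0] = 1, H = I - tau v vᵀ); the function name householder is hypothetical and the code is not taken from DLA-Future.

```python
import math


def householder(x):
    """Return (v, tau) such that (I - tau * v v^T) x = (beta, 0, ..., 0).

    Uses the convention v[0] = 1. When the tail of x is already zero,
    the identity reflector (tau = 0) is returned instead of evaluating
    formulas that would divide by zero.
    """
    alpha = x[0]
    tail_norm = math.sqrt(sum(t * t for t in x[1:]))
    if tail_norm == 0.0:
        # Nothing to annihilate: H is the identity, no division needed.
        return [1.0] + [0.0] * (len(x) - 1), 0.0
    # beta has the opposite sign of alpha to avoid cancellation.
    beta = -math.copysign(math.hypot(alpha, tail_norm), alpha)
    tau = (beta - alpha) / beta
    scale = 1.0 / (alpha - beta)
    v = [1.0] + [t * scale for t in x[1:]]
    return v, tau
```

Without the guard, an all-zero input gives beta = 0 and tau = 0/0, and the resulting NaN would then spread through every subsequent update that applies the reflector.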