If the total size of the P (input) and Q (output) windows exceeds --maxSharedMemory, we should split the P window, see #204.
If that doesn't help (i.e. the size of Q alone exceeds --maxSharedMemory), we also have to split Q (choosing the minimal split factor to reduce performance overhead) and calculate the subblocks of Q = P^T P independently.
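A minimal sketch of the idea in pure Python, not the actual bigint_syrk code. The helper names (`min_split_factor`, `block_ranges`, `syrk_blockwise`) and the memory model are hypothetical: it assumes the whole budget is available to one square Q block and ignores the P window and any per-prime residue storage.

```python
def ceil_div(a, b):
    return -(-a // b)


def min_split_factor(q_dim, bytes_per_element, max_shared_memory):
    """Smallest k such that one ceil(q_dim/k) x ceil(q_dim/k) block of Q fits.

    Assumption: the whole budget is available to a single Q block.
    """
    for k in range(1, q_dim + 1):
        block_dim = ceil_div(q_dim, k)
        if block_dim * block_dim * bytes_per_element <= max_shared_memory:
            return k
    raise ValueError("a single Q element exceeds the memory limit")


def block_ranges(n, k):
    """Split range(n) into consecutive half-open ranges of size ceil(n/k)."""
    size = ceil_div(n, k)
    return [(start, min(start + size, n)) for start in range(0, n, size)]


def syrk_blockwise(P, k):
    """Compute Q = P^T P one block Q_IJ at a time (uplo=UPPER, then mirrored).

    P is a list of rows. Only blocks with I <= J are computed; inside a
    diagonal block only elements with i <= j are computed.
    """
    height, n = len(P), len(P[0])
    Q = [[0] * n for _ in range(n)]
    ranges = block_ranges(n, k)
    for I, (i0, i1) in enumerate(ranges):
        for (j0, j1) in ranges[I:]:
            # Each block Q_IJ depends only on columns i0..i1 and j0..j1 of P,
            # so it can be calculated independently of the other blocks.
            for i in range(i0, i1):
                for j in range(max(i, j0), j1):
                    Q[i][j] = sum(P[r][i] * P[r][j] for r in range(height))
    # Fill the lower triangle by symmetry.
    for i in range(n):
        for j in range(i):
            Q[i][j] = Q[j][i]
    return Q
```

For example, `min_split_factor(3, 8, 40)` picks a split factor of 2, and `syrk_blockwise(P, 2)` gives the same matrix as the unsplit computation `syrk_blockwise(P, 1)`.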
…e --maxSharedMemory limit
TODO: update also bigint_syrk/Readme.md
Changed two end-to-end tests: set low --maxSharedMemory to enforce Q window splitting
In unit tests, we set different shared memory limits - to calculate Q = P^T P without splitting, with splitting only P window, or with splitting both P and Q.
Also supported both uplo=UPPER and LOWER for syrk
Added new parameter for reduce_scatter() to cover all cases.
Fix #205 bigint-syrk-blas: split Q window if it does not fit into the --maxSharedMemory limit
TODO: update also bigint_syrk/Readme.md
Changed two end-to-end tests: set low --maxSharedMemory to enforce Q window splitting
In unit tests, we set different shared memory limits - to calculate Q = P^T P without splitting, with splitting only P window, or with splitting both P and Q.
Also supported both uplo=UPPER and LOWER for syrk.
Fixed reduce_scatter(): the old version always synchronized only the upper half, but for off-diagonal blocks Q_IJ we need to synchronize the whole block.
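The difference between diagonal and off-diagonal blocks can be sketched as follows. This is an illustration only: `elements_to_sync` is a hypothetical helper, not the real reduce_scatter() signature, and it assumes uplo=UPPER with blocks given as half-open index ranges.

```python
def elements_to_sync(i_range, j_range):
    """Global (row, col) indices of block Q_IJ that must be synchronized.

    Assumes uplo=UPPER and block indices I <= J.
    """
    (i0, i1), (j0, j1) = i_range, j_range
    if (i0, i1) == (j0, j1):
        # Diagonal block: Q_II is symmetric, so the upper half is enough.
        return [(i, j) for i in range(i0, i1) for j in range(i, j1)]
    # Off-diagonal block: Q_IJ has no internal symmetry, so every element counts.
    return [(i, j) for i in range(i0, i1) for j in range(j0, j1)]
```

For a 2x2 diagonal block this yields three elements, while a 2x2 off-diagonal block yields all four. Restricting an off-diagonal block to its upper half would leave the rest of the block unsynchronized.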
See details in parent issue #207