In #967, a new "special" permutation has been added. In the end it is just a local permutation, but its reasoning starts from the global picture. Currently, it runs on MC for both the MC and GPU variants of the tridiagonal solver. In order to get it running on GPU, we have two main ways:
- In order to re-use the local permutation:
  - we can "preprocess" the permutation array on `Backend::MC`, extracting just the local parts and converting global indices to local indices (sketched below)
  - Problem: currently the (local) permutation can only deal with local matrices
    - Option 1: use local indices to access the local part
    - Option 2: create a new object (e.g. `MatrixRef`) that just refers to the local part (i.e. the new object is no longer aware of the distribution)
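A rough sketch of the preprocessing step mentioned above. This is only an illustration, not the DLA-Future API: the 1D block-cyclic helpers, the names, and the assumption that both source and destination of each kept entry live on the calling rank are mine.

```cpp
// Sketch only: turn a global permutation array into local index pairs for the
// calling rank, assuming a hypothetical 1D block-cyclic distribution.
#include <cstddef>
#include <vector>

struct LocalPerm {
  std::size_t to;    // local destination index on this rank
  std::size_t from;  // local source index on this rank
};

// Rank owning global index i in a 1D block-cyclic distribution.
std::size_t owner(std::size_t i, std::size_t block_size, std::size_t grid_size) {
  return (i / block_size) % grid_size;
}

// Global index -> local index on the owning rank.
std::size_t global_to_local(std::size_t i, std::size_t block_size, std::size_t grid_size) {
  const std::size_t local_block = i / (block_size * grid_size);
  return local_block * block_size + i % block_size;
}

// Keep only the entries of the global permutation `perm` (perm[j] = source of
// destination j) whose source and destination are both owned by `rank`, and
// convert them to local indices, so the existing local permutation can be fed
// purely local information.
std::vector<LocalPerm> preprocess(const std::vector<std::size_t>& perm,
                                  std::size_t block_size, std::size_t grid_size,
                                  std::size_t rank) {
  std::vector<LocalPerm> local;
  for (std::size_t j = 0; j < perm.size(); ++j) {
    const std::size_t i = perm[j];
    if (owner(j, block_size, grid_size) == rank && owner(i, block_size, grid_size) == rank)
      local.push_back({global_to_local(j, block_size, grid_size),
                       global_to_local(i, block_size, grid_size)});
  }
  return local;
}
```

With something along these lines, the existing local permutation could then either be given plain local indices (Option 1) or act through a `MatrixRef`-like view of the local part (Option 2).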
- Permutation on GPU: currently it is implemented by passing a "simplified" distribution (a base pointer plus the horizontal and vertical distance between tiles)
  - Since we are going to support "randomly" placed allocations:
    - (preferred) Option 1: send a vector of pointers, where each element is the beginning of a tile (see the kernel sketch after this list)
    - (Option 2: force the layout on the matrix used)
  - @rasolca does not like how the position of each element is currently computed
    - It is going to be implemented differently (currently a CUDA thread works on a single element)
    - `cudaMemcpy` is not an alternative since it would launch too many small copies
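For illustration, a minimal CUDA sketch of Option 1 under assumptions of mine (complete column-major tiles of size mb × nb, row-major tile grid, one thread per element as in the current scheme); none of the names below come from the actual implementation:

```cuda
#include <cuda_runtime.h>

// Sketch only: apply a column permutation to a matrix stored tile by tile,
// receiving one base pointer per tile (Option 1) instead of a single base
// pointer plus fixed tile-to-tile strides.
__global__ void permute_cols(const double* const* in_tiles,  // in_tiles[tile_row * ntile_cols + tile_col]
                             double* const* out_tiles,
                             const int* perm,                 // perm[j] = source column of destination column j
                             int m, int n,                    // matrix size
                             int mb, int nb,                  // tile size (tiles assumed complete)
                             int ntile_cols) {
  const int i = blockIdx.y * blockDim.y + threadIdx.y;  // row
  const int j = blockIdx.x * blockDim.x + threadIdx.x;  // destination column
  if (i >= m || j >= n)
    return;

  const int src_j = perm[j];

  // The tile index selects a base pointer from the array, so tiles may live at
  // arbitrary ("random") addresses; only the layout inside a tile is fixed.
  // With the current "simplified" distribution the address would instead be
  // computed from one base pointer plus horizontal/vertical tile strides,
  // which assumes tiles placed at regular distances.
  const double* src_tile = in_tiles[(i / mb) * ntile_cols + src_j / nb];
  double* dst_tile = out_tiles[(i / mb) * ntile_cols + j / nb];

  dst_tile[(j % nb) * mb + i % mb] = src_tile[(src_j % nb) * mb + i % mb];
}
```

A single launch with a 2D grid covering all m × n elements does the whole copy, which is the point made above about `cudaMemcpy`: issuing one small copy per element (or per column segment) instead would mean a very large number of tiny transfers.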