-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kokkos::Controls and cusparse merge algorithm, see issue #670 #725
Conversation
Something bugs out in the cmake logic. Could be a Lassen issue too.
This allows the cusparse specialization layer to use either the default or the merge algorithm depending on input parameters from the user.
This test needs to be guarded as the SpMV merge algorithm is only available in cuSPARSE, not natively in Kokkos-Kernels at this point.
Fixing a name clashing in sptrsv_cuSPARSE: error_t which is usually used as a type not a variable. Adding isParamter method to Controls. Listing more parameters as controled parameters in SpMV
For some reason the issue in the title does not provide a link... so see issue #670 |
kokkos-dev: spot-check####################################################### kokkos-dev: spot-check-tpls####################################################### so close but need to clean-up something in the tpl handles management... let's put a band-aid on it for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lucbv I just had a couple questions, and the duplicated copyright headers in one file. Other than that, everything looks good.
if(std::is_same<value_type, float>::value) {myCudaDataType = CUDA_R_32F;} | ||
if(std::is_same<value_type, double>::value) {myCudaDataType = CUDA_R_64F;} | ||
if(std::is_same<value_type, float>::value) {myCudaDataType = CUDA_R_32F;} | ||
if(std::is_same<value_type, double>::value) {myCudaDataType = CUDA_R_64F;} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why CUDA_C_32F
and CUDA_C_64F
aren't supported here? It's ok if those use the fallback SPMV for now, but if it will be done later, we should make a note/issue for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I answered that in your comment below.
KOKKOSSPARSE_SPMV_CUSPARSE(float, int, int, Kokkos::LayoutRight, true) | ||
// KOKKOSSPARSE_SPMV_CUSPARSE(float, int, int, Kokkos::LayoutRight, false) | ||
#if defined(CUSPARSE_VERSION) && (10300 <= CUSPARSE_VERSION) | ||
KOKKOSSPARSE_SPMV_CUSPARSE(double, int64_t, int, Kokkos::LayoutLeft, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess if we added the complex cudaDataTypes above, we could instantiate all of these for complex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it should, I have not had time to look into complex yet though.
If in your experience it is easy to just instantiate and have it work OK through cusparse then I can do it. If it is an unknown I would suggest holding-off on it and do it in a later PR.
// y is the quantity being tested here, | ||
// so let us use y_value_type to determine | ||
// the appropriate tolerance precision. | ||
const y_value_mag_type eps = std::is_same<y_value_mag_type, float>::value ? 2*1e-3 : 1e-7; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this change fix the SPMV unit tests for float (#715)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is a new test so it won't fix the old one, this should look pretty much the same as the old one though which means that we might have to do something about it in the future, although this test uses cuSPARSE so that algo might not have the same issues as native kokkos-kenrels
spot check cleared on kokkos-dev: spot-check####################################################### spot-check-tpls####################################################### |
I do not have access to kokkos-dev2 yet so if someone feels like doing: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With runtime control parameters, do you have a rough idea of how much overhead is added for default algorihtm ? I do not mean the time increased by the map indexing(this should be trivial). I mean unexpected time increase which might be due to binery size increase and so on.
Is unordered map better than map ? When the directory size is small (like the control object), the indexing time is hardly different and the std::unordered_map could have a larger default hash map size (I do not know what is default hash map size).
@lucbv I'm running spot check on kokkos-dev2 now, and I'll get on the Tpetra integration this week. |
@lucbv Everything's clean, except for one build with a broken configuration:
Here's what the failing one printed:
So maybe something broke in the module loading for that build. Anyway, that's not a problem with these changes. |
@ndellingwood Do you know what |
@brian-kelley No, I'm not sure what that is, something messed up with the module it seems. I just logged onto kokkos-dev-2 and tried to compile a simple cuda driver code with cuda_device.cu [ndellin@kokkos-dev-2 CudaDevice]$ cat cuda_device.cu
#include <cstdio>
// Load modules
// module load sems-cuda/9.2
// Compilation line:
// nvcc cuda_device.cu
// Run code:
// ./a.out
// Expected output:
// Device names and ids...
// See https://devblogs.nvidia.com/how-query-device-properties-and-handle-errors-cuda-cc/
int main() {
int nDevices;
cudaGetDeviceCount(&nDevices);
for (int i = 0; i < nDevices; i++) {
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, i);
printf("Device Number: %d\n", i);
printf(" Device name: %s\n", prop.name);
}
} Load module module load sems-cuda/9.2 Compilation attempt: [ndellin@kokkos-dev-2 CudaDevice]$ nvcc cuda_device.cu
sh: cicc: command not found |
@brian-kelley swapping 9.2 for 10.1 and I had no issues compiling as expected since the 10.1 tests you ran passed. I'll test my small driver code on another system with the |
@brian-kelley the driver compiled fine on kokkos-dev with |
@brian-kelley @ndellingwood |
@lucbv I reran spot checks on kokkos-dev2 since the CUDA 9 seems OK now:
Failures were all #715 . TPLs spot check:
I didn't run this one with float. The only failure was actually gauss_seidel timing out on OpenMP (that will be fixed soon). Since this PR looks good in testing, is there anything else blocking it? |
@brian-kelley @kyungjoo-kim @srajama1
Brief content
This PR implements two new features:
Controls
class that contains an UnorderedMap and handles for various backends.Customers
The
Controls
feature is an enhancement for no particular customer.The merge algorithm path is implemented to support Aria work by @vbrunini
Details on merge algorithm backend
The main work here is to add some logic to choose the right
cusparseSpMVAlg_t
argument for the SpMV routine. This is now decided based on thealgorithm
parameter passed tocontrols
. Additional work has been performed to expand the number of type combinations in ETI, it is now possible to choose any combinations of:Details on Controls
So far this is not very fleshed out, the new class contains a
std::unordered_map<string, string>
to store kernels parameters and it also stores backend handles for cusparse and cublas.A new interface in
KokkosSparse::spmv
is implemented to accept a controls object that drives the algorithmic and performance parameters of the kernel. It also provides the backend handle, thus avoiding to create and destroy it within each call.