Improve CUDAEnsemble device selection error messages #858

Closed
ptheywood opened this issue May 19, 2022 · 0 comments · Fixed by #864

Comments

@ptheywood
Member

CUDAEnsemble does not provide useful error messages when no CUDA devices are present or the CUDA runtime cannot be created. Instead, it just passes the raw error to the user via gpuErrchk:

gpuErrchk(cudaGetDeviceCount(&ct));

This should be improved to match the more complex CUDASimulation device selection logic that throws exceptions with useful messages.

cudaStatus = cudaGetDeviceCount(&device_count);
if (cudaStatus != cudaSuccess) {
    THROW exception::InvalidCUDAdevice("Error finding CUDA devices! Do you have a CUDA-capable GPU installed?");
}
if (device_count == 0) {
    THROW exception::InvalidCUDAdevice("Error no CUDA devices found!");
}
// Select device
if (config.device_id >= device_count) {
    THROW exception::InvalidCUDAdevice("Error setting CUDA device to '%d', only %d available!", config.device_id, device_count);
}
if (deviceInitialised != -1 && deviceInitialised != config.device_id) {
    THROW exception::InvalidCUDAdevice("Unable to set CUDA device to '%d' after the CUDASimulation has already initialised on device '%d'.", config.device_id, deviceInitialised);
}
// Check the compute capability of the device, throw an exception if not valid for the executable.
if (!util::detail::compute_capability::checkComputeCapability(static_cast<int>(config.device_id))) {
    int min_cc = util::detail::compute_capability::minimumCompiledComputeCapability();
    int cc = util::detail::compute_capability::getComputeCapability(static_cast<int>(config.device_id));
    THROW exception::InvalidCUDAComputeCapability("Error application compiled for CUDA Compute Capability %d and above. Device %u is compute capability %d. Rebuild for SM_%d.", min_cc, config.device_id, cc, cc);
}
cudaStatus = cudaSetDevice(static_cast<int>(config.device_id));
if (cudaStatus != cudaSuccess) {
    THROW exception::InvalidCUDAdevice("Unknown error setting CUDA device to '%d'. (%d available)", config.device_id, device_count);
}
// Call cudaFree to initialise the context early
gpuErrchk(cudaFree(nullptr));
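
For illustration only, the checks an ensemble would need could look roughly like the sketch below. It is not the change made in #864. To stay runnable without a GPU, it replaces the CUDA call with two plain parameters: `query_ok` stands in for `cudaGetDeviceCount(&n) == cudaSuccess` and `device_count` for `n`. The names `InvalidDevice` and `selectDevices` are hypothetical, and the idea that the ensemble takes a list of device ids (empty meaning "use all devices") is an assumption, not something stated in this issue. The message wording is borrowed from the CUDASimulation snippet above.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for exception::InvalidCUDAdevice.
struct InvalidDevice : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Validate requested device ids against the outcome of a device count query.
// query_ok     : assumed to mean cudaGetDeviceCount() returned cudaSuccess.
// device_count : the count that query reported.
// requested    : device ids asked for; empty is assumed to mean "all devices".
std::vector<int> selectDevices(bool query_ok, int device_count,
                               const std::vector<int> &requested) {
    if (!query_ok) {
        throw InvalidDevice("Error finding CUDA devices! Do you have a CUDA-capable GPU installed?");
    }
    if (device_count == 0) {
        throw InvalidDevice("Error no CUDA devices found!");
    }
    std::vector<int> selected;
    if (requested.empty()) {
        // No explicit request: fall back to every visible device.
        for (int i = 0; i < device_count; ++i) {
            selected.push_back(i);
        }
        return selected;
    }
    for (int id : requested) {
        // Reject ids outside [0, device_count) with a message naming both values.
        if (id < 0 || id >= device_count) {
            throw InvalidDevice("Error setting CUDA device to '" + std::to_string(id) +
                                "', only " + std::to_string(device_count) + " available!");
        }
        selected.push_back(id);
    }
    return selected;
}
```

In real code the two parameters would come from a single `cudaGetDeviceCount` call made before any `gpuErrchk`-wrapped work, so the user sees one of these messages rather than a raw CUDA error.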

@ptheywood ptheywood changed the title Improve CUDAEnsemble Device selection errors Improve CUDAEnsemble device selection errors May 19, 2022
@ptheywood ptheywood changed the title Improve CUDAEnsemble device selection errors Improve CUDAEnsemble device selection error messages May 19, 2022
Robadob added a commit that referenced this issue May 27, 2022
This aligns closer to CUDASimulation's behaviour.

Closes #858
mondus pushed a commit that referenced this issue Jun 1, 2022
This aligns closer to CUDASimulation's behaviour.

Closes #858