[src] Add functions to set Cudevice properties #4521

kkm000 · 2021-05-08T09:41:07Z

Add two static member functions to CuDevice which may be called before Initialize() to set boolean options that currently only settable from config file: EnableTensorCores() and EnableTf32Compute()

Chores (separate commit for easier review; commit messages are both identical and complete):

Remove unneeded inline keywords (class member function defined inline are implicitly inline), and mark const functions const.
Rearrange includes, fix spacing, wrapping and commentary to the coding standard, and for Doxygen.
Remove extra vertical and all EOL whitespace.

* Add two static member functions to CuDevice which may be called before Initialize() to set boolean options that currently settable only from config file: - EnableTensorCores() - EnableTf32Compute() * Remove unneeded inline keywords and mark const functions const. * Rearrange includes, fix spacing, wrapping and commentary to the coding standard, and for Doxygen. * Remove extra vertical and all EOL whitespace.

* Add two static member functions to CuDevice which may be called before Initialize() to set boolean options that currently settable only from config file: - EnableTensorCores() - EnableTf32Compute() * Remove unneeded inline keywords and mark const functions const. * Rearrange includes, fix spacing, wrapping and commentary to the coding standard, and for Doxygen. * Remove extra vertical annd all EOL whitespace.

jtrmal · 2021-05-08T14:56:33Z

src/cudamatrix/cu-device.h

+  cublasHandle_t GetCublasHandle() const { return cublas_handle_; }
+  cusparseHandle_t GetCusparseHandle() const { return cusparse_handle_; }
+  curandGenerator_t GetCurandHandle() const { return curand_handle_; }
+  cusolverDnHandle_t GetCusolverDnHandle() const {
 #if CUDA_VERSION < 9010


This does not really make much sense to me... if the condition is true, then we compile with cuda libs lesser than 9010 (and the error will be printed no matter what the runtime env is) -- in that case it would be perhaps more correct to replace it with static_assert
If the condition is false, then the check will not be done in runtime no matter what the runtime env is...
So either we have to test dynamically by querying properties of the device or the runtime env (but I think the libs are not guaranteed to be compatible in that way, so that's a moot point) or just replace it with static_assert
@hugovbraun @galv ?

This is orthogonal. CUDA has a set of higher-libraries which can run on different hardware and with a range of driver versions. It is possible to add an implementation of a new algo in a new version (Dn is for "dense", this one is the dense subset of LAPACK). In a newer version of CUDA, there may be an implementation for older hardware, although the computing architectures are also being retired, so newer CUDA versions do not support outdated architectures (who needs compute_20 anymore?). This is just like Intel IPP or MKL: if anything is added (to IPP often, to MKL rarely), there are implementations for all CPUs, with the fallback to the most generic one still supported. So it's not about the runtime driver or hardware at all.

The driver comes with libcuda.so as its userland interface between CUDA libraries and the driver. It is part of the driver package, and is 1:1 with the kernel module (won't work with a different driver version). ldd always shows the linked SO as libcuda.so.1, but really it is a symlink to the library perfectly matching the kernel module version:

$ ls -l /lib/x86_64-linux-gnu/libcuda.so.1 lrwxrwxrwx 1 root root 20 2021-04-28 17:01:56 /lib/x86_64-linux-gnu/libcuda.so.1 -> libcuda.so.465.19.01 $ modinfo -Fversion nvidia 465.19.01

The static_assert will not work outside of templates. If this were a template function, the assert would trigger at the instantiation locus. But in a non-template function, any file including this header would fail on the static_assert had the version check not pass: the condition would always resolve at the function definition point.

It's easy to do with templates, though, which always remain non-instantiated unless called, like this:

template<int V = CUDA_VERSION> cusolverDnHandle_t GetCusolverDnHandle() const { static_assert(V >= 9010); . . . }

If you try to call the function, simply as GetCusolverDnHandle() (without an explicit template argument--that's he whole purpose), the template will instantiate with the default parameter, and the client program will fail to compile if the assertion is not satisfied. Note that cusolverDnHandle_t is typedef's above in the current code, in a preprocessor #if condition to some BS type (void*, IIRC), since it's not usable anyway (there is no type cusolverDnHandle_t in older CUDA).

I would certainly prefer a compile time failure to a runtime failure. Here's a POC: https://godbolt.org/z/cdds9M6Gj (sorry, the screen is a bit dense; there are 4 identical compilers on the right, with different -D switches; look at green OK or red Error icons; the two bottom-right panes show error output. They are numbered like "editor #1 compiler #N", N = 1...4. The only editor with code is on the left half of screen. Numbers in comments refer to these compilers with different combinations of -Defines.

@danpovey, would you be ok with that?

jtrmal · 2021-05-08T14:57:21Z

LGTM

jtrmal · 2021-05-08T20:04:52Z

while the static_assert can be implemented using templates, the version that is part of the standard definitely does not expect it will be used only from templates: https://godbolt.org/z/8Ef3jTsrh

…

On Sat, May 8, 2021 at 3:44 PM kkm000 ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In src/cudamatrix/cu-device.h <#4521 (comment)>: > CuDevice &ans = this_thread_device_; if (!ans.initialized_) ans.Initialize(); return ans; } - inline cublasHandle_t GetCublasHandle() { return cublas_handle_; } - inline cusparseHandle_t GetCusparseHandle() { return cusparse_handle_; } - inline curandGenerator_t GetCurandHandle() { return curand_handle_; } - inline cusolverDnHandle_t GetCusolverDnHandle() { + cublasHandle_t GetCublasHandle() const { return cublas_handle_; } + cusparseHandle_t GetCusparseHandle() const { return cusparse_handle_; } + curandGenerator_t GetCurandHandle() const { return curand_handle_; } + cusolverDnHandle_t GetCusolverDnHandle() const { #if CUDA_VERSION < 9010 This is orthogonal. CUDA has a set of higher-libraries which can run on different hardware and with a range of driver versions. It is possible to add an implementation of a new algo in a new version (Dn is for "dense", this one is the dense subset of LAPACK <https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverDN-intro>). In a newer version of CUDA, there may be an implementation for older hardware, although the computing architectures are also being retired, so newer CUDA versions do not support outdated architectures (who needs compute_20 anymore?). This is just like Intel IPP or MKL: if anything is added (to IPP often, to MKL rarely), there are implementations for all CPUs, with the fallback to the most generic one still supported. So it's not about the runtime driver or hardware at all. The driver comes with libcuda.so as its userland interface between CUDA libraries and the driver. It is part of the driver package, and is 1:1 with the kernel module (won't work with a different driver version). ldd always shows the linked SO as libcuda.so.1, but really it is a symlink to the library perfectly matching the kernel module version: $ ls -l /lib/x86_64-linux-gnu/libcuda.so.1 lrwxrwxrwx 1 root root 20 2021-04-28 17:01:56 /lib/x86_64-linux-gnu/libcuda.so.1 -> libcuda.so.465.19.01 $ modinfo -Fversion nvidia 465.19.01 ------------------------------ The static_assert will not work outside of templates. If this were a template function, the assert would trigger at the instantiation locus. But in a non-template function, any file including this header would fail on the static_assert had the version check not pass: the condition would always resolve at the function definition point. It's easy to do with templates, though, which always remain non-instantiated unless called, like this: template<int V = CUDA_VERSION> cusolverDnHandle_t GetCusolverDnHandle() const { static_assert(V >= 9010); . . . } If you try to call the function, simply as GetCusolverDnHandle() (without an explicit template argument--that's he whole purpose), the template will instantiate with the default parameter, and the client program will fail to compile. Note that cusolverDnHandle_t is typedef's above in the current code, in a preprocessor #if condition to some BS type (void*, IIRC), since it's not usable anyway (there is no type cusolverDnHandle_t in older CUDA). I would certainly prefer a compile time failure to a runtime failure. Here's a POC: https://godbolt.org/z/cdds9M6Gj (sorry, the screen is a bit dense; there are 4 identical compilers on the right, with different -D switches; look at green OK or red Error icons; the two bottom-right panes show error output. They are numbered like "editor #1 <#1> compiler #N", N = 1...4. The only editor with code is on the left half of screen. @danpovey <https://github.com/danpovey>, would you be ok with that? — You are receiving this because your review was requested. Reply to this email directly, view it on GitHub <#4521 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACUKYX2LXWPPITOVRKRCYFLTMWIBTANCNFSM44MRXTBA> .

kkm000 · 2021-05-10T07:46:24Z

@jtrmal, sure it can, and I do that e.g. to assert that types are same (works as both sanity check and a documentation for reading code).

But we want to fail compilation only if the user in fact calls this function, and CUDA version is too low. A static_assert in an instantiated function is always tested, even if the function is never called, which is clear to the compiler from looking at a single compilation unit, and would be obviously optimized out, like in this example: https://godbolt.org/z/1cWTK89n5 (EDIT: Extended example, that also incidenally shows that inline and static are ignored within an unnamed namespace). With a template, the static_assert is checked at the template instantiation time. This is why only a template would work.

danpovey · 2021-05-10T08:05:12Z

LGTM!

kkm000 added 2 commits May 8, 2021 02:32

kkm000 requested review from danpovey and jtrmal May 8, 2021 09:41

jtrmal reviewed May 8, 2021

View reviewed changes

kkm000 merged commit 344c691 into kaldi-asr:master May 8, 2021

kkm000 deleted the 2104-cudevice-properties branch May 8, 2021 19:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[src] Add functions to set Cudevice properties #4521

[src] Add functions to set Cudevice properties #4521

kkm000 commented May 8, 2021

jtrmal May 8, 2021

kkm000 May 8, 2021 •

edited

Loading

jtrmal commented May 8, 2021

jtrmal commented May 8, 2021 via email

kkm000 commented May 10, 2021 •

edited

Loading

danpovey commented May 10, 2021

[src] Add functions to set Cudevice properties #4521

[src] Add functions to set Cudevice properties #4521

Conversation

kkm000 commented May 8, 2021

jtrmal May 8, 2021

Choose a reason for hiding this comment

kkm000 May 8, 2021 • edited Loading

Choose a reason for hiding this comment

jtrmal commented May 8, 2021

jtrmal commented May 8, 2021 via email

kkm000 commented May 10, 2021 • edited Loading

danpovey commented May 10, 2021

kkm000 May 8, 2021 •

edited

Loading

kkm000 commented May 10, 2021 •

edited

Loading