-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[src] Add functions to set Cudevice properties #4521
Conversation
* Add two static member functions to CuDevice which may be called before Initialize() to set boolean options that currently settable only from config file: - EnableTensorCores() - EnableTf32Compute() * Remove unneeded inline keywords and mark const functions const. * Rearrange includes, fix spacing, wrapping and commentary to the coding standard, and for Doxygen. * Remove extra vertical and all EOL whitespace.
* Add two static member functions to CuDevice which may be called before Initialize() to set boolean options that currently settable only from config file: - EnableTensorCores() - EnableTf32Compute() * Remove unneeded inline keywords and mark const functions const. * Rearrange includes, fix spacing, wrapping and commentary to the coding standard, and for Doxygen. * Remove extra vertical annd all EOL whitespace.
cublasHandle_t GetCublasHandle() const { return cublas_handle_; } | ||
cusparseHandle_t GetCusparseHandle() const { return cusparse_handle_; } | ||
curandGenerator_t GetCurandHandle() const { return curand_handle_; } | ||
cusolverDnHandle_t GetCusolverDnHandle() const { | ||
#if CUDA_VERSION < 9010 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not really make much sense to me... if the condition is true, then we compile with cuda libs lesser than 9010 (and the error will be printed no matter what the runtime env is) -- in that case it would be perhaps more correct to replace it with static_assert
If the condition is false, then the check will not be done in runtime no matter what the runtime env is...
So either we have to test dynamically by querying properties of the device or the runtime env (but I think the libs are not guaranteed to be compatible in that way, so that's a moot point) or just replace it with static_assert
@hugovbraun @galv ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is orthogonal. CUDA has a set of higher-libraries which can run on different hardware and with a range of driver versions. It is possible to add an implementation of a new algo in a new version (Dn
is for "dense", this one is the dense subset of LAPACK). In a newer version of CUDA, there may be an implementation for older hardware, although the computing architectures are also being retired, so newer CUDA versions do not support outdated architectures (who needs compute_20 anymore?). This is just like Intel IPP or MKL: if anything is added (to IPP often, to MKL rarely), there are implementations for all CPUs, with the fallback to the most generic one still supported. So it's not about the runtime driver or hardware at all.
The driver comes with libcuda.so as its userland interface between CUDA libraries and the driver. It is part of the driver package, and is 1:1 with the kernel module (won't work with a different driver version). ldd
always shows the linked SO as libcuda.so.1
, but really it is a symlink to the library perfectly matching the kernel module version:
$ ls -l /lib/x86_64-linux-gnu/libcuda.so.1
lrwxrwxrwx 1 root root 20 2021-04-28 17:01:56 /lib/x86_64-linux-gnu/libcuda.so.1 -> libcuda.so.465.19.01
$ modinfo -Fversion nvidia
465.19.01
The static_assert will not work outside of templates. If this were a template function, the assert would trigger at the instantiation locus. But in a non-template function, any file including this header would fail on the static_assert had the version check not pass: the condition would always resolve at the function definition point.
It's easy to do with templates, though, which always remain non-instantiated unless called, like this:
template<int V = CUDA_VERSION>
cusolverDnHandle_t GetCusolverDnHandle() const {
static_assert(V >= 9010);
. . .
}
If you try to call the function, simply as GetCusolverDnHandle()
(without an explicit template argument--that's he whole purpose), the template will instantiate with the default parameter, and the client program will fail to compile if the assertion is not satisfied. Note that cusolverDnHandle_t
is typedef's above in the current code, in a preprocessor #if
condition to some BS type (void*
, IIRC), since it's not usable anyway (there is no type cusolverDnHandle_t
in older CUDA).
I would certainly prefer a compile time failure to a runtime failure. Here's a POC: https://godbolt.org/z/cdds9M6Gj (sorry, the screen is a bit dense; there are 4 identical compilers on the right, with different -D
switches; look at green OK or red Error icons; the two bottom-right panes show error output. They are numbered like "editor #1 compiler #N", N = 1...4. The only editor with code is on the left half of screen. Numbers in comments refer to these compilers with different combinations of -Defines.
@danpovey, would you be ok with that?
LGTM |
while the static_assert can be implemented using templates, the version
that is part of the standard definitely does not expect it will be used
only from templates:
https://godbolt.org/z/8Ef3jTsrh
…On Sat, May 8, 2021 at 3:44 PM kkm000 ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/cudamatrix/cu-device.h
<#4521 (comment)>:
> CuDevice &ans = this_thread_device_;
if (!ans.initialized_)
ans.Initialize();
return ans;
}
- inline cublasHandle_t GetCublasHandle() { return cublas_handle_; }
- inline cusparseHandle_t GetCusparseHandle() { return cusparse_handle_; }
- inline curandGenerator_t GetCurandHandle() { return curand_handle_; }
- inline cusolverDnHandle_t GetCusolverDnHandle() {
+ cublasHandle_t GetCublasHandle() const { return cublas_handle_; }
+ cusparseHandle_t GetCusparseHandle() const { return cusparse_handle_; }
+ curandGenerator_t GetCurandHandle() const { return curand_handle_; }
+ cusolverDnHandle_t GetCusolverDnHandle() const {
#if CUDA_VERSION < 9010
This is orthogonal. CUDA has a set of higher-libraries which can run on
different hardware and with a range of driver versions. It is possible to
add an implementation of a new algo in a new version (Dn is for "dense",
this one is the dense subset of LAPACK
<https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverDN-intro>). In
a newer version of CUDA, there may be an implementation for older hardware,
although the computing architectures are also being retired, so newer CUDA
versions do not support outdated architectures (who needs compute_20
anymore?). This is just like Intel IPP or MKL: if anything is added (to IPP
often, to MKL rarely), there are implementations for all CPUs, with the
fallback to the most generic one still supported. So it's not about the
runtime driver or hardware at all.
The driver comes with libcuda.so as its userland interface between CUDA
libraries and the driver. It is part of the driver package, and is 1:1 with
the kernel module (won't work with a different driver version). ldd
always shows the linked SO as libcuda.so.1, but really it is a symlink to
the library perfectly matching the kernel module version:
$ ls -l /lib/x86_64-linux-gnu/libcuda.so.1
lrwxrwxrwx 1 root root 20 2021-04-28 17:01:56 /lib/x86_64-linux-gnu/libcuda.so.1 -> libcuda.so.465.19.01
$ modinfo -Fversion nvidia
465.19.01
------------------------------
The static_assert will not work outside of templates. If this were a
template function, the assert would trigger at the instantiation locus. But
in a non-template function, any file including this header would fail on
the static_assert had the version check not pass: the condition would
always resolve at the function definition point.
It's easy to do with templates, though, which always remain
non-instantiated unless called, like this:
template<int V = CUDA_VERSION>
cusolverDnHandle_t GetCusolverDnHandle() const {
static_assert(V >= 9010);
. . .
}
If you try to call the function, simply as GetCusolverDnHandle() (without
an explicit template argument--that's he whole purpose), the template will
instantiate with the default parameter, and the client program will fail to
compile. Note that cusolverDnHandle_t is typedef's above in the current
code, in a preprocessor #if condition to some BS type (void*, IIRC),
since it's not usable anyway (there is no type cusolverDnHandle_t in
older CUDA).
I would certainly prefer a compile time failure to a runtime failure.
Here's a POC: https://godbolt.org/z/cdds9M6Gj (sorry, the screen is a bit
dense; there are 4 identical compilers on the right, with different -D
switches; look at green OK or red Error icons; the two bottom-right panes
show error output. They are numbered like "editor #1
<#1> compiler #N", N = 1...4. The
only editor with code is on the left half of screen.
@danpovey <https://github.com/danpovey>, would you be ok with that?
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#4521 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACUKYX2LXWPPITOVRKRCYFLTMWIBTANCNFSM44MRXTBA>
.
|
@jtrmal, sure it can, and I do that e.g. to assert that types are same (works as both sanity check and a documentation for reading code). But we want to fail compilation only if the user in fact calls this function, and CUDA version is too low. A static_assert in an instantiated function is always tested, even if the function is never called, which is clear to the compiler from looking at a single compilation unit, and would be obviously optimized out, like in this example: https://godbolt.org/z/1cWTK89n5 (EDIT: Extended example, that also incidenally shows that |
LGTM! |
Add two static member functions to
CuDevice
which may be called beforeInitialize()
to set boolean options that currently only settable from config file:EnableTensorCores()
andEnableTf32Compute()
Chores (separate commit for easier review; commit messages are both identical and complete):
inline
keywords (class member function defined inline are implicitlyinline
), and mark const functionsconst
.