Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[src] Add functions to set Cudevice properties #4521

Merged
merged 2 commits into from
May 8, 2021

Conversation

kkm000
Copy link
Contributor

@kkm000 kkm000 commented May 8, 2021

Add two static member functions to CuDevice which may be called before Initialize() to set boolean options that currently only settable from config file: EnableTensorCores() and EnableTf32Compute()

Chores (separate commit for easier review; commit messages are both identical and complete):

  • Remove unneeded inline keywords (class member function defined inline are implicitly inline), and mark const functions const.
  • Rearrange includes, fix spacing, wrapping and commentary to the coding standard, and for Doxygen.
  • Remove extra vertical and all EOL whitespace.

kkm000 added 2 commits May 8, 2021 02:32
* Add two static member functions to CuDevice which may be
  called before Initialize() to set boolean options that
  currently settable only from config file:
  - EnableTensorCores()
  - EnableTf32Compute()
* Remove unneeded inline keywords and mark const functions const.
* Rearrange includes, fix spacing, wrapping and commentary
  to the coding standard, and for Doxygen.
* Remove extra vertical and all EOL whitespace.
* Add two static member functions to CuDevice which may be
  called before Initialize() to set boolean options that
  currently settable only from config file:
  - EnableTensorCores()
  - EnableTf32Compute()
* Remove unneeded inline keywords and mark const functions const.
* Rearrange includes, fix spacing, wrapping and commentary
  to the coding standard, and for Doxygen.
* Remove extra vertical annd all EOL whitespace.
@kkm000 kkm000 requested review from danpovey and jtrmal May 8, 2021 09:41
cublasHandle_t GetCublasHandle() const { return cublas_handle_; }
cusparseHandle_t GetCusparseHandle() const { return cusparse_handle_; }
curandGenerator_t GetCurandHandle() const { return curand_handle_; }
cusolverDnHandle_t GetCusolverDnHandle() const {
#if CUDA_VERSION < 9010
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not really make much sense to me... if the condition is true, then we compile with cuda libs lesser than 9010 (and the error will be printed no matter what the runtime env is) -- in that case it would be perhaps more correct to replace it with static_assert
If the condition is false, then the check will not be done in runtime no matter what the runtime env is...
So either we have to test dynamically by querying properties of the device or the runtime env (but I think the libs are not guaranteed to be compatible in that way, so that's a moot point) or just replace it with static_assert
@hugovbraun @galv ?

Copy link
Contributor Author

@kkm000 kkm000 May 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is orthogonal. CUDA has a set of higher-libraries which can run on different hardware and with a range of driver versions. It is possible to add an implementation of a new algo in a new version (Dn is for "dense", this one is the dense subset of LAPACK). In a newer version of CUDA, there may be an implementation for older hardware, although the computing architectures are also being retired, so newer CUDA versions do not support outdated architectures (who needs compute_20 anymore?). This is just like Intel IPP or MKL: if anything is added (to IPP often, to MKL rarely), there are implementations for all CPUs, with the fallback to the most generic one still supported. So it's not about the runtime driver or hardware at all.

The driver comes with libcuda.so as its userland interface between CUDA libraries and the driver. It is part of the driver package, and is 1:1 with the kernel module (won't work with a different driver version). ldd always shows the linked SO as libcuda.so.1, but really it is a symlink to the library perfectly matching the kernel module version:

$ ls -l /lib/x86_64-linux-gnu/libcuda.so.1
lrwxrwxrwx 1 root root 20 2021-04-28 17:01:56 /lib/x86_64-linux-gnu/libcuda.so.1 -> libcuda.so.465.19.01
$ modinfo -Fversion nvidia
465.19.01

The static_assert will not work outside of templates. If this were a template function, the assert would trigger at the instantiation locus. But in a non-template function, any file including this header would fail on the static_assert had the version check not pass: the condition would always resolve at the function definition point.

It's easy to do with templates, though, which always remain non-instantiated unless called, like this:

template<int V = CUDA_VERSION>
cusolverDnHandle_t GetCusolverDnHandle() const {
  static_assert(V >= 9010);
  . . .
}

If you try to call the function, simply as GetCusolverDnHandle() (without an explicit template argument--that's he whole purpose), the template will instantiate with the default parameter, and the client program will fail to compile if the assertion is not satisfied. Note that cusolverDnHandle_t is typedef's above in the current code, in a preprocessor #if condition to some BS type (void*, IIRC), since it's not usable anyway (there is no type cusolverDnHandle_t in older CUDA).

I would certainly prefer a compile time failure to a runtime failure. Here's a POC: https://godbolt.org/z/cdds9M6Gj (sorry, the screen is a bit dense; there are 4 identical compilers on the right, with different -D switches; look at green OK or red Error icons; the two bottom-right panes show error output. They are numbered like "editor #1 compiler #N", N = 1...4. The only editor with code is on the left half of screen. Numbers in comments refer to these compilers with different combinations of -Defines.

@danpovey, would you be ok with that?

@jtrmal
Copy link
Contributor

jtrmal commented May 8, 2021

LGTM

@kkm000 kkm000 merged commit 344c691 into kaldi-asr:master May 8, 2021
@kkm000 kkm000 deleted the 2104-cudevice-properties branch May 8, 2021 19:53
@jtrmal
Copy link
Contributor

jtrmal commented May 8, 2021 via email

@kkm000
Copy link
Contributor Author

kkm000 commented May 10, 2021

@jtrmal, sure it can, and I do that e.g. to assert that types are same (works as both sanity check and a documentation for reading code).

But we want to fail compilation only if the user in fact calls this function, and CUDA version is too low. A static_assert in an instantiated function is always tested, even if the function is never called, which is clear to the compiler from looking at a single compilation unit, and would be obviously optimized out, like in this example: https://godbolt.org/z/1cWTK89n5 (EDIT: Extended example, that also incidenally shows that inline and static are ignored within an unnamed namespace). With a template, the static_assert is checked at the template instantiation time. This is why only a template would work.

@danpovey
Copy link
Contributor

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants