BLIS should allow simultaneously exporting both 32- and 64-bit variants of BLAS/CBLAS #43

Closed
njsmith opened this issue Mar 1, 2016 · 19 comments

Comments

@njsmith

njsmith commented Mar 1, 2016

The de facto standard is that BLAS/CBLAS functions take 32-bit integers in their API. Julia experimented with changing this so that their main BLAS wrappers could use 64-bit integers. This worked great for a little while, until people started linking in other existing BLAS-using code: that code assumed BLAS uses 32-bit integers, and the mismatch caused segfaults. Their solution was to keep using a 64-bit integer version of BLAS, but with symbols renamed to avoid collisions (so e.g. dgemm_ uses 32-bit integers, and dgemm_64_ uses 64-bit integers... [edited to get the 64-bit symbol names correct])

As mentioned in #37 (comment), it would be great if a single BLIS library could export both 32- and 64-bit versions of these symbols simultaneously. It doesn't look like this would be too hard, since the BLAS2BLIS interface is already generated using C preprocessor magic, and the CBLAS wrapper is already getting programmatically patched...
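As a rough illustration of the idea above, the sketch below instantiates one macro body twice, once per integer width, so that a single object file exports both variants. The names `SKETCH_GEN_DDOT`, `sketch_ddot_` and `sketch_ddot_64_` are made up for this example and are not BLIS symbols; the routine assumes positive increments and is not how BLAS2BLIS is actually written.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generator macro: emits a Fortran-style dot product whose
   integer arguments (passed by reference) have type int_t. The body widens
   everything to 64 bits so both instantiations share the same logic. */
#define SKETCH_GEN_DDOT(fname, int_t)                                    \
    double fname(const int_t *n, const double *x, const int_t *incx,     \
                 const double *y, const int_t *incy)                     \
    {                                                                    \
        int64_t len = (int64_t)*n;                                       \
        int64_t sx = (int64_t)*incx, sy = (int64_t)*incy;                \
        double acc = 0.0;                                                \
        assert(sx > 0 && sy > 0);                                        \
        for (int64_t i = 0; i < len; ++i)                                \
            acc += x[i * sx] * y[i * sy];                                \
        return acc;                                                      \
    }

/* 32-bit integer variant under the traditional-looking name. */
SKETCH_GEN_DDOT(sketch_ddot_, int32_t)

/* 64-bit integer variant under the suffixed name. */
SKETCH_GEN_DDOT(sketch_ddot_64_, int64_t)
```

Both symbols can then coexist in one library: code compiled against 32-bit integers calls `sketch_ddot_`, and code compiled against 64-bit integers calls `sketch_ddot_64_`.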

@tkelman
Contributor

tkelman commented Mar 1, 2016

This has also been an issue in Matlab, which uses the ILP64 MKL, for many years before Julia ever existed: it shows up if you try to link a mex file against a system LP64 BLAS. ILP64 BLAS only ever worked in Julia if you didn't try to load any other binary applications compiled against system LP64 BLAS. It was fragile until we implemented the renaming. Lots of context in JuliaLang/julia#4923, OpenMathLib/OpenBLAS#646, OpenMathLib/OpenBLAS#459, and JuliaLang/julia#8734.

We just append 64_ to the C-style symbol names, so it's dgemm_64_, or if you try to call it from Fortran (edit: assuming gfortran mangling) it would be dgemm_64. SunPerf actually had separate API names as long as 15 years ago, ref http://www.netlib.org/atlas/atlas-comm/msg00233.html - so we went with the suffix that would let us directly use some existing code in Tim Davis' SuiteSparse that gets activated by -DSUN64.
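A minimal sketch of how client code might reach suffixed symbols through a macro, in the spirit of the comment above. Everything prefixed `sketch_` or `SKETCH_` is hypothetical, and the routine defined here is only a stand-in for a library function that real code would declare `extern` and link against.

```c
#include <assert.h>
#include <stdint.h>

/* Suffix appended to every BLAS symbol; assumed configurable at build time. */
#ifndef SKETCH_BLAS_SUFFIX
#define SKETCH_BLAS_SUFFIX 64_
#endif

/* Two-level paste so SKETCH_BLAS_SUFFIX is expanded before ## is applied. */
#define SKETCH_CAT2(a, b) a##b
#define SKETCH_CAT(a, b)  SKETCH_CAT2(a, b)
#define SKETCH_BLAS(name) SKETCH_CAT(name, SKETCH_BLAS_SUFFIX)

/* Integer type matching the suffixed interface. */
typedef int64_t sketch_blasint;

/* Stand-in for a library routine: sum of absolute values, positive
   increment only. With the default suffix, the macro names this function
   sketch_dasum_64_. */
double SKETCH_BLAS(sketch_dasum_)(const sketch_blasint *n, const double *x,
                                  const sketch_blasint *incx)
{
    double sum = 0.0;
    assert(*incx > 0);
    for (sketch_blasint i = 0; i < *n; ++i) {
        double v = x[i * *incx];
        sum += v < 0.0 ? -v : v;
    }
    return sum;
}

/* Client code never spells the suffix out; it always goes through the macro. */
double sketch_abs_sum(const double *x, int64_t n)
{
    sketch_blasint len = n, one = 1;
    return SKETCH_BLAS(sketch_dasum_)(&len, x, &one);
}
```

Because the suffix lives in one macro, switching between suffixed and unsuffixed names is a build-time change rather than an edit to every call site.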

@jeffhammond
Member

Is it possible to support dynamic dispatch, wherein both dgemm32 and dgemm64 are in the library, but at runtime dgemm is mapped to dgemm32, unless the user does e.g. export BLIS_64B_MAGIC? That would be cool.

@jeffhammond
Member

@tkelman Why doesn't Julia just call CBLAS? I thought support for that was pretty broad, at least in contexts where Julia is used.

@tkelman
Contributor

tkelman commented Mar 1, 2016

We do in a handful of places where the Fortran calling convention differs between gfortran and MKL/Accelerate. I don't think cblas typically supports 64 bit indices - I could be wrong though.

@njsmith
Author

njsmith commented Mar 1, 2016

@jeffhammond: That would indeed be cool, but unfortunately I don't think there's any way to make that happen given the vagaries of different platforms' linking models...

@jeffhammond
Member

@tkelman Does CBLAS need to support 64b integers to meet Julia's needs? For BLAS1 operations, it is trivial to chunk if there are more than 2B elements in a vector. Given the memory constraints, if one has a matrix dimension larger than 2B, the BLAS2/3 operations are going to perform like BLAS1 operations and thus chunking shouldn't be a big issue.
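The chunking idea for BLAS1 operations could look roughly like the sketch below. The kernel `sketch_dot32` is a stand-in for a 32-bit-integer dot product (unit stride only), and `sketch_dot_chunked` is a hypothetical wrapper; the chunk limit is a parameter so the behaviour can be exercised with small inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a BLAS1 kernel whose length argument is a 32-bit integer. */
static double sketch_dot32(int32_t n, const double *x, const double *y)
{
    double acc = 0.0;
    for (int32_t i = 0; i < n; ++i)
        acc += x[i] * y[i];
    return acc;
}

/* Hypothetical wrapper: accepts a 64-bit length and hands the kernel
   consecutive pieces of at most max_chunk elements each. */
static double sketch_dot_chunked(int64_t n, const double *x, const double *y,
                                 int32_t max_chunk)
{
    double acc = 0.0;
    int64_t done = 0;
    assert(max_chunk > 0);
    while (done < n) {
        int64_t left = n - done;
        int32_t len = left > max_chunk ? max_chunk : (int32_t)left;
        acc += sketch_dot32(len, x + done, y + done);
        done += len;
    }
    return acc;
}
```

In real use `max_chunk` would be `INT32_MAX`. A reduction like this only needs the partial results added together; operations with output vectors would chunk the output pointer in the same way.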

@njsmith How many different conventions need to be supported for the feature to have high value to the user community?

@tkelman
Contributor

tkelman commented Mar 1, 2016

It was easier for us to rename the symbols than add chunking around every single blas and lapack call. When openblas is built with ILP64 indices, its cblas also gets built with 64 bit integers afaict, but that's a non default setting and will cause segfaults if you're not careful about symbol names and loading other blas libraries in the same process.
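To see why mixing integer widths under the same symbol name is dangerous, recall that Fortran-style BLAS entry points receive their integer arguments by reference. The sketch below simulates, without undefined behaviour, what a 64-bit-integer callee would read when the caller only stored a 32-bit value. The function names are hypothetical, and the exact value read depends on byte order and on whatever happens to sit next to the argument in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* What a 64-bit-integer callee effectively does with an integer argument:
   it reads 8 bytes starting at the address it was given. */
static int64_t sketch_read_as_int64(const void *arg)
{
    int64_t v;
    memcpy(&v, arg, sizeof v);
    return v;
}

/* Simulates a 32-bit-integer caller: the argument occupies 4 bytes and is
   followed by unrelated data (here an explicit neighbouring value). */
static int64_t sketch_mismatched_read(int32_t n, int32_t neighbour)
{
    int32_t slot[2] = { n, neighbour };
    return sketch_read_as_int64(slot);
}
```

For example, `sketch_mismatched_read(3, 7)` does not return 3: the neighbouring bytes become part of the value, so a callee would see a wildly wrong dimension and index far outside the caller's arrays.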

@jeffhammond
Member

@tkelman Eww, that's terrible of OpenBLAS. They should always use C int. Someone should file that bug.

@tkelman
Contributor

tkelman commented Mar 1, 2016

They should also always use 32 bit ints for the fortran interface by that reasoning. The problems in practice are identical.

@njsmith
Author

njsmith commented Mar 1, 2016

> How many different conventions need to be supported for the feature to have high value to the user community?

I think that the use cases that matter in practice are:

  • For systems with 32-bit address spaces: BLIS/BLAS/CBLAS with 32-bit integers.
  • For systems with 64-bit address spaces: BLIS internally using 64-bit integers, with one of the following options for the BLAS/CBLAS layer:
    • dgemm_ and cblas_dgemm exist and use 32-bit integers (what Debian and others want)
    • dgemm_64_ and cblas_dgemm_64 exist and use 64-bit integers (what Julia wants, also useful for Jeff's modern Fortran codes)
    • dgemm_ exists and uses 64-bit integers (for bespoke setups that want to work with legacy Fortran code that calls plain dgemm and is compiled with 64-bit INTEGER).

For the last case I'm not sure if anyone ever needs cblas_dgemm to be 64-bit, but I suppose it might be useful sometimes, and MKL seems to provide this as an interface ("All Intel MKL function domains support ILP64 programming but FFTW interfaces", so I guess that includes CBLAS?), so it could be useful for people porting code from MKL, and it's not hard to support.

If we combine the two compatible 64-bit options, this gives us a total of 3 configurations that are important to test/support:

  • 32-bit BLIS
  • 64-bit BLIS with 32-bit dgemm_/cblas_dgemm and 64-bit dgemm_64_/cblas_dgemm_64
  • 64-bit BLIS with 64-bit dgemm_/cblas_dgemm

In principle BLIS certainly could retain the flexibility to support other configurations -- like 32-bit integers internally on 64-bit systems, or 64-bit BLIS + 32-bit dgemm_ and no 64-bit dgemm_, etc. etc., for all the combinatoric possibilities -- but trying to test and support all these seems like a waste of effort to me, since almost all of them are irrelevant in practice.

@njsmith
Author

njsmith commented Mar 1, 2016

@jeffhammond: The above list does assume that for your "modern fortran codes" use case, you don't actually care about dgemm_32 and really just want dgemm_64. Is that correct?

@tkelman
Contributor

tkelman commented Mar 1, 2016

I completely agree with @njsmith's summary there.

There's the question of what to do on CBLAS as well. In the OpenBLAS scenario as used by Julia, we have a handful of CBLAS symbols that we use (just cblas_[cz]dot[uc]_sub), but we use them with 64 bit integers and renamed with a 64_ suffix accordingly. cblas_cdotu_sub64_ looks a little funny but the suffix is handled by a macro anyway. I believe people are successfully building Julia against ILP64 MKL and also using those same cblas symbols but without a suffix, with a BlasInt type of 64 bits and it's been working fine. I'd have to check whether that's just luck, or if MKL also changes CBLAS integer sizes when you link (or set the environment variable for the dynamic runtime) in ILP64 mode.

@njsmith
Author

njsmith commented Mar 1, 2016

Edited my comment to speak more explicitly about CBLAS -- specifically, I think cblas_dgemm_64 and friends should obviously be supported for all the same reasons we want dgemm_64_, and a look at the MKL docs suggests that their ILP64 builds do provide 64-bit cblas_dgemm and friends.

@njsmith
Author

njsmith commented Mar 1, 2016

Oh, ugh, except I missed that Julia's 64-bit cblas functions are named like cblas_dgemm64_ instead of cblas_dgemm_64, which is obviously wrong. @tkelman, does fixing this seem at all likely? :-)

@tkelman
Contributor

tkelman commented Mar 1, 2016

It's not "wrong" per se, just a consequence of how we implemented it. We apply a suffix uniformly to all symbols in the library, after the gfortran mangling rather than before. Looks funny, but just easier to deal with.

@tkelman
Contributor

tkelman commented Mar 1, 2016

Jeez, sorry about the spam, my phone is stupid

@njsmith
Author

njsmith commented Mar 2, 2016

@tkelman: I guess by "wrong" I mean "if we were writing a standard from scratch this is obviously not what we would do". Since now we are talking about making a standard and BLIS implementing it, there's a question about whether we should follow the dorky existing thing, or fix the existing thing and then implement that :-). I guess you're voting for following the existing thing?

@tkelman
Contributor

tkelman commented Mar 2, 2016

"add a suffix, but before the trailing underscore for symbols that are pretending to be from fortran" (or "add 64_ for f-blas symbols and _64 for c-blas symbols") is a more fiddly rule to implement than "add the same suffix to all symbols." ILP64 is generally a situation where you should know what you are doing, or leave the linking and renaming to someone who does, so I don't care too much what the result looks like. It's almost always going to be handled by preprocessor macros anyway.
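The two naming rules compared above can be written as preprocessor macros. This is only a sketch: the macro names are invented for illustration, and `SKETCH_STR` exists solely so the resulting symbol names can be inspected as strings.

```c
#include <assert.h>
#include <string.h>

/* Two-level helpers so arguments are fully expanded before # and ##. */
#define SKETCH_STR2(x)    #x
#define SKETCH_STR(x)     SKETCH_STR2(x)
#define SKETCH_CAT2(a, b) a##b
#define SKETCH_CAT(a, b)  SKETCH_CAT2(a, b)

/* Rule 1 (uniform): append 64_ to the finished symbol name, whatever it is.
   dgemm_ becomes dgemm_64_ and cblas_dgemm becomes cblas_dgemm64_. */
#define SKETCH_UNIFORM64(sym) SKETCH_CAT(sym, 64_)

/* Rule 2 (split): Fortran-style names put _64 before the trailing
   underscore, C-style names get a plain _64.
   dgemm becomes dgemm_64_ and cblas_dgemm becomes cblas_dgemm_64. */
#define SKETCH_F77_64(base)  SKETCH_CAT(base, _64_)
#define SKETCH_CBLAS_64(sym) SKETCH_CAT(sym, _64)

/* Small helper for comparing generated names. */
static int sketch_same_name(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Both rules agree on the Fortran-style symbol (`dgemm_64_`) and differ only for CBLAS (`cblas_dgemm64_` versus `cblas_dgemm_64`), which is why the choice matters little once callers go through a macro.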

@devinamatthews
Member

I'm not seeing much consensus here. If someone wants to "create a standard" then we can try to implement it.
