Breaking compatibility with fortran-only BLAS interfaces #137
Are there any cblas symbols in that package? I thought most distribution blas packages were actually cblas these days. The difficulty is the calling convention return type from Fortran. Can we solve JuliaLang/julia#5283 in a portable way with only Fortran interfaces? Should we add |
There are no |
I don't think this is something we need to consider a problem, because reference BLAS is so slow. How did you encounter the problem, @staticfloat? (I also think we see many issues with CentOS.) |
Looks like this actually works okay on my RHEL5 system:
So it might just be an issue with Accelerate using the f2c calling convention? Care to try the above on Mac? |
It crashes, but it also used to be a problem with MKL when we built with gfortran. However, if everything is built with Intel compilers, it shouldn't be a problem to use the Fortran version. |
Isn't this what |
I'm not sure if it covers completely. For a long time, complex I just remembered this comment JuliaLang/julia#5283 (comment) saying that we still cannot rely on complex return on 32 bit systems. If you look here, you can see that the 32 bit is left out. |
If there is anything useful to use within vecLibFort, I would say that it would be preferable to duplicate it using Julia's native C interface capability rather than linking to my code. And I would be happy to help with that. |
The core issues are these: 1) Accelerate returns double-precision results from single-precision functions; and 2) Accelerate uses the f2c calling convention for complex values (the first argument is a pointer to the result). |
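To make the second point concrete, here is a minimal plain-Julia model of the two conventions. It is only a sketch: no BLAS library is called, and the names `dotc_kernel`, `zdotc_byvalue`, and `zdotc_f2c!` are hypothetical, chosen for illustration. The first wrapper returns the complex result by value, as a caller would expect from a gfortran-built library; the second writes it through a pointer passed as the first argument, as in the f2c convention.

```julia
# Reference kernel shared by both models (assumes positive strides).
function dotc_kernel(n, x, incx, y, incy)
    s = zero(ComplexF64)
    for k in 0:n-1
        s += conj(x[1 + k*incx]) * y[1 + k*incy]
    end
    return s
end

# Model of the return-by-value convention: the complex result is the
# function's return value.
zdotc_byvalue(n, x, incx, y, incy) = dotc_kernel(n, x, incx, y, incy)

# Model of the f2c convention: the result is written through a reference
# passed as the FIRST argument, and nothing is returned.
function zdotc_f2c!(ret::Ref{ComplexF64}, n, x, incx, y, incy)
    ret[] = dotc_kernel(n, x, incx, y, incy)
    return nothing
end

x = ComplexF64[1+2im, 3-1im]
y = ComplexF64[2-1im, 1+1im]

ret = Ref{ComplexF64}()
zdotc_f2c!(ret, 2, x, 1, y, 1)
```

A caller that assumes one convention while the library uses the other reads its result from the wrong place, which is consistent with the crashes reported in this thread.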
The 32-bit issue should be fixed fairly soon, when JuliaLang/julia#7906 merges. |
Thank you for the clarifications. Then it appears that we can avoid cblas almost for free. The only casualty is gfortran+MKL which, I think, we don't recommend anyway. |
I tried gfortran+MKL somewhat recently when Viral was cleaning things up for icc/ifort and it's pretty broken right now anyway. Probably because we're setting things up to use |
Maybe we should revisit whether it's worth calling blas for dot? |
Pretty sure we can avoid calling blas for dot. |
On my machine, a Julia
julia> x, y = complex(randn(n), randn(n)), complex(randn(n), randn(n));
julia> @benchmark BLAS.dotc($x, $y)
BenchmarkTools.Trial:
samples: 7014
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 96.00 bytes
allocs estimate: 1
minimum time: 685.01 μs (0.00% GC)
median time: 698.54 μs (0.00% GC)
mean time: 711.15 μs (0.00% GC)
maximum time: 2.35 ms (0.00% GC)
julia> @benchmark mydot($x, $y)
BenchmarkTools.Trial:
samples: 5424
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 0.00 bytes
allocs estimate: 0
minimum time: 867.69 μs (0.00% GC)
median time: 884.09 μs (0.00% GC)
mean time: 919.84 μs (0.00% GC)
maximum time: 2.27 ms (0.00% GC)
It's not much, and hopefully we can improve the Julia version a bit, so I think it's fine to switch. |
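The definition of `mydot` is not shown in the thread. As a hedged sketch of what such a native function might look like (an assumption, not the commenter's actual code), a straightforward conjugated dot product over complex vectors could be written as follows, using current Julia syntax:

```julia
# Hypothetical native replacement for BLAS.dotc: sum of conj(x[i]) * y[i].
function mydot(x::AbstractVector{T}, y::AbstractVector{T}) where {T<:Complex}
    length(x) == length(y) ||
        throw(DimensionMismatch("vectors must have the same length"))
    s = zero(T)
    # @simd permits reordering of the accumulation; whether the compiler
    # actually vectorizes this loop is not guaranteed.
    @inbounds @simd for i in eachindex(x, y)
        s += conj(x[i]) * y[i]
    end
    return s
end
```

Like `BLAS.dotc`, this conjugates the first argument. Because it never crosses a foreign-function boundary, the complex return convention of the underlying BLAS does not matter.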
Is the native Julia dot implementation multithreaded? |
No, but neither is OpenBLAS's, IIRC. |
Ah, fair enough. You might want to ask the Numba developers, though. I seem to remember they found circumstances where BLAS dot had some advantages, at least for certain vector sizes. I can't recall why. |
I don't think the complex Julia version vectorizes as well as it should. |
I looked for a manually vectorized version in OpenBLAS, but couldn't find any. I assume that it just uses a plain Fortran loop, and that that Fortran code is then well vectorized. Is that a difference between GCC and LLVM? Or is there something in our complex multiplication or complex conjugation definitions that prevents vectorization? Does it vectorize if we use |
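One way to probe the question about the complex multiplication and conjugation definitions is to bypass them and accumulate the real and imaginary parts in separate scalars. The sketch below does that; `mydot_split` is a hypothetical name, and whether this form vectorizes better than the direct one is an open question that would need checking, for example with `@code_llvm`.

```julia
# Hypothetical variant that avoids Complex arithmetic inside the loop.
# For x = a + bi and y = c + di: conj(x) * y = (a*c + b*d) + (a*d - b*c)i.
function mydot_split(x::Vector{Complex{T}}, y::Vector{Complex{T}}) where {T<:AbstractFloat}
    length(x) == length(y) ||
        throw(DimensionMismatch("vectors must have the same length"))
    sr = zero(T)  # running real part
    si = zero(T)  # running imaginary part
    @inbounds @simd for i in eachindex(x, y)
        a, b = real(x[i]), imag(x[i])
        c, d = real(y[i]), imag(y[i])
        sr += a * c + b * d
        si += a * d - b * c
    end
    return Complex(sr, si)
end
```

Keeping two real accumulators gives the compiler two independent floating-point reductions, which is the pattern `@simd` is designed for.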
Unlikely to receive attention prior to 0.6. Best! |
Referring to the original post, I think we are going to keep what we have, and it hasn't been an issue for a while. Closing; please reopen if necessary. |
We seem to have broken compatibility with fortran-interface-only BLAS libraries. Specifically, we use cblas_zdotc_sub and cblas_cdotc_sub, which don't exist in some BLAS implementations such as CentOS 7's default blas package. Is this something we want to work around, revert, or just live with?