segfault of matrix dot product when use openblas on windows 10 #1110
Comments
Not sure. A backtrace might help. Does it work with ndarray 0.14.x? We have made a few significant changes (see the readme for versions). |
Hi, sorry for the delay. I used a debugger to look into the error, and I found that it occurs in the function exec_blas in blas_server_win32.c, line 475, as shown below:
#if defined(SMP_SERVER) && defined(OS_CYGWIN_NT)
#ifndef ALL_THREADED
if ((num <= 0) || (queue == NULL)) return 0;
if ((num > 1) && queue -> next) exec_blas_async(1, queue -> next);
routine = queue -> routine;
if (queue -> mode & BLAS_LEGACY) {
if ((num > 1) && queue -> next) exec_blas_async_wait(num - 1, queue -> next);
return 0;
The error message is:
Because the error happened in the function exec_blas_async, I tried running the calculation on a single thread by setting the environment variable OPENBLAS_NUM_THREADS=1, but the error still occurs. I am wondering whether this is a bug in openblas. I would be grateful for any help. |
Hi, the error is certainly being raised inside openblas, but it could be that we are passing it wrong information for some reason, or that some other configuration is wrong. Does it work with other blas backends? What about ndarray 0.14? |
Hi, I did the test on another win10 computer with ndarray=0.14.0, but this error still occurred. |
Thanks for testing. The way we link to blas-src changed between ndarray 0.14 and ndarray 0.15, so we can probably rule out that change. Is this issue fixed? blas-lapack-rs/openblas-src#80 I would look into two ideas, but don't really know:
|
Okay so I was able to reproduce this. I'm fairly certain this is not an ndarray issue: I made the following minimal repro, which (AFAIK correctly) calls directly into cblas_dgemm and also crashes:
extern crate blas_src; // links the BLAS backend selected through blas-src
const SIZE: usize = 65;
use cblas_sys as blas_sys;
use cblas_sys::{CblasNoTrans, CBLAS_LAYOUT};
fn main() {
    // Two SIZE x SIZE zero matrices and a zeroed output buffer.
    let aa: Vec<f64> = vec![0.; SIZE * SIZE];
    let ab: Vec<f64> = vec![0.; SIZE * SIZE];
    let mut res = vec![0.; SIZE * SIZE];
    let m = SIZE;
    let n = SIZE;
    let k = SIZE;
    // res = 1.0 * aa * ab + 0.0 * res, row-major, no transposes.
    unsafe {
        blas_sys::cblas_dgemm(
            CBLAS_LAYOUT::CblasRowMajor,
            CblasNoTrans,
            CblasNoTrans,
            m as i32,
            n as i32,
            k as i32,
            1.0,
            aa.as_ptr(),
            SIZE as i32,
            ab.as_ptr(),
            SIZE as i32,
            0.0,
            res.as_mut_ptr(),
            SIZE as i32,
        )
    };
    println!("{}", res[2 * SIZE + 3]);
}
Setting
EDIT: Further progress: if I link to the official binaries https://github.com/xianyi/OpenBLAS/releases it works fine, so it is almost certainly a bug in vcpkg or openblas-src/cargo-vcpkg. Next up I'll be building a C project against both vcpkg and the official binaries to check where the actual issue is, but I am going to sleep for now :) |
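For anyone who wants to cross-check the output of a BLAS build that does not crash, the repro above can be compared against a plain Rust product. This is only a sketch: `naive_dgemm` is a hypothetical helper written for this illustration (it is not part of cblas-sys or ndarray), it assumes row-major storage with no transposes and tightly packed rows, and it makes no attempt to be fast.

```rust
/// Naive row-major reference for C = alpha * A * B + beta * C.
/// Hypothetical helper for cross-checking only; not part of cblas-sys.
fn naive_dgemm(
    m: usize,
    n: usize,
    k: usize,
    alpha: f64,
    a: &[f64],
    b: &[f64],
    beta: f64,
    c: &mut [f64],
) {
    // Assumes tightly packed rows, i.e. lda = k, ldb = n, ldc = n.
    assert_eq!(a.len(), m * k, "A must be m x k");
    assert_eq!(b.len(), k * n, "B must be k x n");
    assert_eq!(c.len(), m * n, "C must be m x n");
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            let idx = i * n + j;
            // When beta is zero, the old contents of C are ignored.
            c[idx] = if beta == 0.0 {
                alpha * acc
            } else {
                alpha * acc + beta * c[idx]
            };
        }
    }
}

const SIZE: usize = 65;

fn main() {
    // Same inputs as the repro: all-zero SIZE x SIZE matrices.
    let aa = vec![0.0_f64; SIZE * SIZE];
    let ab = vec![0.0_f64; SIZE * SIZE];
    let mut res = vec![0.0_f64; SIZE * SIZE];
    naive_dgemm(SIZE, SIZE, SIZE, 1.0, &aa, &ab, 0.0, &mut res);
    println!("{}", res[2 * SIZE + 3]);
}
```

With the all-zero inputs from the repro every entry of the result is zero, so filling the inputs with non-trivial values gives a more meaningful comparison against a working BLAS backend.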
Great. So the question is what can we do, where to send this :) i guess workarounds can be recommended, avoid this configuration? Build some blas src? |
It turns out that openblas compiled with vcpkg is built against ilp64, so since cblas-sys uses lp64, there are issues. I think the
Edit: @zhongyi51 as a temporary workaround, you can add the following hack to your
[target.'cfg(target_os = "windows")'.patch.crates-io]
cblas-sys = { git = "https://github.com/steabert/cblas-sys.git", features = ["ilp64"] }
Of course, I make no promises about this as a solution :) |
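To illustrate why the lp64/ilp64 mismatch described above can go wrong, here is a simplified sketch that links no BLAS at all. It assumes that an lp64 caller fills in only the low 32 bits of an 8-byte integer argument slot, while an ilp64 callee reads all 64 bits. The function `seen_by_ilp64` and the `stale_upper` parameter are hypothetical names for this illustration; what actually ends up in the upper half depends on the ABI and the compiler and is not specified here.

```rust
/// Value an ilp64 callee might read when an lp64 caller wrote only the
/// low 32 bits of an 8-byte slot. `stale_upper` stands in for whatever
/// bits were left in the upper half (an assumption for this sketch; the
/// real contents are unspecified).
fn seen_by_ilp64(written: i32, stale_upper: u32) -> i64 {
    let low = written as u32 as u64; // the 32 bits the caller really set
    let high = (stale_upper as u64) << 32; // leftover bits the callee also reads
    (high | low) as i64
}

fn main() {
    let m: i32 = 65;
    // If the upper half happens to be zero, the callee sees the intended size.
    println!("{}", seen_by_ilp64(m, 0));
    // If even one stale bit is set, the callee sees a huge dimension and may
    // read or write far outside the buffers it was given.
    println!("{}", seen_by_ilp64(m, 1));
}
```

This is not a claim about where exactly the reported crash originates; it only shows why calls across the two integer widths cannot be relied on, even if they appear to work for some inputs.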
Nice find! I think we should close this - this is one of the challenges when setting up blas. Compiling blas from blas-src will give a more predictable result; using system packages as a shortcut is useful but has its challenges. Good luck. As noted, cblas-sys only supports lp64, so that's what we support. |
Unfortunately this is incredibly difficult due to a lack of production Fortran compilers on Windows that are not ifort :/ I had I do agree this is not an issue for ndarray, and I will follow up on openblas-src. |
Hello... I am trying to statically link openblas to my rust project with ndarray. However, I found that when the matrix size is greater than 64, a segfault occurs.
The source code is:
When I reduce the matrix size to below 64, everything works fine; however, this error is thrown when the matrix size is greater than 64:
The error message is:
error: process didn't exit successfully: target\release\classgameground.exe (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)
I would be grateful for any help!
=============================================================
os: windows 10 professional 19042.1288
vcpkg version: 2021-11-02-af04ebf6274fd6f7a941bff4662b3955c64f6f42 (newest from github)
openblas-src version: 0.10 (vcpkg: openblas_x64-windows-static-md)
dependencies of project:
[dependencies]
rand = "0.8.4"
ndarray = { version = "0.15.3", features = ["blas"] }
blas-src = { version = "0.8", features = ["openblas"] }
openblas-src = { version = "0.10", features = ["cblas","lapacke","system","static"] }