Always use tinyBLAS with AMD GPUs on Windows
When llamafile uses hipBLAS with ROCm SDK 5.7.1 on Windows 10, the process crashes shortly after tokens start being printed. This is possibly the worst heisenbug I've seen in my career. It appears to crash in AMD code, on a separate thread, inside hipGraphicsUnregisterResource, while a vmovdqu instruction is executing. At the same time, cosmo's main thread is usually doing something with std::string and std::locale, which appears unrelated. It could be related to C++ exceptions and thread-local storage.

Using --tinyblas appears to make the crash go away, but I can't say for certain that hipBLAS is the cause: the bug might simply stop manifesting because the binary footprint or the stack and heap memory layout changed. Let's keep our fingers crossed that tinyBLAS will save us from this issue. Note also that no one else has reported the bug, even though it has been affecting me for months.