CI: add Windows CLBlast and OpenBLAS builds #1277
Conversation
For CLBlast it downloads the OpenCL SDK release and the CLBlast release. There is some minor fixup required to the .cmake files.

For OpenBLAS, it downloads the release and uses that; one file needs to be renamed because I think there is a mistake in `LLAMA_OPENBLAS`, but I found a workaround.

All these libraries are downloaded and extracted into `RUNNER_TEMP`. I don't know if that's the best thing, I'm not an expert on Actions; I think maybe we could use a cache here. Both builds also add the .dll file that is necessary to run into the artifact.

I have a working MinGW cross compile from Debian figured out for CLBlast that will give a fully static .exe, but it was easier to adapt the Windows job and it doesn't require compiling the OpenCL SDK and CLBlast.

The CL version can't be tested :( but the OpenBLAS one does pass tests.

Similar to PR #1271 for cuBLAS.
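Roughly, the download-and-extract steps follow the pattern sketched below. This is only a sketch, not the exact steps in this PR; the release URL, version number and folder names are placeholders.

```yaml
# Sketch of the download/extract pattern on a windows runner (pwsh is the
# default shell there); URL, version and folder names are placeholders.
- name: Download CLBlast
  run: |
    curl.exe -L -o "$env:RUNNER_TEMP\clblast.zip" "https://github.com/CNugteren/CLBlast/releases/download/1.5.3/CLBlast-1.5.3-windows-x64.zip"
    7z x "$env:RUNNER_TEMP\clblast.zip" "-o$env:RUNNER_TEMP\clblast"
```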
.github/workflows/build.yml (Outdated)

```diff
@@ -187,7 +231,7 @@ jobs:
       - name: Test
         id: cmake_test
-        if: ${{ matrix.build != 'avx512' || env.HAS_AVX512F == '1' }} # Test AVX-512 only when possible
+        if: ${{ ( matrix.build != 'avx512' || env.HAS_AVX512F == '1' ) && matrix.build != 'opencl' }} # Test AVX-512 only when possible
```
Since we are making a longer and longer conditional here, should we instead set a test variable in the matrix, e.g. `matrix.test == 1`?
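Something along these lines, for example (a hypothetical sketch; the key names and defines below are made up for illustration, not taken from the workflow):

```yaml
# Hypothetical matrix-driven test flag instead of a growing if-expression.
strategy:
  matrix:
    include:
      - build: 'avx2'
        defines: '-DLLAMA_AVX2=ON'
        test: true
      - build: 'opencl'
        defines: '-DLLAMA_CLBLAST=ON'
        test: false          # no OpenCL device on the hosted runner

steps:
  - name: Test
    id: cmake_test
    if: ${{ matrix.test }}   # replaces the per-backend conditions
    run: ctest -C Release --output-on-failure
```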
I had the same idea.
I did it. But then I got to thinking that the library (OpenCL, OpenBLAS, CUDA) is separate from the CPU stuff (AVX...), but that would give us so many different builds that it's just stupid. Maybe not all of them need to be released, though.
I split the cuBLAS build out in my PR; it was just too much hassle adding `if build == cublas` everywhere.
I'll just revert the testing flag then; it would be simpler.
Yes 😅, but I was already working on it yesterday and saw your PR today; I took some tips from it as well.
The .cmake files of CLBlast. It's done in the PowerShell part of the "Download CLBlast" step. It wouldn't be needed if we compiled CLBlast ourselves; then it would have the correct paths. My first version was like that, because I wanted to get a static lib, which their release .zip doesn't have, but then I had trouble making MSVC link it to llama.cpp, so I gave up.
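For reference, the fixup amounts to a path rewrite roughly like this (a sketch only; the config file name and the hard-coded path being replaced are placeholders, not the actual contents of the CLBlast package):

```yaml
# Sketch: patch the paths inside CLBlast's exported cmake config after extraction.
# The file name and the replaced path are placeholders.
- name: Fix up CLBlast cmake config
  run: |
    $cfg = "$env:RUNNER_TEMP\clblast\lib\cmake\CLBlast\CLBlastConfig.cmake"
    (Get-Content $cfg -Raw) -replace 'C:/hardcoded/build/path', "$env:RUNNER_TEMP/clblast" | Set-Content $cfg
```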
Did you try vcpkg? If CLBlast is not too large, that might work too.
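For anyone wanting to try that route, it would be roughly the following (a sketch, assuming the vcpkg preinstalled on the hosted runner and the port names in the vcpkg registry; untested here):

```yaml
# Sketch of the vcpkg suggestion, not something this PR does.
- name: Install CLBlast and OpenBLAS via vcpkg
  run: vcpkg install clblast openblas --triplet x64-windows
```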
Similar to what I tested yesterday 😆
I think
I can think of one reason: to not create a lot of releases when you're just trying to fix some small CI thing. It can also be run manually, if needed.
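Manual runs only need the standard `workflow_dispatch` trigger, e.g.:

```yaml
on:
  workflow_dispatch:   # lets the workflow be started manually from the Actions tab
  push:
    branches: [master] # the usual automatic trigger stays alongside it (branch filter here is just an example)
```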
But a pending PR would not create a release anyway? A commit to master does, but that's another story: #1055
@sw, you're right. It would be good to see the CI results here too. FWIW, here is a recent run on my branch: https://github.com/SlyEcho/llama.cpp/actions/runs/4862066816/jobs/8667931312
Seems to be broken also, looking for a file. For OpenBLAS, it seems to build a very small library for the native CPU. Maybe I don't understand how vcpkg works well enough.
@Green-Sky, I tried to make a release: https://github.com/SlyEcho/llama.cpp/actions/runs/4883230717

But I think I don't have permissions to add releases to this repo? How would it work? And where did the artifacts go?
@SlyEcho it needs to be in this repo, not your fork. So as a workaround I pushed to a branch.
Can I create branches here? I think I found the problem: the Actions settings had a read-only mode enabled. It now created a release (on my fork): https://github.com/SlyEcho/llama.cpp/releases/tag/cistuff-42b1757
Oh, I mistook your "Contributor" badge for a "Collaborator" badge.
Nice.
Works On My Machine™

I don't know if it is significant that these new builds are also AVX2; maybe they shouldn't be, for maximum compatibility?

`llama-cistuff-42b1757-bin-win-opencl-x64\main.exe`

`llama-cistuff-42b1757-bin-win-openblas-x64\main.exe`

AMD integrated graphics on the Ryzen 7 PRO 5850U are still so slow that OpenBLAS beats it by quite a significant margin. (It is way slower in Linux, actually.) The Steam Deck is over 2 times faster.
AVX2 has been available for 10 years now; any CPU fast enough to use llama.cpp most likely supports AVX2. Removing AVX2 from these builds would force most users to choose between faster prompt speed and faster generation speed.
Need to add license files as well; both OpenBLAS and CLBlast demand it.
I also tried without AVX2 and it was embarrassingly slow. I think running either OpenBLAS or CLBlast on computers that old is unlikely to give any meaningful performance gain anyway. AVX-512, on the other hand, for the people with more powerful systems?
You mean, very recent hardware? 😄
FWIW: this might change in the future, but currently there are no meaningful AVX-512-accelerated parts in
It is currently working; however, I feel we should build OpenBLAS and CLBlast ourselves. For OpenBLAS we could trim down the library quite a bit, because it currently builds in support for older CPUs that llama.cpp (built with AVX2) doesn't target, and also LAPACK, complex numbers, etc.
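A trimmed build could look roughly like the step below. This is a sketch based on OpenBLAS's documented Makefile options, not what this PR does; the target and the exact flag set would need testing.

```yaml
# Sketch: stripped-down OpenBLAS build (no LAPACK, no complex/double kernels,
# one fixed target instead of DYNAMIC_ARCH). Flags and target are assumptions.
# A MinGW cross build would additionally set CC=x86_64-w64-mingw32-gcc and HOSTCC=gcc.
- name: Build trimmed OpenBLAS
  run: |
    make -j -C OpenBLAS \
      TARGET=HASWELL DYNAMIC_ARCH=0 NO_SHARED=1 \
      NO_LAPACK=1 NO_LAPACKE=1 ONLY_CBLAS=1 \
      BUILD_SINGLE=1 BUILD_DOUBLE=0 BUILD_COMPLEX=0 BUILD_COMPLEX16=0
```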
Is there any reason to distribute Windows binaries without at least OpenBLAS? We could just modify the avx/avx2/avx512 builds to include OpenBLAS.
I had the same idea; the only reason is the size of the library (over 50 MB). But a custom static lib would be better for users as well.
The zips are only 12 MB though; I don't think it's much of an issue.
I found out that OpenBLAS supports only a generic C build under MSVC, since the inline assembly is not compatible (this explains the vcpkg error). Now I'm trying a complete Linux MinGW cross-compile of both OpenBLAS and llama.cpp. It takes about 5 minutes to build the library when it's stripped down, and we can optimize for (roughly) the same CPU type as llama.cpp. I want to figure out how to use GitHub caches to speed it up, because the libraries would just have the same fixed version most of the time. But it doesn't have to be in this PR. We'll see.
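Caching the prebuilt libraries would just be the standard actions/cache pattern, keyed on the library versions. A sketch (the paths and the key, including the version numbers, are illustrative):

```yaml
# Sketch: reuse the extracted libraries across runs; bump the key when versions change.
- name: Cache OpenBLAS and CLBlast
  uses: actions/cache@v3
  with:
    path: |
      ${{ runner.temp }}/openblas
      ${{ runner.temp }}/clblast
    key: windows-blas-openblas-0.3.23-clblast-1.5.3
```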
OK, some MinGW builds are up: https://github.com/SlyEcho/llama.cpp/releases/tag/allblas-5472ad2
@SlyEcho please merge in master, I merged the cuBLAS one :)
@Green-Sky all green: https://github.com/SlyEcho/llama.cpp/actions/runs/4902013297

What about OpenBLAS, should I add it to all of them?
Nice. What do you mean by all of them?
Currently it is only in the standard AVX2 build, but it's not impossible to add it to AVX and AVX-512 as well.
Hm, let's not, for now. That sounds a bit like a combinatorial explosion; remember we still have Linux and Mac to do.
Alright then, https://github.com/SlyEcho/llama.cpp/actions/runs/4902427214 is the latest CI job; the CUDA builds take ages. I think I'm happy to merge the branch as it is right now. (Update: release: https://github.com/SlyEcho/llama.cpp/releases)
It would be one less than what I have now: there would not be a separate OpenBLAS build, only AVX, AVX2, AVX-512 and CLBlast, and OpenBLAS would be included by default. But I will continue experimenting with MinGW, right now with the Windows MSYS2 UCRT64 runtime; it seems a little better than the ancient stuff Ubuntu 22.04 has. Building OpenBLAS and CLBlast will actually take much less time than the CUDA install, but they could be cached. That will maybe be another PR.
Right. Changed. It's running now: https://github.com/SlyEcho/llama.cpp/actions/runs/4906459042
We might need to add AVX2 to all the default builds at some point.