April 2024 Binary Update #662
…aSharp/actions/runs/8654672719/job/23733195669) for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.
- Added all new functions.
- Moved some functions (e.g. `SafeLlamaModelHandle` specific functions) into `SafeLlamaModelHandle.cs`.
- Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future they can be added here.
- Changed all token properties to return nullable tokens, to handle some models not having some tokens.
- Fixed `DefaultSamplingPipeline` to handle no newline token in some models.
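To illustrate why nullable token properties matter for a sampling step, here is a minimal, self-contained sketch. It is not LLamaSharp's actual code: `SpecialTokens`, `SamplingSketch` and `Penalize` are hypothetical names invented for this example, and the flat penalty stands in for whatever penalties a real pipeline applies. The point is only that the "save and restore the newline logit" step has to be skipped when the model has no newline token.

```csharp
using System;

// Hypothetical stand-in for a model's special tokens (not a LLamaSharp type).
// A null value means the model does not define that token.
public sealed record SpecialTokens(int? Newline, int? EOS);

public static class SamplingSketch
{
    // Subtracts a flat penalty from every logit. When penalizeNewline is false,
    // the newline logit is left unchanged, provided the model has a newline token.
    public static void Penalize(float[] logits, SpecialTokens tokens, float penalty, bool penalizeNewline)
    {
        // Remember the newline logit only if a newline token actually exists.
        int? newlineIndex = null;
        float savedLogit = 0f;
        if (!penalizeNewline && tokens.Newline is int nl && nl >= 0 && nl < logits.Length)
        {
            newlineIndex = nl;
            savedLogit = logits[nl];
        }

        for (var i = 0; i < logits.Length; i++)
            logits[i] -= penalty;

        // Restore the saved value; with no newline token there is nothing to restore.
        if (newlineIndex is int idx)
            logits[idx] = savedLogit;
    }
}
```

With a non-nullable token property, a model without a newline token would either need a sentinel value or would index the logits with a meaningless id; the nullable form makes the "no such token" case explicit at the call site.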
- Context specific things have been moved into `SafeLLamaContextHandle.cs` and made private; they're exposed through C# properties and methods already.
- Added a check that the GPU layer count is zero if GPU offload is not supported.
- Moved methods for creating default structs (`llama_model_quantize_default_params` and `llama_context_default_params`) into the relevant structs.
…e in `SafeLLamaContextHandle`
- Added high level wrapper methods (save/load with `State` object or memory mapped file) in `LLamaContext`
- Moved native methods for per-sequence state load/save into `SafeLLamaContextHandle`
It is not clear which llama.cpp version this is. Please update the llama.cpp submodule.
@zsogitbe I've updated the submodule to …
Not sure what's wrong with OSX CI, I think it's just the normal OSX flakiness.
@martindevans, basic test works on MacOS.
Thank you Martin! It is a bit confusing because it is impossible to find a version called `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7` on llama.cpp. Without your link I cannot find it. I'm not sure what is happening here, but I trust that it is a recent version with the latest important llama.cpp updates.
I think it would be better to use an official release from here: https://github.com/ggerganov/llama.cpp/releases.
That's not specific GitHub behaviour, it's just the commit ID you'd use if you wanted to check out llama.cpp at the right version (i.e. …). Normally almost every commit in llama.cpp is associated with a "release", since the entire process is automated. We got unlucky with this one: their CI failed (it looks like they have issues with flaky MacOS CI too), so the final release step got cancelled. Next time I'll have a look at the releases as well as the commits. If there isn't a release for the latest commit at the time I start the work, I can always pick a slightly older commit that lines up with a valid release (normally I just take whatever is the very latest commit).
Unit tests for CPU AVX2 and CUDA 12 both passed on my Windows 10 x64 system.
Windows CUDA11 works fine for me.
Updated binaries, using this build for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.
- Added all new functions.
- Moved some functions (e.g. `SafeLlamaModelHandle` specific functions) into `SafeLlamaModelHandle.cs`.
- Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future they can be added here.
- Changed all token properties to return nullable tokens, to handle some models not having some tokens.
- Fixed `DefaultSamplingPipeline` to handle no newline token in some models.
- … (`all-MiniLM-L12-v2.Q8_0`). This model is tiny (<100MB) so it should speed up tests slightly.

Testing: