April 2024 Binary Update #662

Merged: martindevans merged 7 commits into SciSharp:master from april_2024_binary_update on Apr 16, 2024


@martindevans (Member) commented Apr 12, 2024

Updated binaries, using this build for llama.cpp commit f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7.

  • Added all new functions.
  • Moved some functions (e.g. SafeLlamaModelHandle specific functions) into SafeLlamaModelHandle.cs.
  • Exposed tokens on SafeLlamaModelHandle and LLamaWeights through a Tokens property. As new special tokens are added in the future they can be added here.
  • Changed all token properties to return nullable tokens, to handle some models not having some tokens.
  • Fixed DefaultSamplingPipeline to handle no newline token in some models.
  • Switched the embeddings tests to use an embedding model (all-MiniLM-L12-v2.Q8_0). This model is tiny (under 100 MB), so it should speed up the tests slightly.
  • Added higher level methods for saving/loading sequence state.

Testing:

  • Windows (CPU)
  • Windows (CUDA 11)
  • Windows (CUDA 12)
  • Windows (OpenCL)
  • Linux (CPU)
  • Linux (CUDA 11)
  • Linux (CUDA 12)
  • Linux (OpenCL)
  • macOS

…aSharp/actions/runs/8654672719/job/23733195669) for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.

 - Added all new functions.
 - Moved some functions (e.g. `SafeLlamaModelHandle` specific functions) into `SafeLlamaModelHandle.cs`
 - Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future they can be added here.
 - Changed all token properties to return nullable tokens, to handle some models not having some tokens.
 - Fixed `DefaultSamplingPipeline` to handle no newline token in some models.
 - Context-specific things have been moved into `SafeLLamaContextHandle.cs` and made private; they're already exposed through C# properties and methods.
 - Checking that GPU layer count is zero if GPU offload is not supported.
 - Moved methods for creating default structs (`llama_model_quantize_default_params` and `llama_context_default_params`) into relevant structs.
…e in `SafeLLamaContextHandle`

 - Added high level wrapper methods (save/load with `State` object or memory mapped file) in `LLamaContext`
 - Moved native methods for per-sequence state load/save into `SafeLLamaContextHandle`
@martindevans force-pushed the april_2024_binary_update branch from b24581d to 869f389 on April 13, 2024 01:44
@zsogitbe (Contributor)

It is not clear which llama.cpp version this is. Please update the llama.cpp submodule.

@martindevans (Member, Author)

@zsogitbe I've updated the submodule to f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7, which is the version this is based on.
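A submodule is pinned to a commit ID rather than to a release name, so the commit ID is what identifies the llama.cpp version. As a sketch, assuming the submodule lives at `llama.cpp` in the repository root (both that path and the helper name `pinned_commit` are hypothetical), the pinned commit can be read straight from the superproject's tree, where a submodule is stored as a "gitlink" entry:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the commit a submodule is pinned to in HEAD.
# Usage: pinned_commit <superproject-dir> <submodule-path>
pinned_commit() {
  local super="$1" path="$2"
  # A submodule entry is a gitlink, so ls-tree prints
  # "160000 commit <sha>\t<path>"; the third field is the pinned commit.
  git -C "$super" ls-tree HEAD "$path" | awk '$2 == "commit" { print $3 }'
}
```

Running something like `pinned_commit . llama.cpp` from the repository root would print the full commit ID, which can then be passed to `git checkout` inside a llama.cpp clone.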

@martindevans (Member, Author)

Not sure what's wrong with the OSX CI; I think it's just the usual OSX flakiness.

@SignalRT (Collaborator)

@martindevans, the basic test works on macOS.

@zsogitbe (Contributor)

Thank you Martin! It is a bit confusing, because I cannot find a version called f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7 anywhere on llama.cpp; without your link I would not have found it. I'm not sure what is happening here, but I trust that it is a recent version that includes the latest important llama.cpp updates.

@martindevans (Member, Author)

f7001ccc... is the commit ID, so it's this.

@zsogitbe (Contributor)

I think it would be better to use an official release from here: https://github.com/ggerganov/llama.cpp/releases.
Your f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7 cannot be found among these releases, although I understand that your link points to it. I do not feel comfortable with this GitHub behavior.

@martindevans (Member, Author)

That's not GitHub-specific behaviour; it's just the commit ID you'd use if you wanted to check out llama.cpp at the right version (i.e. git checkout f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7).

Normally almost every commit in llama.cpp is associated with a "release", since the entire process is automated. We got unlucky with this one: their CI failed (it looks like they have flaky macOS CI too), so the final release step was cancelled.

Next time I'll have a look at the releases as well as the commits. I can always pick a slightly older commit, to line up with a valid release, if there isn't one for the latest commit at the time I started the work (normally I just take whatever is the very latest commit).
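Every GitHub release is backed by a git tag, so whether a commit has a matching release can be checked locally. The sketch below (the helper name `release_for_commit` is hypothetical) prints the tag pointing at the exact commit if there is one, and otherwise the most recent older tag reachable from it, which is the kind of "slightly older commit" fallback described above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a tag points at <commit>; if not,
# report the most recent older tag reachable from it (or "none").
# Usage: release_for_commit <repo-dir> <commit>
release_for_commit() {
  local repo="$1" commit="$2" exact older
  # Tags pointing directly at the commit (empty output if there are none).
  exact=$(git -C "$repo" tag --points-at "$commit")
  if [ -n "$exact" ]; then
    echo "exact: $exact"
  # describe --tags --abbrev=0 prints the nearest tag reachable from the commit.
  elif older=$(git -C "$repo" describe --tags --abbrev=0 "$commit" 2>/dev/null); then
    echo "older: $older"
  else
    echo "none"
  fi
}
```

For example, after `git -C path/to/llama.cpp fetch --tags`, running `release_for_commit path/to/llama.cpp f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7` would show either the release tag for that commit or an older tag to fall back to.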

@m0nsky (Contributor) commented Apr 14, 2024

Unit tests for CPU AVX2 and CUDA 12 both passed on my Windows 10 x64 system.

@Lyrcaxis (Contributor) commented Apr 15, 2024

I didn't run the unit tests, but the inference examples work fine on Linux (CPU) and Linux (CUDA 12).

@AsakusaRinne (Collaborator)

Windows (CUDA 11) works fine for me.

@martindevans merged commit c325ac9 into SciSharp:master on Apr 16, 2024
2 checks passed
@martindevans deleted the april_2024_binary_update branch on April 16, 2024 22:19

6 participants