llama : support LiquidAI LFM2-MoE hybrid model #16464
Merged
+192 −15
Conversation
Add support for the [LiquidAI/LFM2-8B-A1B](https://huggingface.co/LiquidAI/LFM2-8B-A1B) model. For more information about the model, please read [the blog post](https://www.liquid.ai/company/news). [HF PR](huggingface/transformers#41401) · [GGUFs](https://huggingface.co/LiquidAI/LFM2-8B-A1B-GGUF)
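As a quick smoke test, a minimal loader against the public `llama.h` API can confirm that a converted GGUF is recognized. This is a sketch, not part of the PR; the model filename is a placeholder assumption for a locally downloaded quant.

```cpp
// Minimal smoke test: load an LFM2-8B-A1B GGUF via the public llama.h API
// and print its description (architecture, parameter count, etc.).
#include <cstdio>
#include "llama.h"

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    // Hypothetical local filename for one of the uploaded GGUF quants.
    llama_model * model = llama_model_load_from_file("LFM2-8B-A1B-Q4_K_M.gguf", mparams);
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    char desc[256];
    llama_model_desc(model, desc, sizeof(desc));
    printf("loaded: %s\n", desc);

    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```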
I will remove it. upd: addressed in fe3b812
CISC approved these changes on Oct 7, 2025.
GGUFs are updated! Tested that
anyshu pushed a commit to anyshu/llama.cpp that referenced this pull request on Oct 10, 2025:
* master: (113 commits)
  - webui: updated the chat service to only include max_tokens in the req… (ggml-org#16489)
  - cpu : optimize the ggml NORM operation (ggml-org#15953)
  - server : host-memory prompt caching (ggml-org#16391)
  - No markdown in cot (ggml-org#16483)
  - model-conversion : add support for SentenceTransformers (ggml-org#16387)
  - ci: add ARM64 Kleidiai build and test support (ggml-org#16462)
  - CANN: Improve ACL graph matching (ggml-org#16166)
  - kleidiai: kernel interface refactoring (ggml-org#16460)
  - [SYCL] refactor soft_max, add soft_max_back (ggml-org#16472)
  - model: EmbeddingGemma Adding Support for SentenceTransformers Dense Modules (ggml-org#16367)
  - refactor: centralize CoT parsing in backend for streaming mode (ggml-org#16394)
  - Disable CUDA host buffers on integrated GPUs (ggml-org#16308)
  - server : fix cancel pending task (ggml-org#16467)
  - metal : mark FA blocks (ggml-org#16372)
  - server : improve context checkpoint logic (ggml-org#16440)
  - ggml webgpu: profiling, CI updates, reworking of command submission (ggml-org#16452)
  - llama : support LiquidAI LFM2-MoE hybrid model (ggml-org#16464)
  - server : add `/v1/health` endpoint (ggml-org#16461)
  - webui : added download action (ggml-org#13552) (ggml-org#16282)
  - presets : fix pooling param for embedding models (ggml-org#16455)
  - ...
boshjerns added a commit to boshjerns/llama.rn that referenced this pull request on Oct 11, 2025:
Updates llama.cpp from b6638 to b6709, adding LFM2-MoE architecture support.

Changes:
- Updated third_party/llama.cpp submodule to b6709
- Synced cpp/ directory via scripts/bootstrap.sh
- Added LLM_ARCH_LFM2MOE for LiquidAI hybrid models
- Updated version.ts to build 6709

Tested with LiquidAI LFM2-1.2B models on iOS.

References:
- Release: https://github.com/ggml-org/llama.cpp/releases/tag/b6709
- PR: ggml-org/llama.cpp#16464
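For context, new architectures in llama.cpp are registered by extending the internal `llm_arch` enum and its name table in `src/llama-arch.{h,cpp}`. The sketch below illustrates that pattern only; the surrounding entries and the exact `"lfm2moe"` string are assumptions based on the upstream code layout, not copied from this PR's diff.

```cpp
// Sketch of llama.cpp's architecture-registration pattern (simplified).
// The table contents here are illustrative assumptions, not the PR's diff.
#include <map>

enum llm_arch {
    LLM_ARCH_LFM2,
    LLM_ARCH_LFM2MOE, // new: LiquidAI LFM2-MoE hybrid
    LLM_ARCH_UNKNOWN,
};

// Maps each architecture to the string stored under the GGUF
// general.architecture metadata key.
static const std::map<llm_arch, const char *> LLM_ARCH_NAMES = {
    { LLM_ARCH_LFM2,    "lfm2"    },
    { LLM_ARCH_LFM2MOE, "lfm2moe" },
};
```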
HF PR is merged.
GGUFs are uploaded and available for testing.