
Conversation

@tdakhran (Contributor) commented Oct 7, 2025

Add support for the [LiquidAI/LFM2-8B-A1B](https://huggingface.co/LiquidAI/LFM2-8B-A1B) model. For more information about the model, please read [the blog post](https://www.liquid.ai/company/news).

The [HF PR](huggingface/transformers#41401) is merged.
[GGUFs](https://huggingface.co/LiquidAI/LFM2-8B-A1B-GGUF) are uploaded and available for testing.
@tdakhran tdakhran requested a review from CISC as a code owner October 7, 2025 14:04
@github-actions github-actions bot added the python python script changes label Oct 7, 2025
@tdakhran (Contributor, Author) commented Oct 7, 2025

I will remove `defaultdict`; it makes CI unhappy.

Update: addressed in fe3b812.
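
For context, a hypothetical sketch of this kind of cleanup (not the actual diff in fe3b812; the names below are illustrative): replacing `collections.defaultdict` with a plain dict plus `setdefault`, which CI type checkers tend to handle more gracefully:

```python
# Hypothetical illustration only, not the actual change in fe3b812.
# Before (assumed): grouping tensors with a defaultdict.
#   from collections import defaultdict
#   grouped = defaultdict(list)
#   grouped[name].append(t)
# After: a plain dict with an explicit value type and setdefault.

tensors = [("blk.0.ffn_gate_exps", 0), ("blk.0.ffn_up_exps", 1)]  # placeholder data

grouped: dict[str, list[int]] = {}
for name, t in tensors:
    grouped.setdefault(name, []).append(t)

print(grouped)  # {'blk.0.ffn_gate_exps': [0], 'blk.0.ffn_up_exps': [1]}
```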

@tdakhran (Contributor, Author) commented Oct 7, 2025

Thank you for the feedback, @CISC; addressed in eb190c6.

I'm re-uploading the GGUFs.

@CISC CISC added hot Something that is hot model Model specific labels Oct 7, 2025
@CISC CISC merged commit aeaf8a3 into ggml-org:master Oct 7, 2025
72 checks passed
@tdakhran (Contributor, Author) commented Oct 7, 2025

GGUFs are updated!

Tested that `bin/llama-cli -hf LiquidAI/LFM2-8B-A1B-GGUF:Q4_0 -p "What's the capital of Great Britain?"` works.
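
For anyone who prefers smoke-testing from Python, a minimal sketch using the third-party llama-cpp-python bindings (my assumption, not part of this PR; requires `llama-cpp-python` and `huggingface_hub` to be installed):

```python
# Alternative to the llama-cli invocation above: load the same Q4_0 GGUF
# via the third-party llama-cpp-python bindings (assumed available).
from llama_cpp import Llama

# Downloads the matching GGUF from the Hugging Face repo and loads it.
llm = Llama.from_pretrained(
    repo_id="LiquidAI/LFM2-8B-A1B-GGUF",
    filename="*Q4_0.gguf",  # pick the Q4_0 quant, as in the CLI test
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What's the capital of Great Britain?"}],
    max_tokens=32,
)
print(out["choices"][0]["message"]["content"])
```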

@tdakhran tdakhran deleted the tarek/feat/lfm2_moe branch October 7, 2025 18:08
anyshu pushed a commit to anyshu/llama.cpp that referenced this pull request Oct 10, 2025
* master: (113 commits)
  webui: updated the chat service to only include max_tokens in the req… (ggml-org#16489)
  cpu : optimize the ggml NORM operation (ggml-org#15953)
  server : host-memory prompt caching (ggml-org#16391)
  No markdown in cot (ggml-org#16483)
  model-conversion : add support for SentenceTransformers (ggml-org#16387)
  ci: add ARM64 Kleidiai build and test support (ggml-org#16462)
  CANN: Improve ACL graph matching (ggml-org#16166)
  kleidiai: kernel interface refactoring (ggml-org#16460)
  [SYCL] refactor soft_max, add soft_max_back (ggml-org#16472)
  model: EmbeddingGemma Adding Support for SentenceTransformers Dense Modules (ggml-org#16367)
  refactor: centralize CoT parsing in backend for streaming mode (ggml-org#16394)
  Disable CUDA host buffers on integrated GPUs (ggml-org#16308)
  server : fix cancel pending task (ggml-org#16467)
  metal : mark FA blocks (ggml-org#16372)
  server : improve context checkpoint logic (ggml-org#16440)
  ggml webgpu: profiling, CI updates, reworking of command submission (ggml-org#16452)
  llama : support LiquidAI LFM2-MoE hybrid model (ggml-org#16464)
  server : add `/v1/health` endpoint (ggml-org#16461)
  webui : added download action (ggml-org#13552) (ggml-org#16282)
  presets : fix pooling param for embedding models (ggml-org#16455)
  ...
boshjerns added a commit to boshjerns/llama.rn that referenced this pull request Oct 11, 2025
Updates llama.cpp from b6638 to b6709, adding LFM2-MoE architecture support.

Changes:
- Updated third_party/llama.cpp submodule to b6709
- Synced cpp/ directory via scripts/bootstrap.sh
- Added LLM_ARCH_LFM2MOE for LiquidAI hybrid models
- Updated version.ts to build 6709

Tested with LiquidAI LFM2-1.2B models on iOS.

References:
- Release: https://github.com/ggml-org/llama.cpp/releases/tag/b6709
- PR: ggml-org/llama.cpp#16464
