Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Makefile to support MOE #1446

Closed
wants to merge 1 commit into from
Closed

Conversation

sfxworks
Copy link
Contributor

In reference to ggerganov/llama.cpp#4406

Need a newer version of llama.cpp to handle MoE models, such as Mixtral 8x7b

Description

This PR fixes #1421

Notes for Reviewers

Signed commits

  • Yes, I signed my commits.

In reference to ggerganov/llama.cpp#4406

Need a newer version of llama.cpp to handle MoE models, such as Mixtral 8x7b

Signed-off-by: Samuel Walker <sfxworks@gmail.com>
Copy link

netlify bot commented Dec 15, 2023

Deploy Preview for localai ready!

Name Link
🔨 Latest commit 76abeee
🔍 Latest deploy log https://app.netlify.com/sites/localai/deploys/657c67b46167290008f7df6f
😎 Deploy Preview https://deploy-preview-1446--localai.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@sfxworks sfxworks marked this pull request as draft December 15, 2023 14:50
@sfxworks
Copy link
Contributor Author

Currently building and testing locally to confirm it works.

Using

#backend: llama
context_size: 8192
f16: true
low_vram: false
gpu_layers: 98
mmlock: false
name: mixtral
parameters:
  model: mixtral-8x7b-v0.1.Q4_K_M.gguf
  temperature: 0.2

@sfxworks
Copy link
Contributor Author

I local-ai build info:
I BUILD_TYPE: hipblas
I GO_TAGS: 
I LD_FLAGS: -X "github.com/go-skynet/LocalAI/internal.Version=v2.0.0" -X "github.com/go-skynet/LocalAI/internal.Commit=238fec244ae6c9a66bc7fafd76c7e14671110a6f"
CGO_LDFLAGS="-L/opt/rocm/hip/lib -lamdhip64 -L/opt/rocm/lib -lOpenCL -L/usr/lib -lclblast -lrocblas -lhipblas -lrocrand -lomp -O3 --rtlib=compiler-rt -unwindlib=libgcc -lhipblas -lrocblas --hip-link -O3 --rtlib=compiler-rt -unwindlib=libgcc -lhipblas -lrocblas --hip-link" go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=v2.0.0" -X "github.com/go-skynet/LocalAI/internal.Commit=238fec244ae6c9a66bc7fafd76c7e14671110a6f"" -tags "" -o local-ai ./
10:02AM DBG GRPC(mixtral-8x7b-v0.1.Q4_K_M.gguf-127.0.0.1:42533): stderr error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found

Hmm, that doesn't appear to be where to change it.

@sfxworks
Copy link
Contributor Author

Maybe gollama needs to be updated as well?

@sfxworks
Copy link
Contributor Author

Tried with

GOLLAMA_VERSION?=77e691050c5401f03240f1960410e286fb50e8e2
CPPLLAMA_VERSION?=cafcd4f89500b8afef722cdb08088eceb8a22572

To reflect upstream update attempt go-skynet/go-llama.cpp#313 but failed

cd llama.cpp && patch -p1 < ../patches/1902-cuda.patch
patching file common/common.cpp
Hunk #1 succeeded at 1614 with fuzz 2 (offset 346 lines).
patching file common/common.h
Hunk #1 FAILED at 209.
1 out of 1 hunk FAILED -- saving rejects to file common/common.h.rej
make[1]: *** [Makefile:235: prepare] Error 1
make[1]: Leaving directory '/home/sam/LocalAI/sources/go-llama'
make: *** [Makefile:223: sources/go-llama/libbinding.a] Error 2

@sfxworks
Copy link
Contributor Author

@sfxworks
Copy link
Contributor Author

Trying against go-skynet/go-llama.cpp#315

@sfxworks
Copy link
Contributor Author

sfxworks commented Dec 15, 2023

hmm

11:07AM DBG GRPC(mixtral-8x7b-v0.1.Q4_K_M.gguf-127.0.0.1:44575): stderr error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found

@mudler
Copy link
Owner

mudler commented Dec 15, 2023

did you tried with the llama-cpp backend? It should also be the default

@sfxworks
Copy link
Contributor Author

did you tried with the llama-cpp backend? It should also be the default

I tried both with backend: llama commented and uncomment. No success. The last comment was with it uncommented.

@mudler
Copy link
Owner

mudler commented Dec 15, 2023

did you tried without acceleration too?

@sfxworks
Copy link
Contributor Author

Didn't try this model without acceleration, but I did try another model with acceleration and it worked just fine.

@mudler
Copy link
Owner

mudler commented Dec 16, 2023

maybe it is an upstream issue, the llama-cpp backend is the most close to upstream one, if that fails something might be off with llama.cpp. master just got latest hash in #1429, I'll try to give it a go later today too

@sfxworks
Copy link
Contributor Author

I'll give that a try today

@mudler
Copy link
Owner

mudler commented Dec 16, 2023

Tried today and works locally, adding a full example in #1449

@mudler
Copy link
Owner

mudler commented Dec 18, 2023

@sfxworks appreciate the effort here, but I think we can close this one as we have more up-to-date hashes in master, or is there anything pending? did you tried if mixtral works for you?

@sfxworks
Copy link
Contributor Author

Yep! All works I appreciate it!

@sfxworks sfxworks closed this Dec 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mixtral support ?
2 participants