build Docker images with AVX2 only #1334

Closed
bsilverthorn wants to merge 2 commits into from
Conversation

bsilverthorn commented May 5, 2023

Docker images produced by CI fail on (at least) my Alder Lake machine.

I believe the purpose of the Docker build is to provide an easy way to start generating tokens. Targeting AVX2-only seems like a reasonable compromise between performance and compatibility.

(This PR is pretty minimal. Maybe it would be better to have several different images targeting different microarchitectures. I still think this minimal change is an improvement, though.)
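For context, the general idea behind an AVX2-only build (a rough sketch of the concept, not this PR's actual diff): replace the default -march=native, which bakes in whatever instructions the build host happens to support, with an explicit AVX2-era feature set.

# Hedged sketch with a hypothetical source file; the flags are standard GCC/Clang options.
# -march=native tunes for the build host and may emit instructions the target CPU lacks:
gcc -O3 -march=native main.c -o main_native
# Pinning to AVX2-era features keeps the binary usable on any AVX2-capable x86-64 CPU:
gcc -O3 -mavx -mavx2 -mfma -mf16c main.c -o main_avx2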

slaren (Collaborator) commented May 5, 2023

An alternative would be building the binaries for the docker images with cmake.
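For reference, a minimal sketch of what such a cmake build might look like, assuming the LLAMA_* architecture options in llama.cpp's CMakeLists around this time (treat the exact option names and defaults as assumptions):

# Hedged sketch: build with explicit ISA options instead of native CPU detection.
cmake -B build -DLLAMA_NATIVE=OFF -DLLAMA_AVX=ON -DLLAMA_AVX2=ON -DLLAMA_FMA=ON -DLLAMA_AVX512=OFF
cmake --build build --config Release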

Zahlii commented May 19, 2023

I am also struggling with this: I'm building on a MacBook with an M2 chip and running it on a Kubernetes cluster. Even when building with --platform linux/amd64, the issue persists. Using a custom image built on top of this PR, I was able to get it to start and load the model (though I'm still facing other issues, out of scope for this ticket). As far as I understand, this wouldn't be solvable even if I used FORCE_CMAKE, since the build would still pick up the instructions available on the M2?

This is the problem I'm hitting now with this repository:

2023-05-19 07:26:22 (109 MB/s) - ‘ggml-alpaca-7b-q4.bin’ saved [4212727017/4212727017]


llama.cpp: loading model from ggml-alpaca-7b-q4.bin
llama_model_load_internal: format = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113748.20 KB
llama_model_load_internal: mem required = 5809.33 MB (+ 17592185987986.00 MB per state)

warning: failed to mlock 4212482048-byte buffer (after previously locking 0 bytes): Cannot allocate memory

Try increasing RLIMIT_MLOCK ('ulimit -l' as root).

...................................................................................................
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
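As an aside on the mlock warning in that log: under plain Docker the locked-memory limit can be raised at run time (a sketch below; the image name and model path are placeholders). Under Kubernetes the equivalent typically has to be configured at the node or container-runtime level, which is an assumption about this cluster rather than something shown here.

# Hedged sketch: lift RLIMIT_MLOCK for the container so locking the model buffer can succeed.
docker run --ulimit memlock=-1:-1 my-llama-image ./main -m ggml-alpaca-7b-q4.bin --mlock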

bsilverthorn (Author) commented
This was a quick patch and I'm guessing it won't be merged, so I'm closing the PR.
