build Docker images with AVX2 only #1334

Closed
bsilverthorn wants to merge 2 commits into from
Conversation

bsilverthorn commented May 5, 2023

Docker images produced by CI fail on (at least) my Alder Lake machine.

I believe the purpose of the Docker build is to provide an easy way to start generating tokens. Targeting AVX2-only seems like a reasonable compromise between performance and compatibility.

(This PR is pretty minimal. Maybe it would be better to have several different images targeting different microarchitectures. I still think this minimal change is an improvement, though.)
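For context, the general idea behind an AVX2-only build (a rough sketch of the concept, not this PR's actual diff): replace the default -march=native, which bakes in whatever instructions the build host happens to support, with an explicit AVX2-era feature set.

# Hedged sketch with a hypothetical source file; the flags are standard GCC/Clang options.
# -march=native tunes for the build host and may emit instructions the target CPU lacks:
gcc -O3 -march=native main.c -o main_native
# Pinning to AVX2-era features keeps the binary usable on any AVX2-capable x86-64 CPU:
gcc -O3 -mavx -mavx2 -mfma -mf16c main.c -o main_avx2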

slaren (Collaborator) commented May 5, 2023

An alternative would be building the binaries for the docker images with cmake.
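For reference, a minimal sketch of what such a cmake build might look like, assuming the LLAMA_* architecture options in llama.cpp's CMakeLists around this time (treat the exact option names and defaults as assumptions):

# Hedged sketch: build with explicit ISA options instead of native CPU detection.
cmake -B build -DLLAMA_NATIVE=OFF -DLLAMA_AVX=ON -DLLAMA_AVX2=ON -DLLAMA_FMA=ON -DLLAMA_AVX512=OFF
cmake --build build --config Release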

Zahlii commented May 19, 2023

I am also struggling with this: I'm building on a MacBook with an M2 chip and running it on a Kubernetes cluster. Even when building with --platform linux/amd64, the issue persists. Using a custom image built on top of this PR, I was able to get it to start and load the model (though I'm still facing other issues, out of scope for this ticket). As far as I understand, this wouldn't be solvable even if I used FORCE_CMAKE, since the build would still pick up the instructions available on the M2?

This is the problem I'm hitting now with this repository:

2023-05-19 07:26:22 (109 MB/s) - ‘ggml-alpaca-7b-q4.bin’ saved [4212727017/4212727017]


llama.cpp: loading model from ggml-alpaca-7b-q4.bin
llama_model_load_internal: format = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113748.20 KB
llama_model_load_internal: mem required = 5809.33 MB (+ 17592185987986.00 MB per state)

warning: failed to mlock 4212482048-byte buffer (after previously locking 0 bytes): Cannot allocate memory

Try increasing RLIMIT_MLOCK ('ulimit -l' as root).

...................................................................................................
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
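As an aside on the mlock warning in that log: under plain Docker the locked-memory limit can be raised at run time (a sketch below; the image name and model path are placeholders). Under Kubernetes the equivalent typically has to be configured at the node or container-runtime level, which is an assumption about this cluster rather than something shown here.

# Hedged sketch: lift RLIMIT_MLOCK for the container so locking the model buffer can succeed.
docker run --ulimit memlock=-1:-1 my-llama-image ./main -m ggml-alpaca-7b-q4.bin --mlock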

bsilverthorn (Author) commented
This was a quick patch and I'm guessing it won't be merged, so I'm closing the PR.
