Docker Issue "Illegal instruction" #537
I also get this. This is on Pop!_OS 22.04 with kernel 6.2.0-76060200, on a Ryzen 5 5600X (x86_64 with AVX2), with GCC 11. Some versions are fine though, as we found (we've been tracking this here too: serge-chat/serge#66).
"Illegal instruction" sounds like the binary is using an instruction which your processor does not support. I've touched on the issue in this discussion:
It's clear that as long as CPU features are determined at compile time, distributing binaries is going to cause problems like this.
@slaren That explains why the binaries are so inconsistent. We don't know what CPUs the GitHub runners are using, which makes the binaries unusable.
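Not a fix, but a quick way to see what any given host or runner actually supports, assuming Linux, is to read `/proc/cpuinfo`:

```sh
# Print the CPU model and the AVX-family feature flags it advertises
grep -m1 'model name' /proc/cpuinfo
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep '^avx'
```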
Machine spec:
@gaby Yes, they can vary, but for compilation it doesn't matter which CPU the runner has, only for tests. As you can see in the discussion, the Windows builds always build AVX512 but only test it when possible. If the Docker builder looks at its own features when compiling the binaries, then it's misconfigured. If I compile something for the Game Boy Advance on my x86 PC, it's not the features of my PC that I should choose when compiling.

I'm not too familiar with Docker, but I suppose there has to be an option which would not precompile binaries but rather keep the sources inside the container and compile them as the first step of installation. But the whole raison d'être of Docker containers is to deal with the huge mess of interconnected dependencies in the Linux world, which is hard to manage. This project doesn't contain any dependencies or libraries and can simply be built on any machine, so I don't understand the value proposition of Docker for this project at all, except the negative value of constantly having to deal with issues related to it. If you are an absolute fan of Docker and you just absolutely positively have to have it, the container could literally have a single .sh script which would do:

```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make
```

and that's it lol. The beauty of having no libraries and dependencies.

For precompiled binaries, currently the only option is to build packages for the different feature sets, like the Windows releases. In the future a better option would be to detect the features at runtime, unless it can't be done without a performance penalty (but probably it can). It has to be researched a bit though, because it would affect inlining, which cannot be done when the code paths aren't static. If inlining achieves a performance benefit then we gotta stick with the multi-builds, as speed > everything else.
With Unraid I have two ways to run it: with a VM or with Docker. With Docker I can share the resources with other processes, while with a VM I "lock" the resources. That, for me, is the added value of Docker.
@anzz1 Thanks for the insight. After several tries, it seems that compiling llama.cpp as a first step at container runtime is the solution.
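A minimal sketch of what that could look like as the container entrypoint (the script name and paths here are assumptions for illustration, not the project's actual setup):

```sh
#!/bin/sh
# entrypoint.sh (hypothetical): build llama.cpp on first start, then run it
set -e
if [ ! -x /llama.cpp/main ]; then
    git clone https://github.com/ggerganov/llama.cpp.git /llama.cpp
    make -C /llama.cpp
fi
exec /llama.cpp/main "$@"
```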
On a project with a million dependencies and libraries this might be a problem, but as there are no dependencies and it builds on anything, compilation shouldn't pose a problem nor take more than a few seconds. However, in the post above there's an ongoing discussion about adding the ability to check processor features at runtime. There is some work involved in accurately testing and analyzing whether it can be done without hurting performance, so it's currently on the backlog under more important issues. Once it's properly established that it can be done without degrading performance, actually adding it isn't hard at all. We just have to be sure not to introduce a regression while doing it.
@Netsuno Did you succeed? I have Unraid too but haven't managed to run it in Docker.
@Taillan I made my own Docker image to run it. But my Unraid server is not powerful (2x Xeon 2670 v2), so I have shelved the idea of making it work on Unraid for now (it takes 100% of my CPU for 2 minutes to generate one answer).
A case for runtime detection: in any reasonable, modern cloud deployment, llama.cpp would end up inside a container. In fact, being CPU-only, llama.cpp enables deploying your ML inference to something like AWS Lambda or GCP Cloud Run, providing very simple, huge scalability for inference. All these systems use containerization and expect you to have pre-built binaries ready to go. Compiling at container launch is not really an option, as that significantly increases cold-start/scale-up latencies (a few seconds is too long). However, the higher up the serverless stack you go, the less control you have over the CPU platform underneath. GCP, for example, has machines from the Haswell era onwards all intermingled, and they don't even document what to expect for Cloud Functions or Cloud Run. I'm not a C expert by any means, so it's not my wheelhouse to offer up a PR, but the case for this is pretty strong IMO.
Got the same error:

when I executed this command:

My environment is:

Can anyone help?
I'm afraid that you'll have to rebuild the image locally and use that instead. But that isn't very complicated.
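For example, something along these lines should produce an image matched to the host CPU (the tag is arbitrary, and the Dockerfile path assumes the repo's .devops layout):

```sh
# Build on the machine that will run it, so make picks up that CPU's features
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
docker build -t llama.cpp:local -f .devops/full.Dockerfile .
```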
Yeah, the modern cloud environment, where in many cases you have less control over and knowledge about the underlying hardware than you used to, is unfortunate, but it is reality. You definitely do not want to go all-out runtime detection in a performance-driven application like this and lose the compiler optimizations allowed by compile-time detection with simple `#if defined(...)` checks. But runtime detection could be made an opt-in compile option. Something like this:

```c
inline unsigned int ggml_cpu_has_avx512(void) {
#if defined(CPUDETECT_RUNTIME)
    // Raw machine code for: cpuid(eax=7, ecx=0), then return EBX bit 16 (AVX-512F)
    static unsigned const char a[] = {0x53,0x31,0xC9,0xB8,0x07,0x00,0x00,0x00,0x0F,0xA2,0xC1,0xEB,0x10,0x83,0xE3,0x01,0x89,0xD8,0x5B,0xC3};
    return ((unsigned int (__cdecl *)(void)) (void*)((void*)a))();
#elif defined(__AVX512F__)
    return 1;
#else
    return 0;
#endif
}
```

Then replacing any compile-time feature check with a call to a function like that would move the decision to runtime.

edit: To be clear, using bytecode in the example above isn't being obtuse for the sake of being obtuse in some misguided attempt at trying to look smart or something. Optimally you'd use proper `cpuid` intrinsics where the compiler provides them.

Here's a list of some of the (x86) processor feature checks in bytecode: cpuid.h

The rest can be found in the x86 documentation:
- Intel® Architecture Instruction Set Extensions and Future Features, "Chapter 1.5 CPUID Instruction"
- AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions, "Appendix D: Instruction Subsets and CPUID Feature Flags"
The easiest solution, imho, is to provide multiple versions of the container image. It doesn't have to cover all architectures, and it doesn't have to be every release. But a setup which covers >=85% of consumers at any given time is enough. The rest can rebuild.
Sure, the easiest solution would be to create a container image for every configuration set, and it could be easily automated with GitHub Actions. It's a solution, but not a good one, as it's not future-proof and carries the risk of getting stuck with a bad practice.
I didn't say every configuration set. Unless the runtime detection has only negligible impact on performance, I think it's better for consumers to just get an image optimized for their architecture. Obviously there is a point of diminishing returns. But even Intel provides optimized images for AVX512 [0].

[0] https://hub.docker.com/r/intel/intel-optimized-tensorflow-avx512
Sure, you could do automatic separate images for AVX / AVX2 / AVX512 like the Windows releases by just editing the action, no code change necessary. Or, as the binaries are rather small, you could pack all of them into one image and add a simple script to get "launch-time" detection, if you will. Something like this:

```sh
#!/bin/sh
# Dispatch to the binary matching the CPU the container actually landed on
cpuinfo="$(cat /proc/cpuinfo)"
if [ $(echo "$cpuinfo" | grep -c avx512) -gt 0 ]; then
    ./llama_avx512 "$@"
elif [ $(echo "$cpuinfo" | grep -c avx2) -gt 0 ]; then
    ./llama_avx2 "$@"
else
    ./llama_avx "$@"
fi
```
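Set as the image entrypoint, that script would pick the matching binary on every container start; a hypothetical run (the image tag and paths are placeholders) would then look like:

```sh
docker run -v /path/to/models:/models llama.cpp:multi \
    -m /models/7B/ggml-model-q4_0.bin -p "This is a test" -n 512
```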
If your model is already quantized, this did the trick for me, using the light image:
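Presumably something of this shape, using the published light image (the volume path is a placeholder; the light image runs main directly, so only inference arguments are passed):

```sh
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light \
    -m /models/7B/ggml-model-q4_0.bin -p "This is a test" -n 512
```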
I have a similar issue in Docker on some machines. However, I have it in gdb:

strace output:
The CUDA-supported Docker image works like a charm and is fairly quick, but then 1 out of 10 machines you deploy to crashes with this 'illegal instruction' error. The issue is also often reported across adaptations: ollama/ollama#2187. I'm not too familiar with these instructions, but is it not feasible to have one workflow that builds one Docker image that you can deploy reliably? It just works so well, and not having one Docker image is a bit of a shame.
try
This issue was closed because it has been inactive for 14 days since being marked as stale. |
I'm trying to run the Docker version on Unraid.

I run this as Post Arguments:

```
--run -m /models/7B/ggml-model-q4_0.bin -p "This is a test" -n 512
```

I got this error:

```
/app/.devops/tools.sh: line 40: 7 Illegal instruction ./main $arg2
```

Log:

I have run this without any issues:

```
--all-in-one "/models/" 7B
```
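For comparison, outside Unraid the same thing with the full image would look something like this (the volume path is a placeholder):

```sh
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full \
    --run -m /models/7B/ggml-model-q4_0.bin -p "This is a test" -n 512
```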