Docker Issue "Illegal instruction" #537
I also get this. This is on Pop!_OS 22.04 with kernel 6.2.0-76060200, on a Ryzen 5 5600X (x86_64 with AVX2), with GCC 11. Some versions are fine though, as we found (we've been tracking this here too: serge-chat/serge#66).
"Illegal instruction" sounds like the binary is using an instruction which your processor does not support. I've touched on the issue in this discussion:
It's clear that as long as CPU features are determined at compile time, distributing binaries is going to cause problems like this.
@slaren That explains why the binaries are so inconsistent. We don't know what CPUs the GitHub runners are using, which makes the binaries unusable.
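Not a fix, but a quick way to see what any given host or runner actually supports, assuming Linux, is to read `/proc/cpuinfo`:

```sh
# Print the CPU model and the AVX-family feature flags it advertises
grep -m1 'model name' /proc/cpuinfo
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep '^avx'
```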
Machine spec:
@gaby Yes, they can vary, but for compilation it doesn't matter which CPU the runner has, only for tests. As you can see in the discussion, the Windows builds always build AVX512 but only test it when possible. If the Docker builder looks at its own features when compiling the binaries, then it's misconfigured. If I compile something for the Game Boy Advance on my x86 PC, it's not the features of my PC that I should choose when compiling.

I'm not too familiar with Docker, but I suppose there has to be an option which would not precompile binaries but rather keep the sources inside the container and compile them as the first step of installation. But the whole raison d'être of Docker containers is to deal with the huge mess of interconnected dependencies in the Linux world, which is hard to manage. This project doesn't contain any dependencies or libraries and can simply be built on any machine, so I don't understand the value proposition of Docker for this project at all, except the negative value of constantly having to deal with issues related to it. If you are an absolute fan of Docker and you just absolutely positively have to have it, the container could literally have a single .sh script which would do:

```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make
```

and that's it lol. The beauty of having no libraries and dependencies.

For precompiled binaries, currently the only option is to build packages for the different feature sets, like the Windows releases. In the future a better option would be to detect the features at runtime, unless it can't be done without a performance penalty (but probably it can). It has to be researched a bit though, because it would affect inlining, which cannot be done when the code paths aren't static. If inlining achieves a performance benefit then we gotta stick with the multi-builds, as speed > everything else.
With Unraid I have two ways to run it: with a VM or with Docker. With Docker I can share the resources with other processes, while with a VM I "lock" the resources. That, for me, is the added value of Docker.
@anzz1 Thanks for the insight. After several tries, it seems that compiling llama.cpp as a first step at container runtime is the solution.
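A minimal sketch of what that could look like as the container entrypoint (the script name and paths here are assumptions for illustration, not the project's actual setup):

```sh
#!/bin/sh
# entrypoint.sh (hypothetical): build llama.cpp on first start, then run it
set -e
if [ ! -x /llama.cpp/main ]; then
    git clone https://github.com/ggerganov/llama.cpp.git /llama.cpp
    make -C /llama.cpp
fi
exec /llama.cpp/main "$@"
```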
On a project with a million dependencies and libraries this might be a problem, but as there are no dependencies and it builds on anything, compilation shouldn't pose a problem nor take more than a few seconds. However, in the post above there's an ongoing discussion about adding the ability to check processor features at runtime. There is some work involved in accurately testing and analyzing whether it can be done without hurting performance, so it's currently on the backlog under more important issues. Once it's properly established that it can be done without degrading performance, actually adding it isn't hard at all. We just have to be sure not to introduce a regression while doing it.
@Netsuno Did you succeed? I have Unraid too but haven't managed to run it in Docker.
@Taillan I made my own Docker image to run it. But my Unraid server is not powerful (2x Xeon 2670 v2), so I have shelved the idea of making it work on Unraid for now (it takes 100% of my CPU for 2 minutes to generate one answer).
A case for runtime detection: in any reasonable, modern cloud deployment, llama.cpp would end up inside a container. In fact, being CPU-only, llama.cpp enables deploying your ML inference to something like AWS Lambda or GCP Cloud Run, providing very simple, huge scalability for inference. All these systems use containerization and expect you to have pre-built binaries ready to go. Compiling at container launch is not really an option, as that significantly increases cold-start/scale-up latencies (a few seconds is too long). However, the higher up the serverless stack you go, the less control you have over the CPU platform underneath. GCP, for example, has machines from the Haswell era onwards all intermingled, and they don't even document what to expect for Cloud Functions or Cloud Run. I'm not a C expert by any means, so it's not my wheelhouse to offer up a PR, but the case for this is pretty strong IMO.
Got the same error:

when I executed this command:

My environment is:

Can anyone help?
I'm afraid that you'll have to rebuild the image locally and use that instead. But that isn't very complicated.
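For example, something along these lines should produce an image matched to the host CPU (the tag is arbitrary, and the Dockerfile path assumes the repo's .devops layout):

```sh
# Build on the machine that will run it, so make picks up that CPU's features
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
docker build -t llama.cpp:local -f .devops/full.Dockerfile .
```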
Yeah, the modern cloud environment, where in many cases you have less control over and knowledge about the underlying hardware than you used to, is unfortunate, but it is reality. You definitely do not want to go all-out runtime detection in a performance-driven application like this and lose the compiler optimizations allowed by compile-time detection with simple `#if defined(...)` checks. But runtime detection could be made an opt-in compile option. Something like this:

```c
inline unsigned int ggml_cpu_has_avx512(void) {
#if defined(CPUDETECT_RUNTIME)
    // Raw machine code for: cpuid(eax=7, ecx=0), then return EBX bit 16 (AVX-512F)
    static unsigned const char a[] = {0x53,0x31,0xC9,0xB8,0x07,0x00,0x00,0x00,0x0F,0xA2,0xC1,0xEB,0x10,0x83,0xE3,0x01,0x89,0xD8,0x5B,0xC3};
    return ((unsigned int (__cdecl *)(void)) (void*)((void*)a))();
#elif defined(__AVX512F__)
    return 1;
#else
    return 0;
#endif
}
```

Then replacing any compile-time feature check with a call to a function like that would move the decision to runtime.

edit: To be clear, using bytecode in the example above isn't being obtuse for the sake of being obtuse in some misguided attempt at trying to look smart or something. Optimally you'd use proper `cpuid` intrinsics where the compiler provides them.

Here's a list of some of the (x86) processor feature checks in bytecode: cpuid.h

The rest can be found in the x86 documentation:
- Intel® Architecture Instruction Set Extensions and Future Features, "Chapter 1.5 CPUID Instruction"
- AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions, "Appendix D: Instruction Subsets and CPUID Feature Flags"
The easiest solution, imho, is to provide multiple versions of the container image. It doesn't have to cover all architectures, and it doesn't have to be every release. But a setup which covers >=85% of consumers at any given time is enough. The rest can rebuild.
Sure, the easiest solution would be to create a container image for every configuration set, and it could be easily automated with GitHub Actions. It's a solution, but not a good one, as it's not future-proof and carries the risk of getting stuck with a bad practice.
I didn't say every configuration set. Unless the runtime detection has only negligible impact on performance, I think it's better for consumers to just get an image optimized for their architecture. Obviously there is a point of diminishing returns. But even Intel provides optimized images for AVX512 [0].

[0] https://hub.docker.com/r/intel/intel-optimized-tensorflow-avx512
Sure, you could do automatic separate images for AVX / AVX2 / AVX512 like the Windows releases by just editing the action, no code change necessary. Or, as the binaries are rather small, you could pack all of them into one image and add a simple script to get "launch-time" detection, if you will. Something like this:

```sh
#!/bin/sh
# Dispatch to the binary matching the CPU the container actually landed on
cpuinfo="$(cat /proc/cpuinfo)"
if [ $(echo "$cpuinfo" | grep -c avx512) -gt 0 ]; then
    ./llama_avx512 "$@"
elif [ $(echo "$cpuinfo" | grep -c avx2) -gt 0 ]; then
    ./llama_avx2 "$@"
else
    ./llama_avx "$@"
fi
```
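Set as the image entrypoint, that script would pick the matching binary on every container start; a hypothetical run (the image tag and paths are placeholders) would then look like:

```sh
docker run -v /path/to/models:/models llama.cpp:multi \
    -m /models/7B/ggml-model-q4_0.bin -p "This is a test" -n 512
```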
If your model is already quantized, this did the trick for me, using the light image:
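Presumably something of this shape, using the published light image (the volume path is a placeholder; the light image runs main directly, so only inference arguments are passed):

```sh
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light \
    -m /models/7B/ggml-model-q4_0.bin -p "This is a test" -n 512
```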
I have a similar issue in Docker on some machines. However, I have it in gdb:

strace output:
The CUDA-supported Docker image works like a charm and is fairly quick, but then 1 out of 10 machines you deploy to crashes with this 'illegal instruction' error. The issue is also often reported across adaptations: ollama/ollama#2187. I'm not too familiar with these instructions, but is it not feasible to have one workflow that builds one Docker image that you can deploy reliably? It just works so well, and not having one Docker image is a bit of a shame.
try
This issue was closed because it has been inactive for 14 days since being marked as stale. |
I'm trying to run the Docker version on Unraid.

I run this as Post Arguments:

```
--run -m /models/7B/ggml-model-q4_0.bin -p "This is a test" -n 512
```

I got this error:

```
/app/.devops/tools.sh: line 40: 7 Illegal instruction ./main $arg2
```

Log:

I have run this without any issues:

```
--all-in-one "/models/" 7B
```
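For comparison, outside Unraid the same thing with the full image would look something like this (the volume path is a placeholder):

```sh
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full \
    --run -m /models/7B/ggml-model-q4_0.bin -p "This is a test" -n 512
```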