Docker Issue "Illegal instruction" #537

Closed
Netsuno opened this issue Mar 26, 2023 · 24 comments
Labels
bug (Something isn't working), hardware (Hardware related), stale

Comments

@Netsuno

Netsuno commented Mar 26, 2023

I'm trying to run the Docker version on Unraid.

I pass this as Post Arguments:
--run -m /models/7B/ggml-model-q4_0.bin -p "This is a test" -n 512

I get this error: /app/.devops/tools.sh: line 40: 7 Illegal instruction ./main $arg2

Log:

main: seed = 1679843913
llama_model_load: loading model from '/models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml ctx size = 4273.34 MB
llama_model_load: mem required  = 6065.34 MB (+ 1026.00 MB per state)
/app/.devops/tools.sh: line 40:     7 Illegal instruction     ./main $arg2

This ran without any issues: --all-in-one "/models/" 7B

@nsarrazin

nsarrazin commented Mar 26, 2023

I also get Illegal instruction (core dumped) when using the docker image, while compiling from source seems to solve the issue.

This is on Pop!_OS 22.04 with kernel 6.2.0-76060200 on a Ryzen 5 5600X (x86_64 with AVX2), with gcc 11.

Some versions are fine, though: we found that light-19726169b379bebc96189673a19b89ab1d307659 doesn't seem to have this problem, but light-34c1072e497eb92d81ee7c0e12aa6741496a41c6 does.

(we've been tracking this here too: serge-chat/serge#66)

@anzz1
Contributor

anzz1 commented Mar 26, 2023

Illegal instruction sounds like using an instruction which your processor does not support. I've touched on the issue in this discussion:

@slaren
Collaborator

slaren commented Mar 26, 2023

It's clear that as long as CPU features are determined at compile time, distributing binaries is going to cause problems like this.

@gaby
Contributor

gaby commented Mar 26, 2023

@slaren That explains why the binaries are so inconsistent. We don't know what CPUs the GitHub runners are using, which makes the binaries unusable.

gjmulder added the bug and hardware labels on Mar 26, 2023
@Netsuno
Author

Netsuno commented Mar 26, 2023

Machine spec:
Motherboard: Supermicro X9DRi-LN4+, version REV:1.20A
CPU: Dual Xeon E5-2670 v2
RAM: 128 GB DDR3 LRDIMM
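
For context: as far as I know, the Xeon E5-2670 v2 is an Ivy Bridge part, which has AVX but not AVX2 or FMA, so a binary built with AVX2 enabled would be expected to die with exactly this error. Below is a minimal sketch for checking what the host reports, assuming an x86 Linux host with /proc/cpuinfo. cpu_has_flag is a hypothetical helper name, not something shipped in the image.

```shell
#!/bin/sh
# cpu_has_flag FLAG [FILE]: succeed if FLAG is listed on the first "flags"
# line of FILE (default: /proc/cpuinfo). x86 Linux only; hypothetical helper.
cpu_has_flag() {
    flags=$(sed -n '/^flags/{p;q;}' "${2:-/proc/cpuinfo}" 2>/dev/null) || return 1
    # Drop everything up to the colon, then look for FLAG as a whole word.
    case " ${flags#*:} " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report the instruction sets a prebuilt binary is most likely to assume.
for flag in avx avx2 fma f16c avx512f; do
    if cpu_has_flag "$flag"; then
        echo "$flag: yes"
    else
        echo "$flag: no"
    fi
done
```

Any "no" here for an extension the image was compiled with is a plausible cause of the crash.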

@anzz1
Contributor

anzz1 commented Mar 26, 2023

@gaby yes they can vary, but for compilation it doesn't matter which cpu the runner has, only for tests. as you can see in the discussion, the windows builds always build avx512 but only test it when possible. if the docker builder looks at its own features when compiling the binaries, then it's misconfigured. if i compile something for gameboy advance on my x86 pc, it's not the features of my pc that i should choose when compiling. i'm not too familiar with docker, but i suppose there has to be an option to not precompile binaries but rather have the sources inside the container, which would be compiled as the first step of installation.

but idk, the whole raison d'être for docker containers is to deal with the huge mess of interconnected dependencies in the linux world which are hard to deal with. but this project doesn't contain any dependencies or libraries and can be simply built on any machine. so i don't understand the value proposition of docker when it comes to this project at all, except the negative value of constantly having to deal with issues related to it.

if you are an absolute fan of docker and you just absolutely positively have to have it, the container could literally have a single .sh bash script which would do

git clone https://github.com/ggerganov/llama.cpp.git
make

and that's it lol. the beauty of having no libraries and dependencies.
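
Those two commands could be wrapped in an entrypoint along these lines. This is only a sketch: ensure_built, the /app/llama.cpp path and the do-nothing-without-arguments behaviour are assumptions for illustration. It also presumes that git, make and a compiler are installed in the image, and that make produces ./main as it did at the time of this thread.

```shell
#!/bin/sh
# Hypothetical container entrypoint: build llama.cpp on first start so the
# binary is compiled on the CPU the container actually runs on.
REPO_URL="https://github.com/ggerganov/llama.cpp.git"

# ensure_built DIR: clone into DIR if it is missing, then build unless
# DIR/main already exists from an earlier start.
ensure_built() {
    dir=$1
    [ -d "$dir" ] || git clone --depth 1 "$REPO_URL" "$dir" || return 1
    [ -x "$dir/main" ] || make -C "$dir" || return 1
}

# With arguments, build if needed and pass them to ./main; with none, do nothing.
if [ "$#" -gt 0 ]; then
    ensure_built /app/llama.cpp && exec /app/llama.cpp/main "$@"
fi
```

Keeping the checkout on a mounted volume would let later starts skip the build, at the cost of a stale binary if the container moves to a different host.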

for precompiled binaries, currently the only option is to build packages for different options like the windows releases. in the future a better option would be to detect the features at runtime, unless it can't be done without a performance penalty, but probably it can. it has to be researched a bit though, because it would affect inlining, which cannot be done when the codepaths aren't static. if inlining gives a performance benefit then we gotta stick with the multi builds, as speed > everything else.

@Netsuno
Author

Netsuno commented Mar 27, 2023

but idk, the whole raison d'être for docker containers is to deal with the huge mess of interconnected dependencies in the linux world which are hard to deal with. but this project doesn't contain any dependencies or libraries and can be simply built on any machine. so i don't understand the value proposition of docker when it comes to this project at all, except the negative value of constantly having to deal with issues related to it.

With Unraid I have two ways to run it: in a VM or in Docker. With Docker I can share resources with other processes, whereas with a VM I "lock" the resources. For me, that is the added value of Docker.

@gaby
Contributor

gaby commented Mar 28, 2023

@anzz1 Thanks for the insight. After several tries it seems that compiling llama.cpp as a first step during runtime is the solution.

@anzz1
Contributor

anzz1 commented Mar 29, 2023

@anzz1 Thanks for the insight. After several tries it seems that compiling llama.cpp as a first step during runtime is the solution.

On a project with a million dependencies and libraries this might be a problem, but as this one has no dependencies and builds on anything, compilation shouldn't pose a problem nor take more than a few seconds. However, in the post above there's an ongoing discussion about adding the ability to check processor features at runtime. There is some work involved, however, in accurately testing and analyzing whether it can be done without hurting performance, so it's currently on the backlog behind more important issues.

When it's properly researched that it can be done without degrading performance, the part of adding it isn't hard at all. Just have to be sure to not introduce a regression while doing it.

@Taillan

Taillan commented Mar 30, 2023

@Netsuno have you succeeded? I have Unraid too but haven't managed to run it in Docker.

@Netsuno
Author

Netsuno commented Mar 30, 2023

@Taillan I made my own Docker image to run it. But my Unraid server is not powerful (2x Xeon E5-2670 v2), so I have shelved the idea of making it work on Unraid for now (it takes 100% of my processing power for 2 minutes to generate one answer).

@kiratp

kiratp commented Apr 30, 2023

On a project with a million dependencies and libraries this might be a problem, but as this one has no dependencies and builds on anything, compilation shouldn't pose a problem nor take more than a few seconds

A case for runtime detection:

In any reasonable, modern cloud deployment, llama.cpp would end up inside a container. In fact, being CPU-only, llama enables deploying your ML inference to something like AWS Lambda/GCP Cloud Run providing very simple, huge scalability for inference. All these systems use containerization and expect you to have pre-built binaries ready to go. Compiling at container launch is not really an option as that significantly increases cold-start/scale up latencies (a few seconds is too long).

However, the higher up the serverless stack you go, the less control you have over the CPU platform underneath. GCP, for example, has machines from Haswell era ++ all intermingled in and they don’t even document what to expect for Cloud Functions or Cloud Run.

I’m not a C expert by any means so not my wheelhouse to offer up a PR but the case for this is pretty strong IMO.

@JerryYao80

JerryYao80 commented May 29, 2023

Got the same error:

ERROR: /app/.devops/tools.sh: line 40 6 Illegal instruction ./main $arg2

when I executed command:

docker run -v /models/llama7b:/home ghcr.io/ggerganov/llama.cpp:full --run -m /home/ggml-model-q4_1.bin -p "hello" -n 512

my environment is :

Docker Toolbox 1.13.1
docker client: 1.13.1 os/arch: windows 7 /amd64
docker server:19.03.12 os/arch:ubuntu 22.04 /amd64

Can anyone help?

@jpodivin
Contributor

Got the same error:

ERROR: /app/.devops/tools.sh: line 40 6 Illegal instruction ./main $arg2

when I executed command:

docker run -v /models/llama7b:/home ghcr.io/ggerganov/llama.cpp:full --run -m /home/ggml-model-q4_1.bin -p "hello" -n 512

my environment is :

Docker Toolbox 1.13.1
docker client: 1.13.1 os/arch: windows 7 /amd64
docker server:19.03.12 os/arch:ubuntu 22.04 /amd64

Can anyone help?

I'm afraid that you'll have to rebuild the image locally and use that instead. But that isn't very complicated.
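
A rough sketch of that rebuild, assuming the repository still ships .devops/full.Dockerfile and that the image is built on the same machine (or at least the same CPU generation) that will run it. build_local_image is a hypothetical wrapper, and the local/llama.cpp:full tag is just a naming choice.

```shell
#!/bin/sh
# build_local_image CHECKOUT: build the "full" image from a local llama.cpp
# checkout so the binaries are compiled on the host that will run them.
# Hypothetical wrapper; the Dockerfile path is an assumption about the repo layout.
build_local_image() {
    docker build -t local/llama.cpp:full -f "$1/.devops/full.Dockerfile" "$1"
}

# With a checkout path as argument, build the image; with none, do nothing.
if [ "$#" -gt 0 ]; then
    build_local_image "$1"
fi
```

The command from the report can then be retried with the local tag swapped in for ghcr.io/ggerganov/llama.cpp:full.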

@anzz1
Contributor

anzz1 commented Jul 16, 2023

On a project with a million dependencies and libraries this might be a problem, but as this one has no dependencies and builds on anything, compilation shouldn't pose a problem nor take more than a few seconds

A case for runtime detection:

In any reasonable, modern cloud deployment, llama.cpp would end up inside a container. In fact, being CPU-only, llama enables deploying your ML inference to something like AWS Lambda/GCP Cloud Run providing very simple, huge scalability for inference. All these systems use containerization and expect you to have pre-built binaries ready to go. Compiling at container launch is not really an option as that significantly increases cold-start/scale up latencies (a few seconds is too long).

However, the higher up the serverless stack you go, the less control you have over the CPU platform underneath. GCP, for example, has machines from Haswell era ++ all intermingled in and they don’t even document what to expect for Cloud Functions or Cloud Run.

I’m not a C expert by any means so not my wheelhouse to offer up a PR but the case for this is pretty strong IMO.

Yeah, the modern cloud environment where in many cases you have less control over and knowledge about the underlying hardware than what used to be is unfortunate, but it is reality.

You definitely do not want to go all-out runtime detection in a performance-driven application like this and lose the compiler optimizations allowed by compile-time detection with simple #ifdefs, hurting everyone else in the process for the sake of cloud and containerization, but there is a case for having it both ways.

Something like this:

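inline unsigned int ggml_cpu_has_avx512(void) {
#if defined(CPUDETECT_RUNTIME)
    // Machine code for: push rbx; xor ecx,ecx; mov eax,7; cpuid;
    // shr ebx,16; and ebx,1; mov eax,ebx; pop rbx; ret
    // i.e. return CPUID.(EAX=7,ECX=0):EBX bit 16, the AVX512F flag.
    // Note: the array must sit in executable memory for the call to work.
    static const unsigned char a[] = {0x53,0x31,0xC9,0xB8,0x07,0x00,0x00,0x00,0x0F,0xA2,0xC1,0xEB,0x10,0x83,0xE3,0x01,0x89,0xD8,0x5B,0xC3};
    return ((unsigned int (__cdecl *)(void)) (void*)((void*)a))();
#elif defined(__AVX512F__)
    return 1;  // feature enabled at compile time
#else
    return 0;  // feature not compiled in
#endif
}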

Then replacing any #ifdef __AVX512F__ with ggml_cpu_has_avx512() allows for runtime detection when configured as such; when not, the compiler should optimize it away and give the same end result as the #ifdef, without messing up its optimization logic. However, compilers can be finicky sometimes, so it's definitely prudent to check with a disassembler that the end result really is the same.

edit: To be clear, using bytecode in the example above isn't being obtuse for the sake of being obtuse in some misguided attempt at trying to look smart or something. Optimally you'd use __asm { }, but the reason why you can't is that, contrary to every other compiler out there, MSVC decided to drop inline assembly support for the 64-bit era, a decision that has been the bane of low-level coders' existence ever since. Bytecode is the only thing that works for every compiler. If you want to see what's going on, you can copy-paste the bytecode above into https://disasm.czbix.com/ for example.

Here's a list of some of the (x86) processor feature checks in bytecode: cpuid.h

Rest can be found in the x86 documentation:

Intel® Architecture Instruction Set Extensions and Future Features, "Chapter 1.5 CPUID Instruction"

AMD64 Architecture Programmer’s Manual Volume 3: General-Purpose and System Instructions "Appendix D: Instruction Subsets and CPUID Feature Flags"

@jpodivin
Contributor

Easiest solution, imho, is to provide multiple versions of the container image. It doesn't have to cover all architectures, and it doesn't have to be every release. But a setup which covers >=85% of consumers at any given time is enough.

The rest can rebuild.

@anzz1
Contributor

anzz1 commented Jul 17, 2023

Easiest solution, imho, is to provide multiple versions of the container image. It doesn't have to cover all architectures, and it doesn't have to be every release. But a setup which covers >=85% of consumers at any given time is enough.

The rest can rebuild.

Sure, the easiest solution would be to create a container image for every configuration set, and it could be easily automated with github actions. It's a solution, but not a good one, as it's not future proof and carries a risk of getting stuck with a bad practice.
You know what they say, Nothing is more permanent than a temporary solution 😄

@jpodivin
Contributor

I didn't say every configuration set. Unless the runtime detection has only a negligible impact on performance, I think it's better for the consumer to just get an image optimized for their architecture. Obviously there is a point of diminishing returns. But even Intel provides optimized images for avx512 [0].

[0] https://hub.docker.com/r/intel/intel-optimized-tensorflow-avx512

@anzz1
Contributor

anzz1 commented Jul 19, 2023

I didn't say every configuration set. Unless the runtime detection has only a negligible impact on performance, I think it's better for the consumer to just get an image optimized for their architecture. Obviously there is a point of diminishing returns. But even Intel provides optimized images for avx512 [0].

[0] hub.docker.com/r/intel/intel-optimized-tensorflow-avx512

Sure, you could do automatic separate images for AVX / AVX2 / AVX512 like the Windows releases by just editing the action, no code change necessary.

Or as the binaries are rather small, you could pack each of them in one image and add a simple script to have "launch-time" detection if you will, something like this:

#!/bin/sh

# Run the build matching the best AVX level listed in /proc/cpuinfo.
if grep -q avx512 /proc/cpuinfo; then
	exec ./llama_avx512 "$@"
elif grep -q avx2 /proc/cpuinfo; then
	exec ./llama_avx2 "$@"
else
	exec ./llama_avx "$@"
fi

@athrael-soju

athrael-soju commented Aug 19, 2023

If your model is already quantized, this did the trick for me, using light:

docker run -v /E/Projects/llama.cpp/models:/models ghcr.io/ggerganov/llama.cpp:light -m models/7B/llama-2-7b-chat.ggmlv3.q4_0.bin -p "hello" -n 512

@lapp0

lapp0 commented Nov 19, 2023

I have a similar issue in docker on some machines. I'm using local/llama.cpp:full-cuda

After an strace, it turned out /server couldn't find libcublas.so.11.

However, I have it in /usr/local/cuda-11.7/targets/x86_64-linux/lib/libcublas.so.11. Perhaps something is wrong with the way I built it; still investigating.
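
A note on reading the strace output further down: the ENOENT results look like the dynamic loader probing each directory on its search path in turn, and a later openat on /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.11 returns file descriptor 3, so the library does seem to be found in the end. Here is a sketch for pulling only the successful opens out of a saved trace. successful_opens is a hypothetical helper, and the log is assumed to come from something like strace -o trace.log /app/server.

```shell
#!/bin/sh
# successful_opens LOG: print the path of every openat() call in an strace
# log that returned a file descriptor rather than an error. Hypothetical helper.
successful_opens() {
    # Keep openat lines ending in " = <number>", then extract the quoted path.
    sed -n '/^openat(.* = [0-9][0-9]*$/s/^[^"]*"\([^"]*\)".*/\1/p' "$1"
}

# With a log file as argument, list what was actually loaded; with none, do nothing.
if [ "$#" -gt 0 ]; then
    successful_opens "$1"
fi
```

If every library shows up in that list, a missing shared object is unlikely to be what raises SIGILL.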

gdb

Starting program: /app/server 
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
0x0000559816d04ef6 in gpt_params::gpt_params() ()
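
To see which instruction actually faulted, gdb can print the instruction at the program counter once the signal arrives. A small sketch, where show_fault is a hypothetical wrapper name; -batch, -ex, --args and x/i $pc are standard gdb options and commands.

```shell
#!/bin/sh
# show_fault PROGRAM [ARGS...]: run PROGRAM under gdb in batch mode and, once
# it stops (for example on SIGILL), print the instruction at the program
# counter followed by a backtrace. Hypothetical wrapper.
show_fault() {
    gdb -batch -ex run -ex 'x/i $pc' -ex bt --args "$@"
}

# With a program as argument, run it under gdb; with none, do nothing.
if [ "$#" -gt 0 ]; then
    show_fault "$@"
fi
```

If the printed instruction belongs to an extension (AVX2 or AVX-512, say) that the host CPU does not list in /proc/cpuinfo, the binary was most likely built for a newer CPU than the one it is running on.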

strace

execve("/app/server", ["/app/server"], 0x7ffc7919b310 /* 59 vars */) = 0
brk(NULL)                               = 0x55db38d96000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffe3a3cabe0) = -1 EINVAL (Invalid argument)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffb6dc42000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib/glibc-hwcaps/x86-64-v3/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib/glibc-hwcaps/x86-64-v3", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib/glibc-hwcaps/x86-64-v2/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib/glibc-hwcaps/x86-64-v2", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib/tls/x86_64/x86_64/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib/tls/x86_64/x86_64", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib/tls/x86_64/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib/tls/x86_64", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib/tls/x86_64/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib/tls/x86_64", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib/tls/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib/tls", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib/x86_64/x86_64/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib/x86_64/x86_64", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib/x86_64/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib/x86_64", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib/x86_64/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib/x86_64", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib64/glibc-hwcaps/x86-64-v3/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib64/glibc-hwcaps/x86-64-v3", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib64/glibc-hwcaps/x86-64-v2/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib64/glibc-hwcaps/x86-64-v2", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib64/tls/x86_64/x86_64/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib64/tls/x86_64/x86_64", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib64/tls/x86_64/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib64/tls/x86_64", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib64/tls/x86_64/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib64/tls/x86_64", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib64/tls/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib64/tls", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib64/x86_64/x86_64/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib64/x86_64/x86_64", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib64/x86_64/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib64/x86_64", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib64/x86_64/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib64/x86_64", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/nvidia/lib64/libcublas.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/nvidia/lib64", 0x7ffe3a3c9e00, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=19787, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 19787, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ffb6dc3d000
close(3)                                = 0
openat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.11", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\300\v\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=151346592, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 155573560, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffb647df000
mmap(0x7ffb64800000, 153476408, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0x7ffb64800000
munmap(0x7ffb647df000, 135168)          = 0
munmap(0x7ffb6da5e000, 1961272)         = 0
mprotect(0x7ffb6d80e000, 2097152, PROT_NONE) = 0
mmap(0x7ffb6da0e000, 294912, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x900e000) = 0x7ffb6da0e000
mmap(0x7ffb6da56000, 32056, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ffb6da56000
close(3)                                = 0
openat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200\360\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=671072, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 4869864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffb6435b000
mmap(0x7ffb64400000, 2772712, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0x7ffb64400000
munmap(0x7ffb6435b000, 675840)          = 0
munmap(0x7ffb646a5000, 1421032)         = 0
mprotect(0x7ffb6449e000, 2097152, PROT_NONE) = 0
mmap(0x7ffb6469e000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x9e000) = 0x7ffb6469e000
mmap(0x7ffb646a4000, 3816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ffb646a4000
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=2260296, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 2275520, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ffb641d4000
mprotect(0x7ffb6426e000, 1576960, PROT_NONE) = 0
mmap(0x7ffb6426e000, 1118208, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x9a000) = 0x7ffb6426e000
mmap(0x7ffb6437f000, 454656, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ab000) = 0x7ffb6437f000
mmap(0x7ffb643ef000, 57344, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x21a000) = 0x7ffb643ef000
mmap(0x7ffb643fd000, 10432, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ffb643fd000
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=940560, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 942344, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ffb6db56000
mmap(0x7ffb6db64000, 507904, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xe000) = 0x7ffb6db64000
mmap(0x7ffb6dbe0000, 372736, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8a000) = 0x7ffb6dbe0000
mmap(0x7ffb6dc3b000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xe4000) = 0x7ffb6dc3b000
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=125488, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 127720, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ffb6db36000
mmap(0x7ffb6db39000, 94208, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7ffb6db39000
mmap(0x7ffb6db50000, 16384, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a000) = 0x7ffb6db50000
mmap(0x7ffb6db54000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d000) = 0x7ffb6db54000
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\237\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
pread64(3, "\4\0\0\0 \0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0"..., 48, 848) = 48
pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0\244;\374\204(\337f#\315I\214\234\f\256\271\32"..., 68, 896) = 68
newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=2216304, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffb6db34000
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
mmap(NULL, 2260560, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ffb63fac000
mmap(0x7ffb63fd4000, 1658880, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x28000) = 0x7ffb63fd4000
mmap(0x7ffb64169000, 360448, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x7ffb64169000
mmap(0x7ffb641c1000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x214000) = 0x7ffb641c1000
mmap(0x7ffb641c7000, 52816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ffb641c7000
close(3)                                = 0
openat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/glibc-hwcaps/x86-64-v3/libcublasLt.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/glibc-hwcaps/x86-64-v3", 0x7ffe3a3c9d20, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/glibc-hwcaps/x86-64-v2/libcublasLt.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/glibc-hwcaps/x86-64-v2", 0x7ffe3a3c9d20, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/tls/x86_64/x86_64/libcublasLt.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/tls/x86_64/x86_64", 0x7ffe3a3c9d20, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/tls/x86_64/libcublasLt.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/tls/x86_64", 0x7ffe3a3c9d20, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/tls/x86_64/libcublasLt.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/tls/x86_64", 0x7ffe3a3c9d20, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/tls/libcublasLt.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/tls", 0x7ffe3a3c9d20, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/x86_64/x86_64/libcublasLt.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/x86_64/x86_64", 0x7ffe3a3c9d20, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/x86_64/libcublasLt.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/x86_64", 0x7ffe3a3c9d20, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/x86_64/libcublasLt.so.11", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/x86_64", 0x7ffe3a3c9d20, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so.11", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20 ,\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=332762424, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 337251104, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffb4fe0b000
mmap(0x7ffb50000000, 335153952, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0x7ffb50000000
munmap(0x7ffb4fe0b000, 2052096)         = 0
munmap(0x7ffb63fa1000, 43808)           = 0
mprotect(0x7ffb6144b000, 2097152, PROT_NONE) = 0
mmap(0x7ffb6164b000, 43048960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1144b000) = 0x7ffb6164b000
mmap(0x7ffb63f59000, 293664, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ffb63f59000
close(3)                                = 0
openat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/librt.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=14664, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 16440, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ffb6db2f000
mmap(0x7ffb6db30000, 4096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x7ffb6db30000
mmap(0x7ffb6db31000, 4096, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7ffb6db31000
mmap(0x7ffb6db32000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7ffb6db32000
close(3)                                = 0
openat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=21448, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 16424, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ffb6db2a000
mmap(0x7ffb6db2b000, 4096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x7ffb6db2b000
mmap(0x7ffb6db2c000, 4096, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7ffb6db2c000
mmap(0x7ffb6db2d000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7ffb6db2d000
close(3)                                = 0
openat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/libdl.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=14432, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 16424, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ffb6db25000
mmap(0x7ffb6db26000, 4096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x7ffb6db26000
mmap(0x7ffb6db27000, 4096, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7ffb6db27000
mmap(0x7ffb6db28000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7ffb6db28000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffb6db23000
mmap(NULL, 40960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffb6db19000
arch_prctl(ARCH_SET_FS, 0x7ffb6db20000) = 0
set_tid_address(0x7ffb6db202d0)         = 2375
set_robust_list(0x7ffb6db202e0, 24)     = 0
rseq(0x7ffb6db209a0, 0x20, 0, 0x53053053) = 0
mprotect(0x7ffb641c1000, 16384, PROT_READ) = 0
mprotect(0x7ffb6db28000, 4096, PROT_READ) = 0
mprotect(0x7ffb6db2d000, 4096, PROT_READ) = 0
mprotect(0x7ffb6db32000, 4096, PROT_READ) = 0
mprotect(0x7ffb6dc3b000, 4096, PROT_READ) = 0
mprotect(0x7ffb6db54000, 4096, PROT_READ) = 0
mprotect(0x7ffb6164b000, 1626112, PROT_READ) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffb6db17000
mprotect(0x7ffb643ef000, 45056, PROT_READ) = 0
mprotect(0x7ffb6469e000, 20480, PROT_READ) = 0
mprotect(0x7ffb6da0e000, 57344, PROT_READ) = 0
mprotect(0x55db37bed000, 12288, PROT_READ) = 0
mprotect(0x7ffb6dc7c000, 8192, PROT_READ) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
munmap(0x7ffb6dc3d000, 19787)           = 0
getrandom("\x09\x04\xe0\x7c\x1d\xb7\x5a\x62", 8, GRND_NONBLOCK) = 8
brk(NULL)                               = 0x55db38d96000
brk(0x55db38db7000)                     = 0x55db38db7000
futex(0x7ffb63f9d00c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
brk(0x55db38dd8000)                     = 0x55db38dd8000
brk(0x55db38df9000)                     = 0x55db38df9000
brk(0x55db38e1a000)                     = 0x55db38e1a000
futex(0x7ffb63f9de44, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7ffb63f9de50, FUTEX_WAKE_PRIVATE, 2147483647) = 0
brk(0x55db38e3b000)                     = 0x55db38e3b000
brk(0x55db38e5c000)                     = 0x55db38e5c000
brk(0x55db38e7d000)                     = 0x55db38e7d000
brk(0x55db38e9e000)                     = 0x55db38e9e000
brk(0x55db38ebf000)                     = 0x55db38ebf000
brk(0x55db38ee0000)                     = 0x55db38ee0000
openat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/libcuda.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=19787, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 19787, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ffb6dc3d000
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libcuda.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\r\16\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=29196368, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 29614656, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ffb4e3c1000
mprotect(0x7ffb4e4a1000, 27111424, PROT_NONE) = 0
mmap(0x7ffb4e4a1000, 5242880, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xe0000) = 0x7ffb4e4a1000
mmap(0x7ffb4e9a1000, 21864448, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5e0000) = 0x7ffb4e9a1000
mmap(0x7ffb4fe7c000, 1171456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1aba000) = 0x7ffb4fe7c000
mmap(0x7ffb4ff9a000, 414272, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ffb4ff9a000
close(3)                                = 0
mprotect(0x7ffb4fe7c000, 98304, PROT_READ) = 0
sched_get_priority_max(SCHED_RR)        = 99
sched_get_priority_min(SCHED_RR)        = 1
munmap(0x7ffb6dc3d000, 19787)           = 0
brk(0x55db38f01000)                     = 0x55db38f01000
brk(0x55db38f22000)                     = 0x55db38f22000
brk(0x55db38f43000)                     = 0x55db38f43000
brk(0x55db38f64000)                     = 0x55db38f64000
brk(0x55db38f88000)                     = 0x55db38f88000
futex(0x7ffb646a410c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7ffb6da5b36c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
brk(0x55db38fa9000)                     = 0x55db38fa9000
brk(0x55db38fca000)                     = 0x55db38fca000
brk(0x55db38feb000)                     = 0x55db38feb000
brk(0x55db3900c000)                     = 0x55db3900c000
futex(0x7ffb643fd77c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
brk(0x55db3902d000)                     = 0x55db3902d000
brk(0x55db3904e000)                     = 0x55db3904e000
brk(0x55db3906f000)                     = 0x55db3906f000
brk(0x55db39090000)                     = 0x55db39090000
openat(AT_FDCWD, "/sys/devices/system/cpu0/topology/thread_siblings", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 3
read(3, "0-255\n", 1024)                = 6
close(3)                                = 0
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x55db37344ef6} ---
+++ killed by SIGILL (core dumped) +++
Illegal instruction (core dumped)
ldd /app/server
	linux-vdso.so.1 (0x00007fffbff45000)
	libcublas.so.11 => /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.11 (0x00007fab17200000)
	libcudart.so.11.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0 (0x00007fab16e00000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fab16bd4000)
	libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6 (0x00007fab17119000)
	libgcc_s.so.1 => /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fab20484000)
	libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007fab169ac000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fab21001000)
	libcublasLt.so.11 => /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so.11 (0x00007fab02a00000)
	librt.so.1 => /usr/lib/x86_64-linux-gnu/librt.so.1 (0x00007fab2047d000)
	libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fab20478000)
	libdl.so.2 => /usr/lib/x86_64-linux-gnu/libdl.so.2 (0x00007fab20473000)
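
A quick sanity check on the trace above: the strace SIGILL line reports si_addr=0x55db37344ef6, and the gdb session in this thread stops at 0x0000559816d04ef6 in gpt_params::gpt_params(). With address space randomization the load base changes between runs but stays page-aligned, so the offset within a page is preserved. The sketch below (the helper name is made up for illustration, and 4 KiB pages are an assumption) checks whether two addresses could point at the same instruction. A match is consistent with the same faulting instruction in both runs, but it does not prove it.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual x86_64 default


def same_page_offset(addr_a: int, addr_b: int, page_size: int = PAGE_SIZE) -> bool:
    """Return True if both addresses have the same offset within a page."""
    return addr_a % page_size == addr_b % page_size


strace_si_addr = 0x55DB37344EF6  # si_addr from the SIGILL line in the strace output
gdb_pc = 0x0000559816D04EF6      # program counter from the gdb session in this thread

print(hex(strace_si_addr % PAGE_SIZE))           # 0xef6
print(same_page_offset(strace_si_addr, gdb_pc))  # True
```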

@kunibald413

kunibald413 commented Feb 20, 2024

The CUDA-enabled Docker image works like a charm and is fairly quick, but then 1 out of 10 machines you deploy to crashes with this 'illegal instruction' error.

The issue is also reported frequently in downstream projects:

ollama/ollama#2187
https://github.com/search?q=repo%3Aoobabooga%2Ftext-generation-webui+illegal+instruction&type=issues

I'm not too familiar with these instructions, but is it not feasible to have one workflow that builds a single Docker image you can deploy reliably? It just works so well, and not having one Docker image is a bit of a shame.
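
One way to narrow down which machines are affected is to compare the CPU feature flags the host advertises against the extensions the binary was compiled for. A container shares the host kernel, so /proc/cpuinfo inside the container reflects the host CPU. The sketch below is only illustrative: the thread does not say which extensions a given image was built with, so the `ASSUMED_REQUIRED` list is a placeholder, and both function names are hypothetical.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(required: list[str], cpuinfo_text: str) -> list[str]:
    """Return the required flags that the CPU does not advertise, in input order."""
    available = parse_cpu_flags(cpuinfo_text)
    return [flag for flag in required if flag not in available]


# Assumption: placeholder list, not taken from the image's actual build flags.
ASSUMED_REQUIRED = ["avx", "avx2", "fma", "f16c"]

if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("missing:", missing_flags(ASSUMED_REQUIRED, cpuinfo.read_text()))
    else:
        print("no /proc/cpuinfo on this platform")
```

Running this on a machine that crashes and on one that works, then comparing the two results, would show whether the failing hosts lack one of the assumed extensions.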

@kunibald413

I have a similar issue in docker on some machines. I'm using local/llama.cpp:full-cuda

After running strace, it turned out /server couldn't find libcublas.so.11.

However, I have it in /usr/local/cuda-11.7/targets/x86_64-linux/lib/libcublas.so.11. Perhaps something is wrong with the way I built it; still investigating.

gdb

Starting program: /app/server 
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
0x0000559816d04ef6 in gpt_params::gpt_params() ()

ldd /app/server
	linux-vdso.so.1 (0x00007fffbff45000)
	libcublas.so.11 => /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.11 (0x00007fab17200000)
	libcudart.so.11.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0 (0x00007fab16e00000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fab16bd4000)
	libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6 (0x00007fab17119000)
	libgcc_s.so.1 => /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fab20484000)
	libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007fab169ac000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fab21001000)
	libcublasLt.so.11 => /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so.11 (0x00007fab02a00000)
	librt.so.1 => /usr/lib/x86_64-linux-gnu/librt.so.1 (0x00007fab2047d000)
	libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fab20478000)
	libdl.so.2 => /usr/lib/x86_64-linux-gnu/libdl.so.2 (0x00007fab20473000)

Try:
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-11.7/targets/x86_64-linux/lib
https://stackoverflow.com/questions/54249577/importerror-libcuda-so-1-cannot-open-shared-object-file
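
Before changing LD_LIBRARY_PATH, it may be worth confirming that a library is actually unresolved. In the ldd output quoted above every entry, including libcublas.so.11, resolves to a path, and the ENOENT lines in the strace are each followed by a successful openat from the next search directory, which is normal loader behaviour. That would suggest the SIGILL is not caused by a missing library, though this is only an inference from the pasted output. A minimal sketch (the function name and the `libexample.so.1` entry are made up for illustration) that lists entries ldd marks as "not found":

```python
def unresolved_libraries(ldd_output: str) -> list[str]:
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition("=>")
        if sep and target.strip().startswith("not found"):
            missing.append(name.strip())
    return missing


# Two lines copied from the thread plus one hypothetical unresolved entry.
sample = (
    "\tlibcublas.so.11 => /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.11 (0x00007fab17200000)\n"
    "\tlibcudart.so.11.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0 (0x00007fab16e00000)\n"
    "\tlibexample.so.1 => not found\n"
)

print(unresolved_libraries(sample))  # ['libexample.so.1']
```

An empty list for the real binary would point away from the library path and back toward the CPU instruction set.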


This issue was closed because it has been inactive for 14 days since being marked as stale.
