[Question] Running a hardware-accelerated app in an NVIDIA Jetson Nano Docker container #442

ruisebastiao · 2020-10-15T09:19:16Z

Hello,
this issue is not really related to meta-tegra, but I'm hoping to get some help here.

Following the instructions at https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-Container-Runtime-on-Jetson, I am testing GUI apps in a Docker container on a Jetson Nano, using the following commands:

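    # On the host: start the container with the NVIDIA runtime and the X11 socket mounted
    docker run --runtime nvidia --network host -it -e DISPLAY="$DISPLAY" -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-base:r32.4.3

    # Inside the container: install glxgears and run it on the host's X display
    apt-get update && apt-get install -y mesa-utils
    export DISPLAY=:0.0 && glxgears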

Everything works. In this example I run the container in host network mode (--network host), but I have an application where the container must run isolated from the host. When I remove --network host and repeat the same steps, I get the following error:

    root@056440576e00:/# glxgears
    Segmentation fault (core dumped)

The strace log:

    socket(AF_UNIX, SOCK_DGRAM, 0)          = 7
    connect(7, {sa_family=AF_UNIX, sun_path=@"nvidia20ac498a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 66) = -1 ECONNREFUSED (Connection refused)
    close(7)                                = 0
    --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x28c48} ---
    +++ killed by SIGSEGV (core dumped) +++
    Segmentation fault
    root@056440576e00:/#

The gdb log:

    (gdb) r
    Starting program: /usr/bin/glxgears
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".

    Program received signal SIGSEGV, Segmentation fault.
    0x0000007fb78736a4 in ?? () from /usr/lib/aarch64-linux-gnu/libGLX_nvidia.so.0
    (gdb)

Why do I get the segmentation fault only when running in isolated mode? Maybe I have to mount other things besides /tmp/.X11-unix/:/tmp/.X11-unix?

I'm using the dunfell-l4t-r32.4.3 branch.
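The host-mode versus isolated-mode comparison above can be scripted so that both runs use identical options. A minimal sketch, assuming bash, the NVIDIA runtime, and the image and X11 mount from the commands above; compare_network_modes is a hypothetical helper, --network bridge (Docker's default network) stands in for the isolated setup, and timeout 5 is added so a healthy glxgears run ends on its own:

```shell
# Hypothetical helper (not from the thread): run the same glxgears test once with
# host networking and once on Docker's default bridge network.
# Override DOCKER (e.g. DOCKER=echo) to print the commands instead of running them.
compare_network_modes() {
  local docker_cmd="${DOCKER:-docker}"
  local image="nvcr.io/nvidia/l4t-base:r32.4.3"
  local net
  for net in host bridge; do
    echo "== --network $net =="
    "$docker_cmd" run --rm --runtime nvidia --network "$net" \
      -e DISPLAY="$DISPLAY" -v /tmp/.X11-unix/:/tmp/.X11-unix "$image" \
      sh -c 'apt-get update && apt-get install -y mesa-utils && timeout 5 glxgears'
    # 124 = glxgears still running when timeout stopped it; 139 = 128 + SIGSEGV (crash)
    echo "exit status: $?"
  done
}
```

An exit status of 124 means glxgears survived the five seconds, while 139 matches the segmentation fault reported above. Running DOCKER=echo compare_network_modes prints the two docker run command lines without executing them, which is a quick way to check the options before experimenting with additional -v mounts.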


dwalkes · 2020-10-15T13:47:25Z

@ruisebastiao Interesting test. Unfortunately I don't have any insight, but I'm curious about the solution.

Does the gdb backtrace (bt) after the segfault show anything about the entry point or call stack into the libGLX_nvidia library?

Can you reproduce this on stock nvidia L4T JP 4.4? If so I'd cross post on the nvidia forum too.

ruisebastiao · 2020-10-15T14:14:37Z

@dwalkes, here is the gdb backtrace (bt):

    (gdb) bt
    #0  0x0000007fb78736a4 in ?? () from /usr/lib/aarch64-linux-gnu/libGLX_nvidia.so.0
    #1  0x0000007fb78a0d10 in ?? () from /usr/lib/aarch64-linux-gnu/libGLX_nvidia.so.0
    #2  0x0000007fb7b497e4 in glXCreateContext () from /usr/lib/aarch64-linux-gnu/libGLX.so.0
    #3  0x0000005555558230 in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)

lfdmn · 2020-10-15T18:57:28Z

My 2 cents. Do you find anything related to nvidia20ac498a in your roots? It looks to me like the code is trying to access a Unix socket with that name; the connection fails, so it doesn't get some initialization data, which causes the crash.


ruisebastiao · 2020-10-15T21:25:26Z

> My 2 cents. Do you find anything related to nvidia20ac498a in your roots?

Yes, that was also one of the first things I tried. I searched both the container and the host and didn't find any file with that name. I searched for nvidia* because the rest of the name changes between container runs.
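A file search will not find this socket: the strace output shows sun_path=@"nvidia...", which is how strace prints an abstract Unix socket address (one whose name starts with a NUL byte). Abstract sockets have no filesystem entry; Linux lists them in /proc/net/unix with a leading @ instead, and both that listing and the abstract namespace itself are per network namespace. The ECONNREFUSED above is consistent with no socket being bound to that name in the caller's network namespace. A minimal lookup sketch, assuming awk and the usual /proc/net/unix column layout (path in column 8); list_abstract_sockets is a hypothetical helper:

```shell
# Hypothetical helper: print bound abstract Unix sockets whose name starts with a prefix.
# /proc/net/unix shows abstract names with a leading '@'; NUL padding bytes also print as '@'.
# The listing only covers the network namespace of the process that reads it.
list_abstract_sockets() {
  local prefix="$1"
  local table="${2:-/proc/net/unix}"
  # Skip the header line; column 8 is the socket path (empty for unnamed sockets)
  awk -v p="@$prefix" 'NR > 1 && index($8, p) == 1 { print $8 }' "$table"
}
```

For example, list_abstract_sockets nvidia can be run on the host, and the container's view can be saved with docker run --rm nvcr.io/nvidia/l4t-base:r32.4.3 cat /proc/net/unix > container-unix.txt (with and without --network host) and checked with list_abstract_sockets nvidia container-unix.txt. If the nvidia socket appears only in the host's network namespace, that would fit the crash happening only in isolated mode, although the thread does not confirm this. Because the suffix changes between runs, matching on the nvidia prefix is enough; an empty prefix lists every abstract socket.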

ruisebastiao · 2020-10-16T09:58:41Z

@dwalkes

> Can you reproduce this on stock nvidia L4T JP 4.4? If so I'd cross post on the nvidia forum too.

Just confirmed that in nvidia L4T JP 4.4 the result is the same.

gdb output:

    (gdb) bt
    #0  0x0000007fb78736a4 in ?? () from /usr/lib/aarch64-linux-gnu/libGLX_nvidia.so.0
    #1  0x0000007fb78a0d10 in ?? () from /usr/lib/aarch64-linux-gnu/libGLX_nvidia.so.0
    #2  0x0000007fb7b497e4 in glXCreateContext () from /usr/lib/aarch64-linux-gnu/libGLX.so.0

dwalkes · 2020-10-16T13:55:51Z

> Just confirmed that in nvidia L4T JP 4.4 the result is the same.

@ruisebastiao Good or bad news, depending on how you look at it: good that nothing is different about meta-tegra, bad that it's less likely some Matt Madison Magic will be able to solve it :)

I'd definitely cross-post it on the NVIDIA developer forum, at least to make sure they are aware of it.

madisongh closed this as completed Jun 10, 2021

OE4T locked and limited conversation to collaborators Jun 10, 2021

This issue was moved to a discussion.
