This issue was moved to a discussion.

[Question] Running hardware accelerated app in nvidia jetson nano docker container #442

Closed
ruisebastiao opened this issue Oct 15, 2020 · 6 comments

Comments

@ruisebastiao

ruisebastiao commented Oct 15, 2020

Hello,
This issue is not really related to meta-tegra, but I'm hoping to get some help here.

Following the instructions at https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-Container-Runtime-on-Jetson, I am testing GUI apps in a Docker container on a Jetson Nano. I'm using the following commands for the tests:

    docker run --runtime nvidia --network host -it -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-base:r32.4.3
     
    apt-get update && apt-get install -y mesa-utils
   
    export DISPLAY=:0.0 && glxgears

Everything works well. In this example I run the container in host network mode (--network host), but I have an application where I want the container isolated from the host. After removing --network host and repeating the same steps in the container, I get the following error:

    root@056440576e00:/# glxgears
    Segmentation fault (core dumped)

the strace log:

    socket(AF_UNIX, SOCK_DGRAM, 0)          = 7
    connect(7, {sa_family=AF_UNIX, sun_path=@"nvidia20ac498a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 66) = -1 ECONNREFUSED (Connection refused)
    close(7)                                = 0
    --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x28c48} ---
    +++ killed by SIGSEGV (core dumped) +++
    Segmentation fault
    root@056440576e00:/#

gdb log:

    (gdb) r
    Starting program: /usr/bin/glxgears
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".

    Program received signal SIGSEGV, Segmentation fault.
    0x0000007fb78736a4 in ?? () from /usr/lib/aarch64-linux-gnu/libGLX_nvidia.so.0
    (gdb)

Why am I getting the segmentation fault only when running in isolated mode? Maybe I have to mount other things besides /tmp/.X11-unix/:/tmp/.X11-unix.

I'm using the dunfell-l4t-r32.4.3 branch.

@dwalkes
Member

dwalkes commented Oct 15, 2020

@ruisebastiao interesting test, unfortunately I don't really have any insight but I'm curious about the solution.

Do you see anything in the backtrace (bt) from gdb after the segfault that shows where the entry point/call stack is in the libGLX_nvidia library?

Can you reproduce this on stock nvidia L4T JP 4.4? If so I'd cross post on the nvidia forum too.

@ruisebastiao
Author

@dwalkes here is the gdb bt:

    (gdb) bt
    #0  0x0000007fb78736a4 in ?? () from /usr/lib/aarch64-linux-gnu/libGLX_nvidia.so.0
    #1  0x0000007fb78a0d10 in ?? () from /usr/lib/aarch64-linux-gnu/libGLX_nvidia.so.0
    #2  0x0000007fb7b497e4 in glXCreateContext () from /usr/lib/aarch64-linux-gnu/libGLX.so.0
    #3  0x0000005555558230 in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)

@lfdmn
Contributor

lfdmn commented Oct 15, 2020 via email

@ruisebastiao
Author

> My 2 cents. Do you find anything related to nvidia20ac498a in your roots?

Yes, that was also one of my first tries. I searched in the container and on the host and didn't find any file with that name. I searched for nvidia* because the rest of the name changes on each container run.
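
A note on the strace line: the leading @ that strace prints in sun_path marks a Linux abstract-namespace Unix socket. Such sockets have no filesystem entry, which would be consistent with a file search finding nothing. The abstract socket namespace is also scoped to the network namespace, so a socket created on the host is not reachable from a container that has its own network namespace. Whether that is the actual cause of the segfault is not confirmed anywhere in this thread. The sketch below only reproduces the ECONNREFUSED result for an abstract name with no listener, assuming Linux and Python 3; the socket name is made up for illustration and is not the one the NVIDIA driver uses.

```python
import os
import socket

# Hypothetical abstract socket name for illustration only. A leading NUL byte
# selects the Linux abstract namespace; strace renders that byte as '@'.
ABSTRACT_NAME = b"\0demo-abstract-" + str(os.getpid()).encode()


def try_connect(name: bytes) -> str:
    """Try a datagram connect to a Unix socket address and report the outcome."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        client.connect(name)
        return "connected"
    except ConnectionRefusedError:
        # Same errno as in the strace log: nothing is bound to this name
        # in the caller's network namespace.
        return "ECONNREFUSED"
    finally:
        client.close()


# No socket is bound to the name yet, so the connect is refused.
print(try_connect(ABSTRACT_NAME))

# Bind a datagram socket to the abstract name; no file is created on disk.
server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
server.bind(ABSTRACT_NAME)
print(try_connect(ABSTRACT_NAME))
server.close()
```

Abstract sockets that are currently bound are listed in /proc/net/unix with an @ prefix, so comparing that file on the host and inside the container is one way to check whether a given name is visible in each network namespace.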

@ruisebastiao
Author

ruisebastiao commented Oct 16, 2020

@dwalkes

> Can you reproduce this on stock nvidia L4T JP 4.4? If so I'd cross post on the nvidia forum too.

Just confirmed that in nvidia L4T JP 4.4 the result is the same.

gdb output:

    (gdb) bt
    #0  0x0000007fb78736a4 in ?? () from /usr/lib/aarch64-linux-gnu/libGLX_nvidia.so.0
    #1  0x0000007fb78a0d10 in ?? () from /usr/lib/aarch64-linux-gnu/libGLX_nvidia.so.0
    #2  0x0000007fb7b497e4 in glXCreateContext () from /usr/lib/aarch64-linux-gnu/libGLX.so.0

@dwalkes
Member

dwalkes commented Oct 16, 2020

> Just confirmed that in nvidia L4T JP 4.4 the result is the same.

@ruisebastiao good or bad news depending on how you look at it... good that there isn't something different about meta-tegra, bad that it's less likely some Matt Madison Magic will be able to solve it :)

I'd definitely cross post it on the NVIDIA developer forum, at least to make sure they are aware of it.

@OE4T OE4T locked and limited conversation to collaborators Jun 10, 2021
