Permission problems? #4

Open
Nachtwind85 opened this issue Apr 25, 2023 · 11 comments
Comments

@Nachtwind85

Hi,

After following the steps to use your Docker image, I haven't yet found a way to use rocminfo (or anything else that accesses ROCm) inside the container.

Currently I am trying the following:
podman run -it --device=/dev/kfd --device=/dev/dri --net=host --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $HOME/sddocker:/sddocker localhost/rocm-pytorch-gfx803

And then trying this:
(environ) sduser@HAL:~$ rocminfo
ROCk module is loaded
Unable to open /dev/kfd read-write: Permission denied
root is not member of "nogroup" group, the default DRM access group. Users must be a member of the "nogroup" group or another DRM access group in order for ROCm applications to run successfully.

What surprises me is that it reports "root" rather than sduser.

Any ideas how to solve this?

@Firstbober
Owner

Firstbober commented Apr 25, 2023

A podman container will never have more permissions than the user running it, so my guess is that your user outside the container isn't in the video or render group, which the ROCm docs require. If it is, then I really have no idea. Also, please send your system info so I know what we are working with.
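
The membership check suggested here can be sketched in shell. This is a minimal sketch and not something posted in the thread: the helper name `in_group_list` is hypothetical, and it assumes the ROCm device nodes are owned by the `video` and `render` groups, as the ROCm docs referenced above describe.

```shell
#!/bin/sh
# Hypothetical helper: succeed if group name $1 appears in the
# space-separated group list $2 (for example the output of `id -nG`).
in_group_list() {
    for g in $2; do
        [ "$g" = "$1" ] && return 0
    done
    return 1
}

# Check the current host user against the two groups named above.
for grp in video render; do
    if in_group_list "$grp" "$(id -nG)"; then
        echo "$grp: ok"
    else
        echo "$grp: missing (try: sudo usermod -aG $grp \$USER, then log in again)"
    fi
done
```

Run it on the host as the user who starts podman, not inside the container.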

@Nachtwind85
Author

Nachtwind85 commented Apr 25, 2023

Hi, thanks for your answer.
First of all, my user outside of the container:

martin@HAL:~$ groups
martin adm cdrom sudo dip video plugdev render lpadmin lxd sambashare docker

My system is Ubuntu 22.04:
martin@HAL:~$ uname -a
Linux HAL 5.19.0-40-generic #41~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 31 16:00:14 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Rocminfo on the machine outside docker:

`
martin@HAL:~$ rocminfo
ROCk module is loaded

HSA System Attributes

Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE

==========
HSA Agents


Agent 1


Name: AMD Ryzen 5 2600 Six-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 5 2600 Six-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3400
BDFID: 0
Internal Node ID: 0
Compute Unit: 12
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 32792656(0x1f46050) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32792656(0x1f46050) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32792656(0x1f46050) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:


Agent 2


Name: gfx803
Uuid: GPU-XX
Marketing Name: Radeon RX 580 Series
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
Chip ID: 26591(0x67df)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1366
BDFID: 1792
Internal Node ID: 1
Compute Unit: 36
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8388608(0x800000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx803
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
`

and now for the devices:

martin@HAL:~$ ls /dev/ -l | grep kfd
crw-rw---- 1 root render 511, 0 Apr 24 22:18 kfd
martin@HAL:~$ ls /dev/dri/ -l
drwxr-xr-x 2 root root 80 Apr 25 15:07 by-path
crw-rw----+ 1 root video 226, 0 Apr 25 15:07 card0
crw-rw----+ 1 root render 226, 128 Apr 24 22:18 renderD128

Maybe this helps a bit?

@Firstbober
Owner

Try adding --group-add nogroup to the run parameters; maybe it will help.

@Nachtwind85
Author

` martin@HAL:~/Projekte/rocm-pytorch-gfx803-docker$ podman run -it --device=/dev/kfd --device=/dev/dri --net=host --group-add=video --group-add=nogroup --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $HOME/sddocker:/sddocker localhost/rocm-pytorch-gfx803

(environ) sduser@HAL:~$ rocminfo
ROCk module is loaded
Unable to open /dev/kfd read-write: Permission denied
root is not member of "nogroup" group, the default DRM access group. Users must be a member of the "nogroup" group or another DRM access group in order for ROCm applications to run successfully.
`

That didn't help ;)

@Firstbober
Owner

💀

@takov751

takov751 commented May 5, 2023

To solve this issue, I have made an updated rootless version. It is still bloated, but it works, and as a demo it runs the Stable Diffusion web UI.

#5

PS: the issue above is caused by podman's relative UID/GID mapping: the IDs inside the container differ from those on the host, so they need to be mapped first. It's a pain in the bottom for sure, hence the rootless idea to get around this issue.
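
To illustrate the mapping problem described above: in a rootless container, only host IDs listed in the user namespace's `gid_map` are visible inside it. As far as I know, any other group shows up as the overflow group (`nogroup`, 65534), which would match the rocminfo message earlier in the thread. The sketch below is an illustration under assumptions and not part of the PR: `host_gid_mapped` is a hypothetical helper, and the sample map and GIDs are made up.

```shell
#!/bin/sh
# Hypothetical helper: read a gid_map (lines of "inside outside length")
# on stdin and succeed if host GID $1 falls inside any mapped range.
host_gid_mapped() {
    awk -v gid="$1" '
        gid >= $2 && gid < $2 + $3 { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Made-up example map: container GID 0 is host GID 1000, and
# container GIDs 1..65536 come from a subgid range starting at 100000.
sample_map='0 1000 1
1 100000 65536'

# Made-up GID for the host "render" group that owns /dev/kfd.
render_gid=109

if printf '%s\n' "$sample_map" | host_gid_mapped "$render_gid"; then
    echo "GID $render_gid is mapped into the container"
else
    echo "GID $render_gid is not mapped; the device group appears as nogroup"
fi
```

On a real system the map can be inspected with `podman unshare cat /proc/self/gid_map`. Podman also documents `--group-add keep-groups` for rootless containers; whether that option resolves this particular case was not tested in the thread.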

@Firstbober
Owner

Oh, I have always used this container in rootless mode, as my podman is installed that way. Making it explicitly rootless is a good idea.

@takov751

takov751 commented May 9, 2023

Indeed, I was thinking about creating two sub-versions, docker and rootless, inside the Dockerfile.

PS: While testing this I was able to create 6 images successfully. Since then, however, I have been struggling with a hardware failure that causes OpenCL/ROCm kernel errors even for a plain clinfo/rocminfo. Either that part or the whole RX 580 died, so I had to order a new GPU.

@Firstbober
Owner

That's sad to hear; the warranty has probably expired too. Well, I will probably merge your PR within a few hours, or a few days at most. Btw, will you order a new RX 580 or more recent hardware?

@takov751

takov751 commented May 9, 2023

My new Sapphire RX 6600 8GB just got delivered (sadly I am still working). $220 on stock clearance before the new 7600 gets in stock.

The RX 6600's codename is gfx1032, and using it with ROCm is only possible if I force it to be recognised as gfx1030/1031 for PyTorch.
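
The comment doesn't say how the override was done. One commonly used approach, assumed here rather than taken from the thread, is the ROCm runtime's `HSA_OVERRIDE_GFX_VERSION` environment variable, where `10.3.0` corresponds to gfx1030. A minimal sketch:

```shell
#!/bin/sh
# Assumption: make the ROCm runtime treat the GPU as gfx1030 (10.3.0).
# The thread does not confirm that this is the method that was used.
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Show the value that ROCm applications started from this shell will see.
echo "HSA_OVERRIDE_GFX_VERSION=$HSA_OVERRIDE_GFX_VERSION"
```

The variable has to be set in the environment of the process that loads PyTorch, so inside a container it would need to be passed in or exported there.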

@Firstbober
Owner

Nice

3 participants