Issues with ubuntu 22.04 build #588

Closed
fcharras opened this issue Dec 6, 2022 · 9 comments
Labels
distro Distribution specific questions

Comments

@fcharras

fcharras commented Dec 6, 2022

I'm trying to run SYCL-based software on a GPU (using dpctl), and it requires the compute runtime to be installed in order to detect GPU devices.

The issue I have is that with the Ubuntu build, the GPU devices are not detected. The same version downloaded from GitHub works. But for end users, it's much easier to use the build available in the official repository.

What differences between those two builds could explain this? Where would be the best place to report it?

@JablonskiMateusz
Contributor

Hi @fcharras
Please share more details about the GPU you are using.

@fcharras
Author

$ lspci -vnn | grep VGA -A 12
00:02.0 VGA compatible controller [0300]: Intel Corporation TigerLake-LP GT2 [Iris Xe Graphics] [8086:9a49] (rev 01) (prog-if 00 [VGA controller])
	DeviceName: Onboard IGD
	Subsystem: Hewlett-Packard Company Device [103c:8720]
	Flags: bus master, fast devsel, latency 0, IRQ 173, IOMMU group 1
	Memory at 603e000000 (64-bit, non-prefetchable) [size=16M]
	Memory at 4000000000 (64-bit, prefetchable) [size=256M]
	I/O ports at 3000 [size=64]
	Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: i915
	Kernel modules: i915
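To pull the identifying fields out of output like the above, here is a minimal Python sketch. It assumes text in `lspci -vnn` format, where the bracketed `[vendor:device]` pair (here `[8086:9a49]`) appears on the device line and a `Kernel driver in use:` line appears in the indented detail block. The helper name `summarize_vga` is hypothetical and not part of any Intel tool.

```python
import re

# `lspci -nn` prints the PCI IDs as a bracketed "[vendor:device]" pair, e.g. "[8086:9a49]".
# The class code "[0300]" has no colon, so it is not matched.
PCI_ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def summarize_vga(lspci_output: str) -> dict:
    """Return vendor ID, device ID and kernel driver of the first VGA controller.

    Hypothetical helper; expects text in `lspci -vnn` format.
    """
    summary = {"vendor": None, "device": None, "driver": None}
    in_vga = False
    seen_vga = False
    for line in lspci_output.splitlines():
        if not line.startswith(("\t", " ")):
            # Non-indented lines start a new device record (blank lines end one).
            # Only the first VGA controller is summarized.
            in_vga = "VGA compatible controller" in line and not seen_vga
            if in_vga:
                seen_vga = True
                match = PCI_ID_RE.search(line)
                if match:
                    summary["vendor"], summary["device"] = match.groups()
        elif in_vga:
            # Detail lines look like "\tKernel driver in use: i915".
            key, _, value = line.strip().partition(":")
            if key == "Kernel driver in use":
                summary["driver"] = value.strip()
    return summary
```

Applied to the output above, it reports vendor `8086`, device `9a49` and driver `i915`; the device ID is a convenient key when checking whether a given runtime release supports the GPU.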

@JablonskiMateusz
Contributor

The only difference I can think of that might matter is an additional CMake flag used for the builds on our GitHub:

Packages were built with the custom flag NEO_ENABLE_i915_PRELIM_DETECTION=1

but that flag shouldn't make a difference for TGL (Tiger Lake).

Could you also share your kernel version and an strace log from a run using the driver from the Ubuntu repository?

@fcharras
Author

Thanks for looking into the issue; I will provide the strace log ASAP.

One thing I forgot to mention that could be useful for reproducibility: I use a Docker container based on the latest official Ubuntu Jammy image, created with the flag --device=/dev/dri for GPU passthrough. I didn't test directly on an Ubuntu host (my own distro is an up-to-date, Arch Linux-based EndeavourOS).
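Before debugging the runtime itself, it can help to confirm that the `--device=/dev/dri` passthrough worked inside the container. A minimal Python sketch, assuming the usual DRM layout where `/dev/dri` holds `card*` primary nodes and `renderD*` render nodes (compute runtimes generally open the render nodes). `list_dri_nodes` is a hypothetical helper; an empty `render` list inside the container would point at the passthrough rather than at the driver packages.

```python
import os


def list_dri_nodes(dri_dir: str = "/dev/dri") -> dict:
    """Group DRM device nodes into primary ("card*") and render ("renderD*") nodes.

    Hypothetical helper: run it inside the container to check that
    --device=/dev/dri actually exposed the GPU.
    """
    nodes = {"card": [], "render": []}
    if not os.path.isdir(dri_dir):
        # No /dev/dri at all: the device was not passed through.
        return nodes
    for name in sorted(os.listdir(dri_dir)):
        path = os.path.join(dri_dir, name)
        if name.startswith("renderD"):
            nodes["render"].append(path)
        elif name.startswith("card"):
            nodes["card"].append(path)
        # Anything else (e.g. the "by-path" directory) is ignored.
    return nodes


if __name__ == "__main__":
    found = list_dri_nodes()
    for path in found["render"]:
        # Assumption: the runtime needs read/write access to the render node.
        access = "rw" if os.access(path, os.R_OK | os.W_OK) else "no access"
        print(path, access)
    if not found["render"]:
        print("no render nodes found; check the --device=/dev/dri flag")
```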

I've also provided quick reproduction steps in IntelPython/dpctl#1010, and a more thorough guide to installing from the .deb packages provided here, in this README.

@fcharras
Author

fcharras commented Dec 19, 2022

Logs from strace (I'm not familiar with strace, so I followed the Ubuntu guide):

Kernel version:

(my-dpex-env) root@cdd9d852219e:~# uname -srm
Linux 6.0.12-arch1-1 x86_64

@JablonskiMateusz
Contributor

I see that different dependencies are loaded in the two cases.

Ubuntu runtime:
[pid 5110] 14:12:38.847711 openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libigdgmm.so.12", O_RDONLY|O_CLOEXEC) = 6

GitHub runtime:
[pid 5186] 14:14:18.697559 openat(AT_FDCWD, "/usr/local/lib/libigdgmm.so.12", O_RDONLY|O_CLOEXEC) = 6

Please check in your environment where these libraries are coming from.
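To compare where each runtime resolves a library from, the `openat` lines of an strace log can be filtered programmatically. A minimal Python sketch, assuming log lines in the format quoted above (successful calls end in `= <fd>`, failed ones in `= -1 ERRNO ...`); `loaded_from` is a hypothetical helper name.

```python
import os
import re

# Matches a successful openat() call as printed by strace, e.g.
# [pid 5110] 14:12:38.847711 openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libigdgmm.so.12", O_RDONLY|O_CLOEXEC) = 6
# Failed calls end in "= -1 ENOENT (...)" and are skipped because "-1" is not all digits.
OPENAT_OK_RE = re.compile(r'openat\(AT_FDCWD, "([^"]+)", [^)]*\) = (\d+)')


def loaded_from(strace_log: str, soname: str) -> list:
    """Return the paths from which `soname` was successfully opened (hypothetical helper)."""
    return [
        match.group(1)
        for match in OPENAT_OK_RE.finditer(strace_log)
        if os.path.basename(match.group(1)) == soname
    ]
```

Run on the Ubuntu and GitHub logs with `"libigdgmm.so.12"`, it would return the `/lib/x86_64-linux-gnu/` and `/usr/local/lib/` paths respectively, which is exactly the discrepancy in the two lines above.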

JablonskiMateusz added the distro (Distribution specific questions) label on Dec 21, 2022
@fcharras
Author

fcharras commented Jan 7, 2023

Can you reproduce the issue on your side? Since I run the instructions from a bare Ubuntu image, it should be easy to reproduce.

From dpkg:

  • Ubuntu runtime:
root@a3265eaeac03:/# dpkg -S libigdgmm 
libigdgmm12:amd64: /usr/lib/x86_64-linux-gnu/libigdgmm.so.12
libigdgmm12:amd64: /usr/lib/x86_64-linux-gnu/libigdgmm.so.12.1.0

It's the official build.

  • GitHub runtime:
root@c65ca8d5e738:/# dpkg -S libigdgmm 
intel-gmmlib: /usr/local/lib/libigdgmm.so.12
intel-gmmlib: /usr/local/lib/libigdgmm.so.12.0.1423

It comes from the set of .deb packages provided in the installation instructions of this repository.
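The `dpkg -S` output above can be turned into a package-to-files map for side-by-side comparison. A minimal Python sketch, assuming the usual `package[:arch]: /path` line format (with comma-separated package names when several packages own the same path); `owners` is a hypothetical helper.

```python
from collections import defaultdict


def owners(dpkg_s_output: str) -> dict:
    """Map each package name to the paths `dpkg -S` attributes to it (hypothetical helper).

    Expected line format: "package[:arch]: /path", or "pkg1, pkg2: /path"
    when several packages own the same path (typically a shared directory).
    """
    result = defaultdict(list)
    for line in dpkg_s_output.splitlines():
        # Split on the first ": " so the ":amd64" architecture suffix stays with the name.
        packages, sep, path = line.strip().partition(": ")
        if not sep or not path.startswith("/"):
            continue  # skip prompts and non-ownership lines (e.g. dpkg-query errors)
        for package in packages.split(", "):
            result[package].append(path)
    return dict(result)
```

Comparing the two maps makes the install-prefix difference explicit: the Ubuntu package `libigdgmm12:amd64` installs under `/usr/lib/x86_64-linux-gnu`, while the GitHub `intel-gmmlib` package installs under `/usr/local/lib`.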

Note that the versions distributed by Ubuntu and on GitHub that I use here are the same (22.14.22890), so the differences can only be in the build or installation process.

But I don't think the issue comes from the dependencies. IntelPython/dpctl#1010, which provides the quickest path to reproduce the device-detection issue (with a hack), shows that the root cause is the intel-opencl-icd package:

  • the Ubuntu packages fail to detect the device

  • uninstalling only intel-opencl-icd and replacing it with the corresponding GitHub build enables device detection (although mixing Ubuntu and GitHub packages this way results in a broken dependency tree and a runtime that is broken in other respects)

@JablonskiMateusz
Contributor

There is an issue with the compiler front-end library. From what I see, the package from GitHub requires opencl-clang 11:

$ ldd /usr/local/lib/libigdfcl.so.1
        linux-vdso.so.1 (0x00007ffd918c8000)
        libopencl-clang.so.11 => /usr/local/lib/libopencl-clang.so.11 (0x00007f0966e00000)

while the package from the Ubuntu repository requires opencl-clang 10, which ldd cannot find:

$ ldd /lib/x86_64-linux-gnu/libigdfcl.so.1
        linux-vdso.so.1 (0x00007ffd8d7e8000)
        libopencl-clang.so.10 => not found

This looks like a build configuration issue on the distro side.
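A quick way to spot this kind of packaging problem is to scan `ldd` output for unresolved dependencies. A minimal Python sketch; `missing_libs` and `check` are hypothetical helper names, and `check` assumes `ldd` is on the `PATH`. Given the ldd output above, `check("/lib/x86_64-linux-gnu/libigdfcl.so.1")` on the Ubuntu install would be expected to return `["libopencl-clang.so.10"]`.

```python
import subprocess


def missing_libs(ldd_output: str) -> list:
    """Return the sonames that ldd reports as "not found" (hypothetical helper)."""
    missing = []
    for line in ldd_output.splitlines():
        # Resolved entries look like "libfoo.so.1 => /path/libfoo.so.1 (0x...)";
        # unresolved ones look like "libfoo.so.1 => not found".
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def check(library_path: str) -> list:
    """Run ldd on a shared object and return its unresolved dependencies."""
    result = subprocess.run(["ldd", library_path], capture_output=True, text=True, check=False)
    return missing_libs(result.stdout)
```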

JablonskiMateusz closed this as not planned (won't fix, can't repro, duplicate, stale) on Jan 24, 2023
@fcharras
Author

fcharras commented Jan 30, 2023

Looking back at it, I think my misfortune mostly comes from having read this section of the README before any other documentation about the runtime, since this repository is the only source suggested by the intel/llvm GitHub repository. If distributing a fully working package on Ubuntu is not a priority here, I'd suggest at least editing the README and replacing this section with this official, up-to-date guide from Intel that recommends using PPAs.

Frankly, the intel-opencl-icd package isn't a dependency of any other package in the Ubuntu repositories, so I think it would be even better to remove it from there (so that users have no choice but to find the best source) than to distribute it in this state.
