clinfo on WSL2 using A770 causes BSOD #663

Closed
maleadt opened this issue Jul 20, 2023 · 9 comments

@maleadt

maleadt commented Jul 20, 2023

I'm using the following set-up:

  • Windows 10 22H2 (also tested 22H1)
  • 64-bit x86 (i5-6600K) with an Arc A770 (ReBAR not enabled/supported on this system, but Above 4G decoding is enabled)
  • Latest beta host driver (31.0.101.4575, but also tested the latest WHQL)
  • freshly set-up WSL2 Ubuntu (kernel 5.15.90.1-microsoft-standard-WSL2)
  • latest compute-runtime (23.22.26516.18; using the binaries from GitHub)

Running clinfo in a WSL2 terminal starts printing some output, but quickly triggers a BSOD that mentions dxgmms2.sys and SYSTEM_THREAD_EXCEPTION_NOT_HANDLED. The generated dump file is corrupt, so I couldn't inspect it.

I'm also encountering this BSOD when loading oneAPI.jl, presumably when the first call to Level Zero happens (i.e. zeInit).
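To check whether the first Level Zero call alone is enough to trigger the crash, independent of oneAPI.jl, a minimal sketch along these lines could be used. It assumes the Level Zero loader is installed under the soname `libze_loader.so.1` (an assumption, not stated in this report) and calls only `zeInit` with flags set to 0. The helper name `try_ze_init` is made up for illustration. On an affected system this may bring down the host, so save your work first.

```python
import ctypes

# ZE_RESULT_SUCCESS is 0 in the Level Zero headers.
ZE_RESULT_SUCCESS = 0


def try_ze_init(libname="libze_loader.so.1", flags=0):
    """Load the Level Zero loader and call zeInit once.

    Returns the ze_result_t value as an int, or None if the loader
    library cannot be loaded at all. The default library name is an
    assumption about the local installation.
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        # Loader not installed or not on the library search path.
        return None
    # ze_result_t zeInit(ze_init_flags_t flags); both are 32-bit values.
    lib.zeInit.argtypes = [ctypes.c_uint32]
    lib.zeInit.restype = ctypes.c_uint32
    return int(lib.zeInit(flags))
```

Calling `try_ze_init()` and comparing the result against `ZE_RESULT_SUCCESS` gives a reproducer with no Julia or OpenCL code involved. If the host still crashes, that narrows the trigger to driver initialization.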

@JablonskiMateusz
Contributor

Hi @maleadt, could you please confirm that there is no BSOD if you remove intel.icd from /etc/OpenCL/vendors in WSL and then run clinfo? I would like to confirm that the BSOD is related to our package.
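The check requested above can be made reversible by moving the ICD file out of the vendors directory instead of deleting it. The following is a minimal sketch. The helper name `move_icd` and the backup directory used in the usage note are hypothetical, and writing to `/etc/OpenCL/vendors` normally requires root.

```python
import shutil
from pathlib import Path


def move_icd(name, src_dir, dst_dir):
    """Move an ICD file from src_dir to dst_dir.

    Returns the new path, or None if the file is not present in src_dir.
    dst_dir is created if it does not exist yet.
    """
    src = Path(src_dir) / name
    if not src.is_file():
        return None
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    dst = Path(dst_dir) / name
    shutil.move(str(src), str(dst))
    return dst
```

For example, `move_icd("intel.icd", "/etc/OpenCL/vendors", "/root/icd-backup")` takes the Intel ICD out of the loader's search directory before running clinfo. The same call with the two directories swapped restores it afterwards.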

@maleadt
Author

maleadt commented Jul 20, 2023

Yes, when I hadn't installed an Intel ICD (i.e. before installing any of the compute-runtime packages) clinfo just returned 0 platforms.

@JablonskiMateusz
Contributor

Thanks for the confirmation.

freshly set-up WSL2 Ubuntu

which Ubuntu version?

@maleadt
Author

maleadt commented Jul 20, 2023

which Ubuntu version?

The one Microsoft defaults to, which seems to be 22.04 (all packages fully updated).

@eero-t

eero-t commented Feb 28, 2024

Do you mean that there's a Windows kernel BSOD when trivial operations are done with the Linux user-space compute stack under WSL? (That does not really sound like a compute-runtime problem.)

@maleadt
Author

maleadt commented Feb 28, 2024

Do you mean that there's a Windows kernel BSOD when trivial operations are done with the Linux user-space compute stack under WSL?

Correct.

@eero-t

eero-t commented Feb 28, 2024

Unless you're using PCI passthrough for the device, and the mention of "dxg" (DirectX) is just a coincidence, I do not see how this could be a Linux-side compute-runtime problem rather than a Windows driver issue. Have you reported the issue to the Windows side?

A virtual machine using virtualized host drivers should not be able to BSOD the host...

@maleadt
Author

maleadt commented Mar 11, 2024

After recently updating both NEO and IGC, this issue doesn't occur anymore. Sadly, the native Intel GPU driver was updated too, so it's impossible to tell which update fixed the issue... Anyway, I'm glad it got fixed, so this can be closed.

@eero-t

eero-t commented Mar 11, 2024

Thanks for testing and reporting the results here!
