This repository has been archived by the owner on Nov 1, 2021. It is now read-only.

Support GPU hotplug #1278

Open

emersion opened this issue Oct 3, 2018 · 20 comments

Comments

@emersion
Member

emersion commented Oct 3, 2018

This means continuously scanning for GPU nodes with udev and creating/destroying sub-backends when they show up or go away.

We probably want to tear down everything if the main GPU goes away.
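As a rough illustration of the udev side (not wlroots code; the /dev/dri/card filter is just an assumption about which nodes matter here), this boils down to a libudev monitor on the drm subsystem:

#include <libudev.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    struct udev *udev = udev_new();
    struct udev_monitor *mon = udev_monitor_new_from_netlink(udev, "udev");
    udev_monitor_filter_add_match_subsystem_devtype(mon, "drm", NULL);
    udev_monitor_enable_receiving(mon);

    struct pollfd pfd = { .fd = udev_monitor_get_fd(mon), .events = POLLIN };
    while (poll(&pfd, 1, -1) > 0) {
        struct udev_device *dev = udev_monitor_receive_device(mon);
        if (dev == NULL) {
            continue;
        }
        const char *action = udev_device_get_action(dev); /* "add", "remove", "change" */
        const char *node = udev_device_get_devnode(dev);  /* e.g. /dev/dri/card1 */
        /* "add"/"remove" of a card node is a whole GPU appearing or going away;
         * "change" is a connector hotplug on an existing GPU. */
        if (node != NULL && strstr(node, "/dev/dri/card") != NULL) {
            printf("%s %s\n", action, node);
        }
        udev_device_unref(dev);
    }

    udev_monitor_unref(mon);
    udev_unref(udev);
    return 0;
}

A compositor would integrate the monitor fd into its event loop instead of blocking in poll() like this.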


wlroots has migrated to gitlab.freedesktop.org. This issue has been moved to:

https://gitlab.freedesktop.org/wlroots/wlroots/-/issues/1278

@Oliph

Oliph commented Apr 4, 2019

I would love to see egpu hotplug support. I will keep sending donation for that in the following months. (sorry if it is not the place for such comment)

@ascent12
Member

ascent12 commented Apr 4, 2019

I acquired a USB DisplayLink adapter not too long ago, which for all intents and purposes acts as a separate GPU, just lacking its own rendering capabilities.
Once I get around to getting that working, it would solve this, but it's not something I'm actively working on right now.

@Ongy

Ongy commented Apr 4, 2019

We did chat with someone at XDC who was working on getting this into GNOME (a Collabora employee, IIRC); we can probably use some of that work.

IIRC a lot of the weirdness is about picking the "best" main GPU.

Also something I saw at work: some of the Wacom tablets (the one I saw was a businessy signing pad with a display) use DisplayLink, so this has more of a use case than is immediately obvious.

OTOH, the kernel side of DisplayLink isn't exactly the nicest, IMO.

@Oliph

Oliph commented Apr 4, 2019

I saw that a couple of months ago on Phoronix, where they talk about that work in Mutter: https://www.phoronix.com/scan.php?page=news_item&px=GNOME-Mutter-GPU-Hotplug.

They also link to the merge: https://gitlab.gnome.org/GNOME/mutter/commit/ad7d6e4a37a6258a5de876a85859f7f57415dffa


@ascent12
Member

Recently #1696 was merged as a workaround, but we eventually want to add proper support for hotplugging DRM devices. It's particularly interesting for things like docking stations and other devices that drive monitors over USB 3 or Thunderbolt, but it also applies to external GPUs.

Just as a little bit of background

There are really 2 types of DRM devices: display controllers and rendering devices.

  • A display controller is something that can have computer monitors attached, and would go through the DRM KMS API.
  • A rendering device is obviously something we can do rendering with, e.g. via GBM + GLES/Vulkan.

With most desktop hardware, you're dealing with a single device that does both, so, naively, wlroots was designed treating the two types of devices as the same thing. On some ARM hardware you'll find two separate pieces of hardware handling these roles, and docking stations etc. are usually just a display controller without rendering capabilities.
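To see this split on a concrete machine, here is a rough libdrm-only sketch (not wlroots code): a device whose primary node reports KMS support is a display controller, and a device exposing a render node is something we can render with.

#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>
#include <xf86drm.h>

int main(void) {
    drmDevicePtr devices[16];
    int n = drmGetDevices2(0, devices, 16);
    if (n < 0) {
        return 1;
    }
    for (int i = 0; i < n; i++) {
        drmDevicePtr dev = devices[i];
        bool has_render = dev->available_nodes & (1 << DRM_NODE_RENDER);
        bool has_kms = false;
        const char *name = dev->nodes[DRM_NODE_PRIMARY];
        if (dev->available_nodes & (1 << DRM_NODE_PRIMARY)) {
            int fd = open(name, O_RDWR | O_CLOEXEC);
            if (fd >= 0) {
                has_kms = drmIsKMS(fd); /* has connectors/CRTCs we can drive */
                close(fd);
            }
        } else {
            name = dev->nodes[DRM_NODE_RENDER];
        }
        if (name == NULL) {
            continue;
        }
        printf("%s: display controller: %s, render device: %s\n",
            name, has_kms ? "yes" : "no", has_render ? "yes" : "no");
    }
    drmFreeDevices(devices, n);
    return 0;
}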

The point

Hotplugging display controllers is easy; hotplugging rendering devices (that we're using) is hard.
A display controller going away is basically the same situation as a few outputs being disconnected, but a rendering device going away means we have to tear down all of our rendering state and try to bring it up elsewhere, which is not something I want to attempt.

Renderer v6 actually helps a lot towards this, as it moves a lot of the rendering code outside of the backend, but there would still be a bit of extra work to get this working properly.

  • 1 DRM backend per display controller (possibly zero)
  • Make sure there is absolutely no rendering code inside of the DRM backend
  • Make wlr_session aware of the difference between display controllers and rendering devices
  • Make wlr_session listen for hotplug events and bring up and tear down DRM backends as necessary (roughly as sketched below)
  • Kill the compositor if our chosen rendering device goes away (unless someone is masochistic enough to get this working properly)
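Very roughly, and with the caveat that create_drm_subbackend(), find_subbackend_for() and the node_is_*() helpers below are hypothetical (only wlr_multi_backend_add/remove, wlr_backend_destroy and wl_display_terminate are existing API), the hotplug handling sketched in the list above could look like:

#include <stdbool.h>
#include <wayland-server-core.h>
#include <wlr/backend.h>
#include <wlr/backend/multi.h>

/* Hypothetical helpers a hotplug-aware wlr_session could provide. */
struct wlr_backend *create_drm_subbackend(const char *devnode);
struct wlr_backend *find_subbackend_for(const char *devnode);
bool node_is_display_controller(const char *devnode);
bool node_is_our_render_device(const char *devnode);

void handle_drm_hotplug(struct wl_display *display,
        struct wlr_backend *multi, const char *devnode, bool added) {
    if (added) {
        if (!node_is_display_controller(devnode)) {
            return; /* render-only node: nothing to scan out on */
        }
        /* New display controller: one DRM sub-backend per controller. */
        struct wlr_backend *drm = create_drm_subbackend(devnode);
        if (drm != NULL) {
            wlr_multi_backend_add(multi, drm);
        }
    } else if (node_is_our_render_device(devnode)) {
        /* Our chosen rendering device went away: all GPU state is gone. */
        wl_display_terminate(display);
    } else {
        struct wlr_backend *drm = find_subbackend_for(devnode);
        if (drm != NULL) {
            /* A display controller left: same as its outputs being unplugged. */
            wlr_multi_backend_remove(multi, drm);
            wlr_backend_destroy(drm);
        }
    }
}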

@emersion
Member Author

This overall LGTM.

1 DRM backend per display controller (possibly zero)

I wonder if we should really do this. Maybe having a list of DRM nodes in the DRM backend would be better.

@ascent12
Member

I wonder if we should really do this. Maybe having a list of DRM nodes in the DRM backend would be better.

I'd be fine with either way, personally.

@Dirbaio

Dirbaio commented Aug 26, 2019

Another use case for this is hybrid graphics laptops.

For example, I have a ThinkPad X1 Extreme with Intel+NVIDIA graphics. The HDMI port is wired to the NVIDIA GPU: switching it on is required to use external monitors, and switching it off is required for good battery life on the go. It works decently with bbswitch+nouveau, but it would be awesome if you could switch it without having to restart the session.

@AndreasBackx

This comment has been minimized.

@emersion
Member Author

This issue is unrelated to your problem. Please open a new one with all the necessary information.

@neon64
Contributor

neon64 commented Sep 6, 2020

I am super keen to get GPU hotplugging working for display controllers (specifically, I'm motivated by the hope of completely powering off my dGPU with bbswitch when no external monitor is connected, all whilst keeping sway running - https://www.reddit.com/r/swaywm/comments/ikaxem/feature_idea_dynamically_unloading_unused_drm/).

Is this blocked by the work on renderer v6? If so, could I potentially start experimenting with hotplugging by forking emersion's swapchain branch? My experience in this area is limited, but with some guidance I'd love to be able to contribute an MVP patch.

@emersion
Member Author

emersion commented Sep 7, 2020

Is this blocked by the work on renderer v6?

No. Just need to listen to udev signals and set up a new DRM child backend on hotplug.

@neon64
Contributor

neon64 commented Oct 7, 2020

I'm excited to be able to report that I've got some form of hotplugging working. It's for "display controllers" only, and only tested so far on my laptop with intel+nouveau dual GPUs. I don't think my implementation is at all mergeable yet: this is my first time delving into the wlroots codebase, so I don't understand how all the pieces fit together and I've done many dodgy things in the implementation. But in case anyone is looking for an interim solution, particularly on a laptop with hybrid graphics, feel free to try this out and let me know what issues you encounter.

https://github.com/neon64/wlroots/tree/feature/unload_drm
and please see usage instructions in the commit message of neon64@76268d4

@J0nnyMak0
Contributor

I'm excited to be able to report that I've got some form of hotplugging working. […]

I'd be glad to give it a spin. I have an Intel+Nouveau laptop, and as far as I can tell hotplugging external monitors has been working for a while now. Reading @ascent12's description, I think I have an idea of the sorts of things to look for, but if you can spell out the changes I should be expecting, I'd be happy to try them.

@neon64
Contributor

neon64 commented Oct 8, 2020

So the specific workflow is described in this commit message: neon64@76268d4

The general idea is: once you disconnect all external monitors, wlroots will stop using the nouveau driver, so you can power off the dGPU completely while keeping the compositor running (previously you had to close all windows and restart). Similarly, if you started sway with only the Intel driver loaded, run modprobe nouveau and wlroots will start detecting external monitors etc. (previously you had to restart to redetect).

Why not keep nouveau loaded all the time? As far as I'm aware, there is no way to prevent the nouveau driver from sucking up an extra ~15W even when idle - completely switching off the card is the most foolproof way. Before this patch, I'd have to restart sway to go from 16-20W to 3-5W idle. After this patch, I can vastly improve battery life without restarting my WM all the time. I guess this is more or less just a workaround for nouveau not doing proper power management, but I wouldn't even know where to start with hacking on nouveau, so I have come here instead.
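For reference, the rough shape of that workflow with bbswitch (this is just the standard /proc/acpi/bbswitch interface, not something this patch adds):

# All external monitors unplugged: drop nouveau and power the dGPU off.
sudo rmmod nouveau
echo OFF | sudo tee /proc/acpi/bbswitch

# To use an external monitor again: power the card back on, reload nouveau,
# and (with the patch) sway picks the GPU back up without a restart.
echo ON | sudo tee /proc/acpi/bbswitch
sudo modprobe nouveau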

This may also be useful for an eGPU setup, but unfortunately I don't have one to test with.

@J0nnyMak0
Contributor

J0nnyMak0 commented Oct 8, 2020

OK, thanks. I gave it a quick try. Here is an issue I found. (I'm booting with nouveau blacklisted.)

First off, if I plug in external monitors first and then sudo modprobe nouveau, then it works great. The scan is completed and my external monitors come to life. I can then unplug the monitors and sudo rmmod nouveau and everything works as expected. I then proceed to switch off the card with bbswitch. All good.

However, if I load the nouveau module without external monitors connected, then there is a problem. Nouveau is loaded as expected, but when I go to rmmod nouveau, it fails with: rmmod: ERROR: Module nouveau is in use

lsof tells me:

$ lsof /dev/dri/card*                                                                                                                                                                                                            
COMMAND PID   USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
sway    605 jonm  mem    CHR  226,1          28837 /dev/dri/card1
sway    605 jonm    8u   CHR  226,0      0t0  2217 /dev/dri/card0
sway    605 jonm    9u   CHR  226,0      0t0  2217 /dev/dri/card0
sway    605 jonm   10u   CHR  226,0      0t0  2217 /dev/dri/card0
sway    605 jonm   11u   CHR  226,0      0t0  2217 /dev/dri/card0
sway    605 jonm   50u   CHR  226,1      0t0 28837 /dev/dri/card1
sway    605 jonm   52u   CHR  226,1      0t0 28837 /dev/dri/card1
sway    605 jonm   53u   CHR  226,1      0t0 28837 /dev/dri/card1
sway    605 jonm   54u   CHR  226,1      0t0 28837 /dev/dri/card1
sway    605 jonm   55u   CHR  226,1      0t0 28837 /dev/dri/card1

sway.log

Edit: The problem goes away if I turn off the dGPU before loading the nouveau module.

@neon64
Contributor

neon64 commented Oct 10, 2020

@J0nnyMak0 thanks so much for trying this out. I've gone ahead and created a draft PR, #2423, to discuss this specific implementation, so as not to further pollute this issue thread.
