This issue was moved to a discussion.

Race condition during start up leading to blank screen #77

Closed
bedaberner opened this issue Feb 14, 2022 · 4 comments

@bedaberner

  • Did you install egpu-switcher via PPA or via git + make: PPA
  • What Linux distribution (+ version) are you using: Ubuntu 20.04 LTS
  • What brand / model is your laptop: ThinkPad X1 Carbon Gen 9
  • What brand / model is your GPU (+ enclosure): Gainward GeForce RTX 3090 Phoenix (Razer Chroma)
  • What drivers (+ version) are you using: NVIDIA 510.47.03
  • What Desktop-Environment do you use (+ Display-Manager): GNOME (Xorg)

After installing egpu-switcher, it would work occasionally (maybe 5-10% of boots), but booting with the eGPU attached would mostly lead to a black screen with a blinking underscore or to the following error:
[drm:intel_cpu_fifo_underrun_irq_handler [i915]] ERROR CPU pipe A FIFO underrun
After trying out different approaches, I found that deactivating the systemd service and manually setting the eGPU using `egpu-switcher switch egpu` worked.

Since I suspected a race condition, I added
ExecStartPre=/bin/sleep 1
to the systemd service, and that resolved my problems. It is a bit of a hacky solution, so I would prefer something more proper, but so far it is working very reliably.
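
For readers who want to try the same delay without editing the installed unit file, one option is a systemd drop-in. The sketch below is only an illustration: it assumes the unit is named egpu.service and lives under /etc/systemd/system, and the function name `write_delay_dropin` and the file name `10-startup-delay.conf` are made up for this example. An `ExecStartPre=` line in a drop-in is added to the unit's existing commands rather than replacing them.

```shell
#!/usr/bin/env bash
# Sketch only: writes a systemd drop-in that adds a start delay to egpu.service.
# The function name and the drop-in file name are hypothetical.
# Usage: write_delay_dropin <unit_dir> <delay_seconds>
write_delay_dropin() {
  local unit_dir="$1"
  local delay="$2"
  local dropin_dir="$unit_dir/egpu.service.d"

  # Drop-ins for a unit live in a "<unit name>.d" directory next to the unit.
  mkdir -p "$dropin_dir"

  # Only the added setting is needed; the rest of the unit stays untouched.
  printf '[Service]\nExecStartPre=/bin/sleep %s\n' "$delay" \
    > "$dropin_dir/10-startup-delay.conf"
}

# Example (as root), followed by a reload so systemd picks up the drop-in:
#   write_delay_dropin /etc/systemd/system 1
#   systemctl daemon-reload
```

Keeping the delay in a drop-in means a package update that rewrites the unit file should not silently remove it, although that has not been verified against the PPA package.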

I have seen that you recommend enabling Pre-Boot ACL to get rid of such race conditions, but that option was not available in my BIOS (my machine has Thunderbolt 4 and I could not find any of the Thunderbolt settings I see mentioned here).

@hertg
Owner

hertg commented Feb 19, 2022

There is some code in egpu-switcher already to fight potential race conditions. It may be possible to fix your issue by extending the amount of detection retries on system bootup 🤔

If you have the time, can you try removing your ExecStartPre from the systemd unit and instead changing the retry value in the egpu-switcher script? Try changing the value on this line from 6 to 12, or to anything else higher than 6. Does that also solve your problem?

This code is a hacky way to address the race condition, by extending the detection retries to 3 seconds (6 x 0.5 seconds). Maybe 3 seconds just isn't enough in your particular configuration.
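
The general shape of such a bounded retry can be sketched as follows. This is a hypothetical illustration and not the actual egpu-switcher code: the function name `retry_until` and its arguments are assumptions, and only the 6 x 0.5 seconds figures come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a bounded retry loop; not the actual egpu-switcher code.
# Usage: retry_until <retries> <interval_seconds> <command> [args...]
retry_until() {
  local retries="$1"
  local interval="$2"
  shift 2

  local attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if "$@"; then
      return 0            # the check succeeded, stop waiting
    fi
    sleep "$interval"     # wait before the next attempt
    attempt=$((attempt + 1))
  done
  return 1                # gave up after roughly retries * interval seconds
}

# Example with a placeholder check (gpu_is_present is not a real command):
#   retry_until 6 0.5 gpu_is_present
```

With 6 retries and a 0.5 second interval the loop gives up after about 3 seconds, so raising the retry count only helps if the check being retried actually observes the condition that is racing.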


To address the race conditions in a less hacky way, egpu-switcher would need to get notified about events from boltctl and reliably block X11 from starting before boltctl has fully initialized. I dug down that rabbit hole a while ago and figured that something could be doable via D-Bus integration, but I wasn't able to find a reliable way to check whether boltctl "initialized completely".

But to integrate egpu-switcher more deeply with boltctl, I'd prefer to rewrite the script in a more maintainable language than a bash script. I thought about rewriting the script and extending its functionality in Rust, but I'm working on some other projects too and didn't have enough time to finish up a prototype yet. Despite it being a bit hacky, it seems to work for a lot of people, so I didn't prioritize the race condition issue of egpu-switcher very highly.

@bedaberner
Author

Hi hertg,
Thx for your reply. I tried doing what you suggested with values as high as 64, but the behavior did not change at all for me.

The workaround with ExecStartPre works for me and I am not in need of a solution. I just chose to post it here so you are aware of it, and for other people with the same problem.

@hertg
Owner

hertg commented Feb 21, 2022

> Hi hertg, Thx for your reply. I tried doing what you suggested with values as high as 64, but the behavior did not change at all for me.

Hmm, that's curious. I'm guessing incrementing the value didn't delay the startup on your system? Or did it actually delay the start of the graphical environment by 32 seconds? It would be great if you could post your full systemd unit (`cat /etc/systemd/system/egpu.service`).

> The workaround with ExecStartPre works for me and I am not in need of a solution. I just chose to post it here so you are aware of it, and for other people with the same problem.

Sounds good, I'll leave the issue open for now. If you'd like to submit a PR that adds your issue+workaround to the TROUBLESHOOT.md file, I'll happily accept that :)

@bedaberner
Author

Incrementing the value did not have any observable impact.

My systemd unit looks like this:

[Unit]
Description=EGPU Service
Before=display-manager.service
After=bolt.service

[Service]
Type=oneshot
ExecStartPre=/bin/sleep 1
ExecStart=/usr/bin/egpu-switcher switch auto

[Install]
WantedBy=graphical.target

`ExecStartPre=/bin/sleep 1` is the only change compared to the standard unit.

I commented this line out when testing your suggestion.

> If you'd like to submit a PR that adds your issue+workaround to the TROUBLESHOOT.md file, I'll happily accept that :)

I'll do that when I find the time.

Repository owner locked and limited conversation to collaborators Sep 14, 2022
@hertg hertg converted this issue into discussion #88 Sep 14, 2022
