Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

6.10 kernel: i915 driver crash in system suspend due to wakeref from SOF code #5120

Open
rfvirgil opened this issue Jul 23, 2024 · 9 comments
Labels
i915 i915 related issue Unclear No agreement on problem statement and resolution

Comments

@rfvirgil
Copy link

rfvirgil commented Jul 23, 2024

6.10 kernel
UpXtreme TGL + CDB35L56-EIGHT-C (SoundWire)
Play audio, then system-suspend while audio is still playing.

We're seeing that with the 6.10 kernel there is a WARN triggered in the i915 driver, intel_runtime_pm_driver_release() because there is a wakeref that has not been released. This happens if audio was playing when the system entered system-suspend. For example:

play audio.wav &
sudo rtcwake -m mem -s 5

I enabled the debugging in the i915 driver so that wakeref requests are tracked, and printed out the references when the WARN is triggered, and I get the reftracking info below (full dmesg attached dmesg.txt)

So, snd_sof has taken a wakeref on the i915 driver during probe, but has not released it when the system went to system-suspend. The i915 driver expects that all wakerefs have been released before system-suspending.

[   72.906688] PM: suspend entry (deep)
[   72.910341] Filesystems sync: 0.003 seconds
[   72.913001] Freezing user space processes
[   72.915545] Freezing user space processes completed (elapsed 0.002 seconds)
[   72.915613] OOM killer disabled.
[   72.915617] Freezing remaining freezable tasks
[   72.917124] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[   72.917333] printk: Suspending console(s) (use no_console_suspend to debug)
[   73.255773] cirrus-reset-owner 1013BAAD:00: Suspend
[   73.546490] ref_tracker: 0000:00:02.0@0000000000296b48 has 1/1 users at
                    __intel_runtime_pm_get+0x72/0xf0 [i915]
                    intel_display_power_get+0x27/0x70 [i915]
                    i915_audio_component_get_power+0x20/0x190 [i915]
                    snd_hdac_display_power+0x85/0x130 [snd_hda_core]
                    hda_codec_i915_init+0x3a/0x50 [snd_sof_intel_hda]
                    hda_dsp_probe_early+0x167/0x240 [snd_sof_intel_hda_generic]
                    snd_sof_device_probe+0x24b/0x370 [snd_sof]
                    sof_pci_probe+0x12f/0x200 [snd_sof_pci]
                    local_pci_probe+0x42/0xa0
                    pci_device_probe+0xc7/0x240
                    really_probe+0xd3/0x380
                    __driver_probe_device+0x78/0x150
                    driver_probe_device+0x1f/0x90
                    __driver_attach+0xd6/0x1d0
                    bus_for_each_dev+0x8c/0xe0
                    bus_add_driver+0x115/0x240

[   73.546596] ------------[ cut here ]------------
[   73.546599] i915 0000:00:02.0: i915 raw-wakerefs=1 wakelocks=1 on cleanup
[   73.546648] WARNING: CPU: 0 PID: 573 at drivers/gpu/drm/i915/intel_runtime_pm.c:447 intel_runtime_pm_driver_release+0x66/0x80 [i915]
@rfvirgil
Copy link
Author

@plbossart FYI

@plbossart
Copy link
Member

@rfvirgil can you clarify which endpoint was used? I would guess it's one of the HDMI endpoints but the description refers to the CS board.

I am afraid I don't fully get the i915 flows, clearly we don't do anything suspending, hda_codec_i915_display_power() is only called in the resume sequence.

this will most likely have to wait for @kv2019i to look at this...

@plbossart
Copy link
Member

@ujfalusi FYI since you are in the middle of HDMI debugging.

@plbossart plbossart added the i915 i915 related issue label Jul 23, 2024
@rfvirgil
Copy link
Author

@plbossart What do you mean by "endpoint" ? We're playing audio to the CS35L56 on the EIGHT-C board.
HDMI is connected to a monitor for console display, but we aren't using it for audio.
Also, we aren't using graphics. The UpX doesn't have X or Wayland installed.

As a workaround I've set snd_hda_core.gpu_bind=0

I did start looking at the way the wakeref was implemented in i915 and snd-hda, but it was too complex for me to understand...

I've found hdac_hdmi_runtime_suspend(), which seems to be releasing the wakeref when it runtime-suspends. I'm guessing that the hdac code isn't handling the case where you system-suspend while still playing audio, so you were not runtime-suspended.

@ujfalusi
Copy link
Collaborator

I cannot reproduce it on upx (running in HDA mode, nothing fancy connected), neither on SDW device which have HDMI (and panel, which might change things). I'm not sure what can cause this atm

@plbossart
Copy link
Member

I've found hdac_hdmi_runtime_suspend(), which seems to be releasing the wakeref when it runtime-suspends. I'm guessing that the hdac code isn't handling the case where you system-suspend while still playing audio, so you were not runtime-suspended.

the system suspend in hdac_hdmi.c seems to rely on pm_runtime_force_suspend(). I am not sure what it does and why this would matter anyways if you are not playing audio on HDMI. the entire i915 audio path should be disabled and powered-down.

Unless there's the silly keep-alive function that keeps HDMI running?

@ujfalusi
Copy link
Collaborator

Unless there's the silly keep-alive function that keeps HDMI running?

Hrm, I have the silent scream stream disabled on all of my devices...

@ujfalusi
Copy link
Collaborator

ujfalusi commented Jul 24, 2024

I cannot say what is going on...

On UPX-i11 HDA: it is OK under all combination

With SDW device (MTL): gives i915 raw-wakerefs=1 wakelocks=1 on cleanup under any combination, regardless of silent steam or playback

With SDW device (TGL-H): it is OK under all combination

🤷

@marc-hb
Copy link
Collaborator

marc-hb commented Jul 29, 2024

Unless there's the silly keep-alive function that keeps HDMI running?

In a totally different i915 issue I recently discovered that DRM is polling for monitor hotplug by default even when everything is off. It's configured by /sys/module/drm_kms_helper/parameters/poll/intel_dp_detect

You can see it with options drm debug=0x2, example:

Jul 17 16:20:59 up2 kernel: i915 0000:00:02.0: [drm:intel_runtime_resume [i915]] Resuming device
Jul 17 16:20:59 up2 kernel: i915 0000:00:02.0: [drm:intel_runtime_resume [i915]] Device resumed
Jul 17 16:21:11 up2 kernel: i915 0000:00:02.0: [drm:intel_runtime_suspend [i915]] Suspending device
Jul 17 16:21:11 up2 kernel: i915 0000:00:02.0: [drm:intel_runtime_suspend [i915]] Device suspended
Jul 17 16:21:21 up2 kernel: i915 0000:00:02.0: [drm:intel_runtime_resume [i915]] Resuming device
Jul 17 16:21:21 up2 kernel: i915 0000:00:02.0: [drm:intel_runtime_resume [i915]] Device resumed

Sorry if off-topic, I thought it was worth sharing.

@plbossart plbossart added the Unclear No agreement on problem statement and resolution label Aug 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
i915 i915 related issue Unclear No agreement on problem statement and resolution
Projects
None yet
Development

No branches or pull requests

4 participants