[IPC error] sof-test/multiple-pause-resume.sh reported error when dma trace was disabled on TGL #3673

Closed
RanderWang opened this issue Dec 7, 2020 · 17 comments
Labels
bug Something isn't working as expected IPC error IPC error is observed P1 Blocker bugs or important features SDW SoundWire TGL Applies to Tiger Lake xrun XRUN is observed and firmware may not recover

Comments

@RanderWang
Collaborator

RanderWang commented Dec 7, 2020

Describe the bug
[ 55.864574] sof-audio-pci 0000:00:1f.3: ipc tx: 0x60070000: GLB_STREAM_MSG: TRIG_RELEASE
[ 55.864773] sof-audio-pci 0000:00:1f.3: ipc tx succeeded: 0x60070000: GLB_STREAM_MSG: TRIG_RELEASE
[ 56.033318] sof-audio-pci 0000:00:1f.3: pcm: trigger stream 1 dir 1 cmd 3
[ 56.033332] sof-audio-pci 0000:00:1f.3: ipc tx: 0x60060000: GLB_STREAM_MSG: TRIG_PAUSE
[ 56.033556] sof-audio-pci 0000:00:1f.3: error: ipc error for 0x60060000 size 12, err -22
[ 56.033592] sof-audio-pci 0000:00:1f.3: FW Poll Status: reg=0x20140000 successful
[ 56.033610] sof-audio-pci 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_trigger on 0000:00:1f.3: -22
[ 56.033628] Headset mic: ASoC: trigger FE cmd: 3 failed: -22
[ 56.033650] sof-audio-pci 0000:00:1f.3: pcm: trigger stream 1 dir 1 cmd 4
[ 56.033984] sof-audio-pci 0000:00:1f.3: FW Poll Status: reg=0x2014001e successful
[ 56.033999] sof-audio-pci 0000:00:1f.3: ipc tx: 0x60070000: GLB_STREAM_MSG: TRIG_RELEASE
[ 56.034232] sof-audio-pci 0000:00:1f.3: error: ipc error for 0x60070000 size 12, err -22
[ 56.034244] sof-audio-pci 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_trigger on 0000:00:1f.3: -22
[ 56.034252] Headset mic: ASoC: trigger FE cmd: 4 failed: -22

DMA trace error:
TIMESTAMP DELTA C# COMPONENT LOCATION CONTENT
[ 1025566.458333] ( 1025566.437500) c0 dw-dma src/drivers/dw/dma.c:1095 ERROR dw_dma_get_data_size(): xrun detected
[ 1025575.052083] ( 8.593750) c0 dai 2.10 src/audio/dai.c:785 ERROR dai_report_xrun(): overrun due to no space available
[ 1025582.708333] ( 7.656250) c0 dai 2.10 src/audio/dai.c:684 ERROR comp_overrun(): sink->free = 0, copy_bytes = 0
[ 1025615.260417] ( 32.552082) c0 pipe 2.11 src/audio/pipeline.c:1029 ERROR pipeline_copy(): ret = -61, start->comp.id = 10, dir = 0
[ 1025625.260417] ( 10.000000) c0 pipe 2.11 src/audio/pipeline.c:1219 ERROR pipeline_task(): xrun recover failed! pipeline will be stopped!
[ 1893947.604167] ( 868322.375000) c0 dai 2.10 src/audio/component.c:209 ERROR comp_set_state(): wrong state = 1, COMP_TRIGGER_STOP
[ 1893958.177083] ( 10.572917) c0 pipe 2.11 src/audio/pipeline.c:884 ERROR pipeline_trigger(): ret = -22, host->comp.id = 6, cmd = 0
[ 1893968.958333] ( 10.781250) c0 ipc src/ipc/handler.c:441 ERROR ipc: comp 6 trigger 0x50000 failed -22
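
The header values in the logs above can be decoded by hand. The following is a minimal sketch, assuming the SOF IPC header keeps the global message type in bits 31..28 and the command in bits 27..16; that layout, the macro names, and the helper names are assumptions for illustration rather than copies of the SOF headers. The stream command values 0x006 (TRIG_PAUSE) and 0x007 (TRIG_RELEASE) match the kernel log lines, and 0x005 is assumed to be TRIG_STOP because the firmware reports `trigger 0x50000` right after `COMP_TRIGGER_STOP`. Under Linux errno numbering, -22 is -EINVAL and -61 is -ENODATA.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: bits 31..28 = global message type, bits 27..16 = command. */
#define GLB_TYPE_SHIFT 28
#define GLB_TYPE_MASK  (0xfu << GLB_TYPE_SHIFT)
#define CMD_TYPE_SHIFT 16
#define CMD_TYPE_MASK  (0xfffu << CMD_TYPE_SHIFT)

/* Extract the global message type, e.g. 0x6 for GLB_STREAM_MSG in the log. */
static unsigned int ipc_glb_type(uint32_t hdr)
{
	return (hdr & GLB_TYPE_MASK) >> GLB_TYPE_SHIFT;
}

/* Extract the command field, e.g. 0x006 from 0x60060000. */
static unsigned int ipc_cmd_type(uint32_t hdr)
{
	return (hdr & CMD_TYPE_MASK) >> CMD_TYPE_SHIFT;
}

/* Name the stream trigger commands that appear in this report. */
static const char *ipc_stream_cmd_name(uint32_t hdr)
{
	if (ipc_glb_type(hdr) != 0x6)
		return "not a stream message";

	switch (ipc_cmd_type(hdr)) {
	case 0x005: return "TRIG_STOP";      /* assumed from the firmware log */
	case 0x006: return "TRIG_PAUSE";     /* matches ipc tx 0x60060000 */
	case 0x007: return "TRIG_RELEASE";   /* matches ipc tx 0x60070000 */
	default:    return "other stream command";
	}
}
```

The firmware trace prints only the command field (`0x50000`), so `ipc_cmd_type(0x50000)` yields 0x005 while `ipc_stream_cmd_name()` needs the full header such as `0x60050000`.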

To Reproduce
On a TGL device, disable DMA trace in the kernel driver and run sof-test/test-case/multiple-pause-resume.sh

Reproduction Rate
100%

etrace.txt
dmesg.txt

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).

  2. Name of the platform(s) on which the bug is observed.

    • Platform: TGL-U RVP with ALC711 and ALC1308 in SDW mode by default (no HW rework is needed), TGL Chrome device Volteer with RT5682 and Max98373 in SDW mode.
@RanderWang RanderWang added bug Something isn't working as expected TGL Applies to Tiger Lake labels Dec 7, 2020
@mengdonglin mengdonglin added the P1 Blocker bugs or important features label Dec 8, 2020
@mengdonglin
Collaborator

@keqiaozhang will check if this issue can also be reproduced on TGL Chrome device with I2S audio

@slawblauciak slawblauciak self-assigned this Dec 8, 2020
@keqiaozhang
Collaborator

keqiaozhang commented Dec 8, 2020

Tried the sof-dev + tgl-011-drop-stable branch on TGL Chrome-I2S: no reproduction so far (50+ iterations passed). I only observed some xrun errors in the error trace when testing "multiple-pause-resume.sh". Will perform more tests.

[1763781595.156250] (     2012.031250) c0 hda-dma            ..../intel/hda/hda-dma.c:865  ERROR hda_dma_link_check_xrun(): underrun detected
[1763789650.208333] (     8055.052246) c0 hda-dma            ..../intel/hda/hda-dma.c:865  ERROR hda_dma_link_check_xrun(): underrun detected
[1763793618.489583] (     3968.281250) c0 hda-dma            ..../intel/hda/hda-dma.c:865  ERROR hda_dma_link_check_xrun(): underrun detected
[1764379567.500000] (   585949.000000) c0 hda-dma            ..../intel/hda/hda-dma.c:865  ERROR hda_dma_link_check_xrun(): underrun detected

Edited: No reproductions on TGL-Chrome-I2S, tested with "./multiple-pause-resume.sh -r 1000" and passed.

@bardliao
Collaborator

@slawblauciak @RanderWang I think the issue is that we don't handle xrun in the firmware, i.e. #define NO_XRUN_RECOVERY 1.
I can see the sof-logger output below if I add pipe_err(p, "No xrun_recover"); in pipeline_xrun_recover:

[104075289.843750] (  2812062.500000) c0 dw-dma                 src/drivers/dw/dma.c:1101 ERROR dw_dma_get_data_size(): xrun detected
[104075308.020833] (       18.177084) c0 dai          4.23           src/audio/dai.c:788  ERROR dai_report_xrun(): overrun due to no space available
[104075324.635417] (       16.614584) c0 dai          4.23           src/audio/dai.c:685  ERROR comp_overrun(): sink->free = 224, copy_bytes = 0
[104075415.312500] (       90.677086) c0 pipe         4.24      src/audio/pipeline.c:1029 ERROR pipeline_copy(): ret = -61, start->comp.id = 23, dir = 0
[104075437.291667] (       21.979166) c0 pipe         4.24      src/audio/pipeline.c:1143 ERROR No xrun_recover
[104075457.187500] (       19.895834) c0 pipe         4.24      src/audio/pipeline.c:1220 ERROR pipeline_task(): xrun recover failed! pipeline will be stopped!
diff --git a/src/audio/pipeline.c b/src/audio/pipeline.c
index 345773b7d..99617baf8 100644
--- a/src/audio/pipeline.c
+++ b/src/audio/pipeline.c
@@ -1140,6 +1140,7 @@ void pipeline_xrun(struct pipeline *p, struct comp_dev *dev,
 /* recover the pipeline from a XRUN condition */
 static int pipeline_xrun_recover(struct pipeline *p)
 {
+       pipe_err(p, "No xrun_recover");
        return -EINVAL;
 }
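
The sequence in this trace (copy fails with -61, recovery is attempted, recovery fails, pipeline is stopped) can be sketched as follows. This is a toy model under stated assumptions, not the SOF implementation: `struct toy_pipeline` and all `toy_*` functions are hypothetical names, and the only behavior taken from the thread is that the recovery function returns -EINVAL unconditionally.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a pipeline; not the SOF struct pipeline. */
struct toy_pipeline {
	bool xrun_pending;	/* set when the DMA has reported an xrun */
	bool running;		/* cleared once the task gives up */
};

/* Copy step: fails with -ENODATA (-61 on Linux) when an xrun is pending. */
static int toy_pipeline_copy(struct toy_pipeline *p)
{
	return p->xrun_pending ? -ENODATA : 0;
}

/* Recovery is compiled out, so every attempt fails with -EINVAL (-22). */
static int toy_xrun_recover(struct toy_pipeline *p)
{
	(void)p;
	return -EINVAL;
}

/* One scheduling tick: copy, try recovery on failure, stop if that fails. */
static int toy_pipeline_task(struct toy_pipeline *p)
{
	int err = toy_pipeline_copy(p);

	if (err < 0) {
		err = toy_xrun_recover(p);
		if (err < 0) {
			p->running = false;	/* "pipeline will be stopped!" */
			return err;
		}
	}
	return 0;
}
```

With recovery stubbed out like this, any single xrun is enough to leave the pipeline stopped until the host tears it down and sets it up again.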

@xiulipan
Contributor

@aiChaoSONG Is #3714 a duplicate of this?
@bardliao @RanderWang I think we do not have XRUN recovery at all now. Any XRUN breaks the pipeline; XRUNs are not tolerated. But I remember we have some exceptions in WoV pipelines. @keyonjie could you clarify here?

@aiChaoSONG
Collaborator

aiChaoSONG commented Dec 25, 2020

@xiulipan #3714 is reproduced in our daily tests. I don't think DMA trace is disabled in the default kernel, yet the issue still reproduces.

@bardliao
Collaborator

@xiulipan @aiChaoSONG IMHO, #3714 and #3673 are the same issue. Disabling DMA trace is just a way to reproduce/trigger the XRUN.

@mengdonglin mengdonglin added xrun XRUN is observed and firmware may not recover IPC error IPC error is observed labels Jan 5, 2021
@mengdonglin mengdonglin added the SDW SoundWire label Jan 19, 2021
@ranj063
Collaborator

ranj063 commented Jan 19, 2021

@bardliao the IPC error occurs because the FW stops the pipeline when an xrun happens; when the application then stops the PCM after the xrun, the state is incorrect. This needs to be fixed. @keyonjie FYI.

But why does the xrun happen in the first place?
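
The state mismatch described in this comment can be illustrated with a small state check. This is a sketch only: the `toy_state` and `toy_cmd` enums and `toy_set_state()` are hypothetical and simplified, and the real firmware state machine has more states and commands. The point it shows is that once the firmware has moved a component out of the active state on its own, the host's next pause, release, or stop request is rejected with -EINVAL.

```c
#include <errno.h>

/* Hypothetical states and commands, named for illustration only. */
enum toy_state { TOY_READY, TOY_PAUSED, TOY_ACTIVE };
enum toy_cmd { TOY_STOP, TOY_PAUSE, TOY_RELEASE };

/* Apply a trigger command only if it is valid from the current state. */
static int toy_set_state(enum toy_state *state, enum toy_cmd cmd)
{
	switch (cmd) {
	case TOY_STOP:
		/* Only a running or paused stream can be stopped. */
		if (*state != TOY_ACTIVE && *state != TOY_PAUSED)
			return -EINVAL;
		*state = TOY_READY;
		return 0;
	case TOY_PAUSE:
		if (*state != TOY_ACTIVE)
			return -EINVAL;
		*state = TOY_PAUSED;
		return 0;
	case TOY_RELEASE:
		if (*state != TOY_PAUSED)
			return -EINVAL;
		*state = TOY_ACTIVE;
		return 0;
	}
	return -EINVAL;
}
```

In this model the host still believes the stream is active, so it sends TOY_PAUSE; the firmware side, already back in TOY_READY after the xrun, answers -EINVAL, which corresponds to the `err -22` lines in the kernel log.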

@bardliao
Collaborator

But why does the xrun happen in the first place?

I have no idea so far.

@lgirdwood
Member

@mwasko fyi

@slawblauciak
Collaborator

@RanderWang is the issue still reproducible on the latest master branch?

@RanderWang
Collaborator Author

@slawblauciak very sorry for the late reply. I tested the FW from today's CI build. The XRUN error in the FW now appears after a few test cycles instead of occurring immediately.

@plbossart
Member

@RanderWang is this issue still relevant? It's not tracked in CI or daily tests.

@RanderWang
Collaborator Author

@RanderWang is this issue still relevant? It's not tracked in CI or daily tests.

Disabling DMA trace is a hint, or a fast way to trigger the multiple-pause-resume issue, not the root cause. In my opinion, the issue is caused by scheduling in the FW.

@abonislawski
Member

@RanderWang could you please check if this issue is still valid?

@lgirdwood
Member

@XiaoyunWu6666 can you comment on whether you still see this issue? If not, we can close.

@RanderWang
Collaborator Author

@RanderWang could you please check if this issue is still valid?

I will check it. The issue is quite old.

@RanderWang
Collaborator Author

I checked with the latest kernel commit ddcea4bef6 and FW 0bf7b73 today and can't reproduce the bug. Let's close it.
