Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Successive 46 times 4ch DMIC captures crashes firmware #487

Closed
singalsu opened this issue Oct 12, 2018 · 14 comments
Closed

Successive 46 times 4ch DMIC captures crashes firmware #487

singalsu opened this issue Oct 12, 2018 · 14 comments
Assignees
Labels
APL Applies to Apollolake platform bug Something isn't working as expected GLK Applies to Gemini Lake platform P1 Blocker bugs or important features verified QA/Bug reporter verifies solution worked.

Comments

@singalsu
Copy link
Collaborator

A loop of "arecord -Dhw:0,6 -c 4 -f S32_LE -r 48000 -t raw -d 1 capture.raw" done 46 times causes a crash.

The actual number of loops may vary depending on platform, topology, etc. but that's what I see currently in UP2 board (APL). I'll assign this to myself and provide more information to repeat this if I need help.

Need to check if this happens also with 2ch.

@singalsu singalsu self-assigned this Oct 12, 2018
@cujomalainey
Copy link
Contributor

Also witnessed a crash of the DSP via arecord on the V0.5 drop using 2ch on glk

@cujomalainey
Copy link
Contributor

Excerpt of initial crash

[  674.462082] sof-audio sof-audio: firmware boot complete
[  685.666436] sof-audio sof-audio: error: ipc error for 0x60010000 size 0x14
[  685.666447] sof-audio sof-audio: ASoC: sof-audio hw params failed: -22
[  685.666452]  Speakers: ASoC: hw_params FE failed -22
[  685.666791] sof-audio sof-audio: error: ipc error for 0x60010000 size 0x14
[  685.666797] sof-audio sof-audio: ASoC: sof-audio hw params failed: -22
[  685.666800]  Speakers: ASoC: hw_params FE failed -22
[  686.975052] sof-audio sof-audio: error: ipc timed out for 0x60040000 size 0xc
[  686.975065] sof-audio sof-audio: status: fw entered - code 00000005
[  686.975189] sof-audio sof-audio: error: unexpected fault 0x00000000 trace 0x00004000
[  686.975193] sof-audio sof-audio: error: waking up any trace sleepers
[  686.975197] sof-audio sof-audio: XRUN for unknown stream, msg_id 0
[  686.975202]  Speakers: ASoC: trigger FE failed -110
[  686.975214] sof-audio sof-audio: XRUN for unknown stream, msg_id 0
[  686.976258] sof-audio sof-audio: XRUN for unknown stream, msg_id 0
[  690.207291] sof-audio sof-audio: error : DSP panic!
[  690.207309] sof-audio sof-audio: status: fw entered - code 00000005
[  690.207472] sof-audio sof-audio: error: can't enter idle
[  690.207475] sof-audio sof-audio: error: trace point 00004000
[  690.207479] sof-audio sof-audio: error: DSP Firmware Oops
[  690.207482] sof-audio sof-audio: error: Exception Cause: IllegalInstructionCause, Illegal instruction
[  690.207486] sof-audio sof-audio: EXCCAUSE 0x00000000 EXCVADDR 0x00000000 PS       0x00060725 SAR     0x00000005
[  690.207490] sof-audio sof-audio: EPC1     0xbe0230b9 EPC2     0xbe010bee EPC3     0xbe010bee EPC4    0x00000000
[  690.207494] sof-audio sof-audio: EPC5     0xbe01cbe5 EPC6     0x00000000 EPC7     0x00000000 DEPC    0x00000000
[  690.207497] sof-audio sof-audio: EPS2     0x00060720 EPS3     0x00060720 EPS4     0x00000000 EPS5    0x00060120
[  690.207501] sof-audio sof-audio: EPS6     0x00000000 EPS7     0x00000000 INTENABL 0x00010198 INTERRU 0x00000222
[  690.207504] sof-audio sof-audio: stack dump from 0xbe07fec0
[  690.207508] sof-audio sof-audio: 0xbe07fec0: 0xbe015142 0xbe07ff50 0xbe04c858 0xbe04d6a4
[  690.207511] sof-audio sof-audio: 0xbe07fec4: 0xfe0228b4 0xbe07ff10 0xbe0410d0 0x00000001
[  690.207515] sof-audio sof-audio: 0xbe07fec8: 0x00000000 0x00004b00 0xbe00361c 0x00000000
[  690.207519] sof-audio sof-audio: 0xbe07fecc: 0xfe0228b4 0xbe07ff20 0xbe05ab74 0xbe05ab50
[  690.207522] sof-audio sof-audio: 0xbe07fed0: 0x7e010bee 0xbe07ff40 0x00030082 0x00000005
[  690.207526] sof-audio sof-audio: 0xbe07fed4: 0x00001604 0xbe07ff40 0x00030082 0x00000005
[  690.207529] sof-audio sof-audio: 0xbe07fed8: 0x00000001 0x00000016 0x016e6263 0xbe05d718
[  690.207533] sof-audio sof-audio: 0xbe07fedc: 0xbe010b11 0xbe07ffa0 0x00000000 0x00000001

@xiulipan
Copy link
Contributor

@singalsu @cujomalainey

[ 690.207291] sof-audio sof-audio: error : DSP panic!
[ 690.207309] sof-audio sof-audio: status: fw entered - code 00000005
[ 690.207472] sof-audio sof-audio: error: can't enter idle

This panic happens on

static uint64_t validate(void *data, uint64_t delay)
{
        struct sa *sa = data;
        uint64_t current;
        uint64_t delta;

        current = platform_timer_get(platform_timer);
        delta = current - sa->last_idle;

        /* were we last idle longer than timeout */
        if (delta > sa->ticks) {
                trace_sa("tim");
                trace_sa_value(delta);
                panic(SOF_IPC_PANIC_IDLE);
        }

        return PLATFORM_IDLE_TIME;
}
  1. work_schedule has some error.
    The validate work is a bit late and make agent timeout.
  2. DSP is to busy and always in IRQ
    Can not enter the idle status.

@singalsu
Copy link
Collaborator Author

The crash does not happen if there's a "sleep 2" between captures. The suspend-resume cycle likely clears the issue in FW.

I tried to disable the gain ramp code that I suspected but it didn't prevent the crash. I also read trough the alloc & free code in dmic.c but I could not spot any hazards.

@singalsu
Copy link
Collaborator Author

singalsu commented Oct 15, 2018

The errors trace looks like this after the crash has happened:

 CORE  LEVEL  COMP_ID        TIMESTAMP            DELTA                FILE_NAME	CONTENT
    0      2      IPC  46518382.604167  46518384.000000            apl-ipc.c:111 	Cmd
    0      2      IPC  46518386.718750         4.114583            handler.c:395 	tri
    0      2     PIPE  46518389.843750         3.125000           pipeline.c:753 	cmd
    0      2     HOST  46518392.708333         2.864583               host.c:277 	trg
    0      2     HOST  46518395.885417         3.177083            hda-dma.c:441 	DDi
    0      2      DAI  46518399.218750         3.333333                dai.c:520 	trg
    0      2      DAI  46518401.770833         2.552083                dai.c:591 	tsp
    0      2      DMA  46518404.479167         2.708333             dw-dma.c:548 	DDi
    0      2     DMIC  46518409.791667         5.312500               dmic.c:1382	tri
    0      2     DMIC  46518412.343750         2.552083               dmic.c:1327	sto
    0      2     DMIC  46518415.625000         3.281250               dmic.c:1341	sre
    0      2      IPC  46518760.364583       344.739594            apl-ipc.c:111 	Cmd
    0      2      IPC  46518764.479167         4.114583            handler.c:293 	SFr
    0      2     PIPE  46518767.656250         3.177083           pipeline.c:836 	PRe
    0      2     HOST  46518770.572917         2.916667               host.c:609 	res
    0      2     HOST  46518773.333333         2.760417            hda-dma.c:441 	DDi
    0      2      DAI  46518780.052083         6.718750                dai.c:460 	res
    0      2      IPC  46544484.375000     25704.322266            apl-ipc.c:111 	Cmd
    0      2      IPC  46544488.437500         4.062500            handler.c:182 	SAl
    0      2     PIPE  46544491.927083         3.489583           pipeline.c:803 	Par
    0      2     HOST  46544495.312500         3.385417               host.c:446 	host-params
    0      2     HOST  46544501.822917         6.510417            hda-dma.c:303 	Dgt
    0      2     HOST  46544505.052083         3.229167            hda-dma.c:485 	Dsc
    0      2      DAI  46544509.687500         4.635417                dai.c:355 	par
    0      2      DMA  46544512.656250         2.968750             dw-dma.c:322 	Dgt
    0      2     PIPE  46544520.000000         7.343750           pipeline.c:598 	pre
    0      2     HOST  46544522.708333         2.708333               host.c:547 	pre
    0      2      DAI  46544525.625000         2.916667                dai.c:412 	pre
    0      2      IPC  46545264.895833       739.270813            apl-ipc.c:111 	Cmd
    0      2      IPC  46545268.854167         3.958333            handler.c:395 	tri
    0      2     PIPE  46545271.927083         3.072917           pipeline.c:753 	cmd
    0      2     HOST  46545274.947917         3.020833               host.c:277 	trg
    0      2     HOST  46545278.229167         3.281250            hda-dma.c:359 	DEn
    0      2      DAI  46545313.906250        35.677082                dai.c:520 	trg
    0      2      DAI  46545316.458333         2.552083                dai.c:529 	tsa
    0      1      MEM  46545322.604167         6.145833              alloc.c:130 	eM1

The sequence that starts from the middle with "Cmd, SAl, Par" looks normal capture start until dai tsa. Normally there should be a dmic tri (trigger) but instead there's eM1. It's from rmalloc_sys() heap free test fail.

@lgirdwood
Copy link
Member

@mmaka1 , @tlauda any ideas on why 2 second delay helps here, it seems there is something on the DMIC shutdown path. Maybe some relationshipship between DMAC and DMIC ?

@tlauda
Copy link
Contributor

tlauda commented Oct 17, 2018

@xiulipan @lgirdwood Work queue is on lower interrupt level than DMA (2 < 5). Agent doesn't take any additional threshold between the expected value and current idle difference, so I would expect there could be potential mismatch during data transfer.

@lgirdwood
Copy link
Member

Hmm, SA should be scheduling work at highest level. i.e. 5 - Let me check this. Fwiw SA function should be rolled into scheduler logic in the future.

@mmaka1
Copy link

mmaka1 commented Oct 18, 2018

@lgirdwood @singalsu The alloc.c eM1 failure may be a mem leak I recently found exercising pause/resume scenarios. The dai stops the dw-dma which enters COMP_STATE_PREPARE state (COMP_TRIGGER_PAUSE handler calls dma_stop(), not _pause()). Then when the dai is 'released', calls dw_dma_start() which being in 'prepare' registers the interrupt handlers and calls another alloc in the sys heap. After a while fw runs out of the sys heap.
I am preparing a patch for that.

@mengdonglin mengdonglin added bug Something isn't working as expected APL Applies to Apollolake platform GLK Applies to Gemini Lake platform labels Oct 18, 2018
@lgirdwood
Copy link
Member

@singalsu patch now merged, can you retest.

@mengdonglin mengdonglin added the P1 Blocker bugs or important features label Oct 19, 2018
@mengdonglin
Copy link
Collaborator

@singalsu will your test on up2 be blocked by bug #443?

@keqiaozhang
Copy link
Collaborator

@singalsu
I tested this issue on my Yorp today and passed 500 times without error, so please try with latest sof-master at your end.
sof-master: 459ccd2

@singalsu
Copy link
Collaborator Author

@keqiaozhang @mengdonglin @lgirdwood I'm no more getting this error, the capture loop is now past iteration 200 and still works OK! Thank you @mmaka1 !!!

I'll close this issue.

@singalsu
Copy link
Collaborator Author

Closing.

@mengdonglin mengdonglin added the verified QA/Bug reporter verifies solution worked. label Oct 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
APL Applies to Apollolake platform bug Something isn't working as expected GLK Applies to Gemini Lake platform P1 Blocker bugs or important features verified QA/Bug reporter verifies solution worked.
Projects
None yet
Development

No branches or pull requests

9 participants