could not access debugfs when DSP panic or IPC failed #233

Closed
xiulipan opened this issue Oct 31, 2018 · 11 comments · Fixed by #237
Labels: APL (Applies to ApolloLake platform), bug (Something isn't working), P1 (Blocker bugs or important features)

Comments

@xiulipan

When the DSP panics or an IPC fails, running sof-logger or rmbox produces the following error in dmesg:

sof-audio sof-audio: error: debugFS failed to resume -13

Likewise, if we try to read any debugfs entry other than trace, dmesg shows the same error and the terminal refuses to open the file:

sudo cat /sys/kernel/debug/sof/etrace
cat: /sys/kernel/debug/sof/etrace: Permission denied
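
The -13 in the dmesg line and the "Permission denied" from cat are the same error: on Linux, errno 13 is EACCES. As a minimal sketch of how to decode such a return value (the helper name `kernel_err_to_text` is hypothetical and only for illustration):

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Map a kernel-style return code (negative errno, e.g. -13) to the
 * human-readable text that user-space tools such as cat print.
 * Hypothetical helper, not part of SOF or the kernel.
 */
static const char *kernel_err_to_text(int ret)
{
	/* Kernel functions report failure as -errno; flip the sign first. */
	return strerror(ret < 0 ? -ret : ret);
}
```

Calling `kernel_err_to_text(-13)` yields the same string that `strerror(EACCES)` returns, which is what cat reports when the read is refused.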

Analysis:
trace and etrace use different read ops.
For trace we use sof_dfsentry_read:

static const struct file_operations sof_dfs_fops = {
	.open = sof_dfsentry_open,
	.read = sof_dfsentry_read,
	.llseek = default_llseek,
};

The dmesg message above comes from:

dev_err(sdev->dev, "error: debugFS failed to resume %d\n",

So the open question is: when the DSP panics or an IPC fails, how do our pm_runtime_get_sync and pm_runtime_put calls behave?
We may need a fallback handler for this case.

debugfs is critical for debugging when an error happens, but right now it does not work in exactly that situation.

@xiulipan
Author

@libinyang
Could you share the workaround to get etrace on UP2?

@xiulipan xiulipan added the bug Something isn't working label Oct 31, 2018
@mengdonglin mengdonglin added P1 Blocker bugs or important features APL Applies to ApolloLake platform labels Oct 31, 2018
@mengdonglin
Collaborator

@ranj063 This issue is observed when debugging thesofproject/sof#443 on UP2. But it should be a generic issue.

@keyonjie

When the DSP panics, reading the trace debugFS entries should still be possible (with stale values in the worst case). Let me try a fix for it.
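
The idea of serving the last known contents instead of failing can be modeled in user space. This is a minimal sketch under stated assumptions, not the driver code and not necessarily what the eventual fix does; `mock_dev`, `mock_resume`, `strict_read` and `lenient_read` are hypothetical names standing in for the device state, the runtime PM resume call, and the two read policies.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the device state; not the real SOF device. */
struct mock_dev {
	int resume_ret;     /* value the mock resume call will return */
	char cache[64];     /* last data copied out of the DSP window */
	size_t cache_len;   /* number of valid bytes in cache */
};

/* Mock resume: 0 or positive on success, -errno on failure. */
static int mock_resume(struct mock_dev *dev)
{
	return dev->resume_ret;
}

/* Copy at most count bytes of cached data into buf. */
static long copy_cached(const struct mock_dev *dev, char *buf, size_t count)
{
	size_t n = dev->cache_len < count ? dev->cache_len : count;

	memcpy(buf, dev->cache, n);
	return (long)n;
}

/* Strict policy: a resume failure is passed straight to the reader. */
static long strict_read(struct mock_dev *dev, char *buf, size_t count)
{
	int ret = mock_resume(dev);

	if (ret < 0)
		return ret;
	return copy_cached(dev, buf, count);
}

/* Lenient policy: on resume failure, serve cached (possibly stale) data. */
static long lenient_read(struct mock_dev *dev, char *buf, size_t count)
{
	(void)mock_resume(dev); /* best effort; the result is ignored */
	return copy_cached(dev, buf, count);
}
```

With `resume_ret` set to `-EACCES`, `strict_read` returns the error and the reader gets nothing, while `lenient_read` still returns the cached bytes. The trade-off is that the lenient reader cannot tell fresh data from stale data.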

@ranj063
Collaborator

ranj063 commented Oct 31, 2018

@Keyon agree with you. Let me know if you need help

@mengdonglin
Collaborator

mengdonglin commented Nov 1, 2018

@keyonjie please check if thesofproject/sof#518 could improve the logger health for your debugging.

@xiulipan
Author

xiulipan commented Nov 1, 2018

@keyonjie @ranj063
What about a tplg load failure?
That will also make resume fail, right?
So we may also want debugfs to stay readable, or simply disable PM when an error happens.

@keyonjie

keyonjie commented Nov 1, 2018

@xiulipan Why tplg load failure? We do not destroy the entry, so it should still be readable when resume fails.

@xiulipan
Author

xiulipan commented Nov 1, 2018

@keyonjie
The resume failure returns an error code to the kernel. Even though you have copied the data to the right place, the error code prevents the normal read or open from succeeding.
#237 is the workaround we are using now.

I will close this issue when #237 is merged.

@markyang

Summary:
This issue can be reproduced on MinnowBoard when the DSP panics.

dmesg:

[   25.122512] igb 0000:03:00.0 enp3s0: igb: enp3s0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
[   25.122512] igb 0000:03:00.0 enp3s0: igb: enp3s0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
[   25.123050] IPv6: ADDRCONF(NETDEV_CHANGE): enp3s0: link becomes ready
[  189.178611] random: crng init done
[  189.178987] random: 7 urandom warning(s) missed due to ratelimiting
[  244.151059] systemd-journald[218]: File /var/log/journal/eadbe628db86481c8fbc460378113bb6/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.
[12972.769703] sof-audio sof-audio: error: debugFS failed to resume -13
[13528.463143] perf: interrupt took too long (2549 > 2500), lowering kernel.perf_event_max_sample_rate to 78250
[13803.194880] sof-audio sof-audio: error: debugFS failed to resume -13
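
For triage, the failing lines in a log like the one above can be picked out and the error code extracted mechanically. A minimal sketch, assuming the message text stays exactly as shown in this log; `parse_resume_error` is a hypothetical helper, not part of the SOF tools:

```c
#include <stdio.h>
#include <string.h>

/*
 * Look for the "debugFS failed to resume" message in one dmesg line.
 * Returns 1 and stores the error code in *err if found, 0 otherwise.
 */
static int parse_resume_error(const char *line, int *err)
{
	static const char marker[] = "debugFS failed to resume ";
	const char *p = strstr(line, marker);

	if (!p)
		return 0;
	/* The decimal error code follows the marker text directly. */
	return sscanf(p + sizeof(marker) - 1, "%d", err) == 1;
}
```

Running each line of the log through this function reports -13 for the two sof-audio lines and skips the unrelated igb, random and perf messages.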

Test steps:
sudo sof-logger-2cd668c -l sof-apl.ldc-master-gcc-73475e3f
CORE LEVEL COMP_ID TIMESTAMP DELTA FILE_NAME CONTENT
sudo sof-logger-2cd668c -l sof-apl.ldc-master-gcc-73475e3f -t
CORE LEVEL COMP_ID TIMESTAMP DELTA FILE_NAME CONTENT

Test env:
sof master: 73475e3
sof tool: 2cd668c
kernel sof-dev: 165b34de
tplg: sof-byt-rt5651.tplg-2cd668c

Log:
dmesg-byt.log

@plbossart
Member

@ranj063 does pm_runtime work on MinnowBoard? If not, maybe we should remove this capability for now to unlock such blocking issues?

@plbossart
Member

plbossart commented Nov 14, 2018

Can we check if #237 fixes this issue?

Also, can I get clarity on MinnowBoard support for pm_runtime? The issue above mentions debugFS failing to resume, so things are not clear to me.
