
Long running TEE command generate kernel complains (rpi3) #1353

Closed
msa2 opened this issue Feb 14, 2017 · 6 comments
msa2 commented Feb 14, 2017

I have a TA command that takes a long time (RSA key generation), and this causes some unpleasant-looking kernel dumps (things still work). Any idea what to do about them?
(Note: I'm calling the TEE from the kernel using the kernel API patch.)

[  710.072698] INFO: rcu_sched self-detected stall on CPU
[  710.077936] 	0-...: (67997 ticks this GP) idle=b39/140000000000001/0 softirq=24517/24517 fqs=67935 
[  710.087115] 	 (t=68262 jiffies g=27406 c=27405 q=18155)
[  710.092421] Task dump for CPU 0:
[  710.095692] resourcemgr     R  running task        0   754    675 0x00000002
[  710.102850] Call trace:
[  710.105343] [<ffffff800808a3e0>] dump_backtrace+0x0/0x1c8
[  710.110825] [<ffffff800808a5cc>] show_stack+0x24/0x30
[  710.115952] [<ffffff80080cbf50>] sched_show_task+0xa0/0x100
[  710.121607] [<ffffff80080ce3a0>] dump_cpu_task+0x48/0x58
[  710.126999] [<ffffff80080f59b8>] rcu_dump_cpu_stacks+0xa0/0xe8
[  710.132918] [<ffffff80080f99b4>] rcu_check_callbacks+0x56c/0x840
[  710.139012] [<ffffff80080ff06c>] update_process_times+0x44/0x70
[  710.145020] [<ffffff800810fc1c>] tick_sched_handle.isra.5+0x54/0x70
[  710.151379] [<ffffff800810fc84>] tick_sched_timer+0x4c/0x98
[  710.157034] [<ffffff80080ff9cc>] __hrtimer_run_queues+0xcc/0x2a8
[  710.163128] [<ffffff8008100280>] hrtimer_interrupt+0xa0/0x1d0
[  710.168962] [<ffffff800856fa84>] arch_timer_handler_phys+0x3c/0x50
[  710.175234] [<ffffff80080eff5c>] handle_percpu_devid_irq+0x84/0x178
[  710.181593] [<ffffff80080eb00c>] generic_handle_irq+0x34/0x50
[  710.187424] [<ffffff80080eb390>] __handle_domain_irq+0x68/0xc0
[  710.193343] [<ffffff8008082588>] bcm2836_arm_irqchip_handle_irq+0x80/0xe0
[  710.200230] Exception stack(0xffffffc0341cf9c0 to 0xffffffc0341cfae0)
[  710.206766] f9c0: 00000000ffff0004 0000000000000000 0000000000000000 0000000000000000
[  710.214711] f9e0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  710.222656] fa00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  710.230600] fa20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  710.238545] fa40: 0000000000000000 0000000000000000 0000000000000010 ffffffc033b45e18
[  710.246490] fa60: ffffffc031a38500 ffffffc0341cfb58 00000000ffffffff ffffffc033b45e38
[  710.254435] fa80: 00000000ffff0000 ffffffc0343a8808 0000000000000001 ffffff80086b7000
[  710.262380] faa0: ffffffc0341cc000 ffffffc0341cfaf0 ffffff800859c7b4 ffffffc0341cfae0
[  710.270325] fac0: ffffff8008091264 0000000080000145 ffffffc033b45e18 ffffffc031a38500
[  710.278270] [<ffffff8008085724>] el1_irq+0xa4/0x100
[  710.283220] [<ffffff8008091264>] arm_smccc_smc+0x4/0x20
[  710.288524] [<ffffff800859cc34>] optee_invoke_func+0x12c/0x188
[  710.294443] [<ffffff80085997a0>] tee_client_invoke_func+0x48/0x60
...

... I'm running in DEBUG mode, so there are lots of DEBUG: messages; could that be the cause?
... or, since I'm calling the TEE from another kernel driver, I suppose I could be holding a lock, in which case it is my fault.

@jbech-linaro
Contributor

I'd imagine this would be seen on any device (i.e., not just RPi3). I cannot say for sure, but in some cases the Linux kernel becomes a bit unhappy when you are doing jobs that take a certain amount of time. Normally we get interrupted while doing work on the secure side, which gives CPU time back to the kernel; we then re-enter the secure side and continue the job. Could it be as simple as the kernel API patch containing a bug that defeats this mechanism? (I haven't really looked into that patch set.)

@jenswi-linaro
Contributor

At the boundary of syscall entry/exit there's some RCU magic done. For long-running syscalls the RCU magic is delayed. Perhaps there's some special barrier function that can be called to make RCU happy without crossing a syscall boundary.

@jforissier
Contributor

I also think that when we're switching to the secure world, RCU has to be told that we're about to execute some possibly lengthy code that doesn't deal with RCU or use it in any way. Similar to what's done in the idle thread: rcu_idle_enter()/rcu_idle_exit() called by cpu_idle_poll(). See http://lists.openwall.net/linux-kernel/2016/05/16/207 for what led me to believe this ("Third fix...").

IMO this should be discussed on the ARM LKML, where we can find RCU experts, and BTW it would be a gentle reminder that some people are indeed using the OP-TEE driver that is also under review for upstream ;)
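A rough, non-runnable sketch of the rcu_idle_enter()/rcu_idle_exit() idea above, assuming it would be applied around the SMC into secure world (the wrapper name and the exact placement in the OP-TEE driver are hypothetical, not actual driver code):

```c
/*
 * Hypothetical sketch, not actual optee driver code. The idea: mark
 * the CPU as quiescent for RCU while it is busy in secure world, the
 * same way cpu_idle_poll() brackets its idle loop, so the RCU stall
 * detector does not fire during a long standard call.
 */
static void optee_smc_rcu_quiesced(unsigned long a0, unsigned long a1,
				   struct arm_smccc_res *res)
{
	rcu_idle_enter();	/* RCU: treat the time below as idle */
	arm_smccc_smc(a0, a1, 0, 0, 0, 0, 0, 0, res); /* enter secure world */
	rcu_idle_exit();	/* RCU: back to normal accounting */
}
```

Whether rcu_idle_enter() is actually legal in this context (it has preemption/IRQ constraints) is exactly the kind of thing the RCU experts on LKML would need to confirm.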

@milunt

milunt commented May 10, 2017

Assuming IRQs are not masked while doing a lengthy secure standard call, besides the additional latency of switching from OP-TEE to the NW context (after which normal Linux IRQ handling continues), how is it affecting the RCU watchdog? I am not familiar with how an RCU stall is detected in detail; I will check it.
Is there any thread on LKML discussing this? Is there any update on this?

@milunt

milunt commented May 19, 2017

I misread the issue, hence my previous question "How is OP-TEE's presence affecting the rcu/watchdog?". I now understand this has nothing to do with OP-TEE: any long-running loop on a non-preemptive kernel (CONFIG_PREEMPT off) will trigger the RCU stall.
Some subsystems solve it by adding cond_resched_xx() in the loop to allow a voluntary invocation of schedule(). Having something like an rcu_idle_enter()/rcu_idle_exit() pair also sounds good.
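As an illustration of the cond_resched() pattern described above (a generic kernel-style fragment, not OP-TEE code; the loop body is hypothetical), a long loop on a CONFIG_PREEMPT=n kernel would insert a voluntary scheduling point each iteration:

```c
/* Illustrative kernel-style fragment, not from the OP-TEE driver.
 * cond_resched() yields the CPU if needed and counts as a quiescent
 * state, keeping the RCU stall detector quiet during the long loop. */
for (i = 0; i < n_work_items; i++) {
	process_item(i);	/* hypothetical unit of long-running work */
	cond_resched();		/* voluntary preemption point */
}
```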

@ghost

ghost commented Oct 6, 2017

We are closing this issue/question. Why?

  • We believe the question has been answered.
  • Quite often when we have answered the questions, the issuer forgets to close the issue, which leaves us with many open issues and makes it hard to keep track of the ones actually waiting for an answer.
  • In some cases we ask follow-up questions but do not hear back from the person who created the issue.

If, however, you feel that you have additional questions or still think this is an issue, then please feel free to re-open it. When closing issues we will only show this entire message once.

// OP-TEE admins

This issue was closed.