Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System crash after some heavy usb serial usage #3720

Open
Tangerino opened this issue Jul 13, 2020 · 19 comments
Open

System crash after some heavy usb serial usage #3720

Tangerino opened this issue Jul 13, 2020 · 19 comments

Comments

@Tangerino
Copy link

Describe the bug
System crashes while using ttyUSB0

To reproduce
Brand new SD card with Ubuntu Focal 20.04 LTS

All packages up to date

This is my test program

import minimalmodbus

instrument = minimalmodbus.Instrument('/dev/ttyUSB0', 2)
instrument.debug = False
instrument.serial.baudrate = 9600
instrument.serial.timeout = 1
while True:
    instrument.read_long(229)

Expected behaviour
Keep running forever

Actual behaviour
The systems freeze. I was able to verify some behavior only using a terminal and keyboard

System

DTDI USB driver for /dev/ttyUSB0

Logs
First error was:

image

Second error, after few seconds

image

@pelwell
Copy link
Contributor

pelwell commented Jul 14, 2020

There was a firmware released yesterday - available using "sudo rpi-update" - which fixed a potential problem with USB interrupts. You can update the firmware and leave the kernel if you use "sudo SKIP_KERNEL=1 rpi-update". Alternatively, download the files start*.elf and fixup*.dat from here: https://github.com/raspberrypi/firmware/tree/master/boot

@juergh
Copy link
Contributor

juergh commented Jul 14, 2020

This is reported against the Ubuntu kernel so should be reported in launchpad. @Tangerino Can you run 'apport-bug linux' on the affected system?

@Tangerino
Copy link
Author

The system does not exist anymore. I'll rebuild the device and test it again.

@Tangerino
Copy link
Author

We are talking about Ubuntu here, right?

@pelwell
Copy link
Contributor

pelwell commented Jul 14, 2020

  1. I thought you would know that, of all people.
  2. What @juergh said.
  3. Ubuntu don't build VPU firmware, hence my suggestion and the SKIP_KERNEL=1 suggestion.

@juergh
Copy link
Contributor

juergh commented Jul 14, 2020

@Tangerino rpi-update is not available on Ubuntu and even if it were it wouldn't work. Ubuntu uses flash-kernel (and u-boot) which wants the firmware in /boot/firmware and not /boot. I've confirmed the problem (on Ubuntu) and there is already another bug that looks similar/identical: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/1880388
Let's take the Ubuntu discussion there. Once I've confirmed that the problem is reproducible with Raspbian we can discuss that here.

@pelwell
Copy link
Contributor

pelwell commented Jul 14, 2020

rpi-update is only a curl or wget away:

$ sudo curl -L --output /usr/bin/rpi-update https://raw.githubusercontent.com/Hexxeh/rpi-update/master/rpi-update && sudo chmod +x /usr/bin/rpi-update

As the documentation says:

ROOT_PATH and BOOT_PATH

sudo ROOT_PATH=/media/root BOOT_PATH=/media/boot rpi-update

Allows you to perform an "offline" update, ie update firmware on an SD card you are not currently booted from. Useful for installing firmware/kernel to a non-RPI customised image. Be careful, you must specify both options or neither. Specifying only one will not work.

@juergh
Copy link
Contributor

juergh commented Jul 14, 2020

@pelwell Ah, nice. Thanks for the info!

@Tangerino
Copy link
Author

Tangerino commented Jul 14, 2020

Thank you for the info!

I build a fresh new image.
The update is done, the test ran for a while and got his message and then freezes.
image

This happens in the console while I was installing other stuff, increasing the CPU usage. Interesting that I wish to NEVER increase speed to avoid the heat!

Not sure where to go now.

A piece of additional information is that Raspbian works fine in a number of os devices 24x7.

Cheers

@Tangerino
Copy link
Author

Second try
image

Now with no other terminal in parallel.

@juergh
Copy link
Contributor

juergh commented Jul 15, 2020

I can reproduce this on Ubuntu but not on the Raspberry Pi OS, both running 5.4 kernels and the same firmware version. @pelwell I think you can close this ticket as Raspberry Pi OS doesn't seem to be affected. However, it looks like it's a config option that we have (or not) set on Ubuntu. Do you have any idea which areas I should be focusing on trying to track this down?

@pelwell
Copy link
Contributor

pelwell commented Jul 15, 2020

In rpi-5.4.y we switched from bcm2835-cpufreq to the upstream raspberrypi-cpufreq, on all Pis except those using BCM2835, i.e.

CONFIG_CPUFREQ_DT=y
CONFIG_ARM_RASPBERRYPI_CPUFREQ=y
# CONFIG_ARM_BCM2835_CPUFREQ is not set

@Tangerino
Copy link
Author

Any action for me here?

@juergh
Copy link
Contributor

juergh commented Jul 19, 2020

It turns out that the preemption model makes a difference. Raspberry Pi OS uses 'Low-latency Desktop' whereas Ubuntu uses 'Desktop'. If I switch Ubuntu to 'Low-level Desktop' the problem goes away. If I switch the Raspberry Pi OS kernel to 'Desktop' it fails as well. I cannot reproduce this on arm 32-bit.

It looks like something's not right with the calls of local_fiq_{enable,disable} in dwc_otg_read_common_intr(). These calls are noops on arm 32-bit which would explain why the problem doesn't show there.

FWIW, the current Raspberry Pi OS 5.4.51-v8+ kernel spits out the following during boot after enabling DEBUG_ATOMIC_SLEEP. IMO that's a problem that needs fixing.

Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.726241] BUG: sleeping function called from invalid context at kernel/irq/manage.c:137
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.726539] dwc_otg: FIQ enabled
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.733318] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/2
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.733322] Preemption disabled at:
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.733340] [] secondary_start_kernel+0xc4/0x180
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.733357] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.4.51-v8+ #48
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.740371] dwc_otg: NAK holdoff enabled
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.747486] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.747491] Call trace:
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.747501] dump_backtrace+0x0/0x148
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.747509] show_stack+0x24/0x30
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.747523] dump_stack+0xd8/0x144
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.754599] dwc_otg: FIQ split-transaction FSM enabled
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.761730] ___might_sleep+0x158/0x188
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.761738] __might_sleep+0x58/0x90
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.761746] synchronize_irq+0x54/0xd0
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.761755] disable_irq+0x38/0x48
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.768983] Module dwc_common_port init
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.776156] local_fiq_disable+0x20/0x28
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.776166] dwc_otg_handle_common_intr+0x498/0xdc0
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.776175] dwc_otg_common_irq+0x20/0x38
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.776184] __handle_irq_event_percpu+0x78/0x2c0
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.878963] handle_irq_event_percpu+0x3c/0x98
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.885348] handle_irq_event+0x4c/0xd0
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.891709] handle_level_irq+0xe4/0x170
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.898036] generic_handle_irq+0x34/0x50
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.904345] bcm2836_chained_handle_irq+0x50/0x70
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.910642] generic_handle_irq+0x34/0x50
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.916993] __handle_domain_irq+0x98/0x108
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.923398] bcm2836_arm_irqchip_handle_irq+0x68/0xd0
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.929802] el1_irq+0xbc/0x180
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.936144] arch_cpu_idle+0x34/0x220
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.942471] default_idle_call+0x24/0x48
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.948775] do_idle+0x210/0x258
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.955015] cpu_startup_entry+0x28/0x48
Jul 17 11:34:16 rpi-3b-plus-rev1d3-abfb kernel: [ 2.961217] secondary_start_kernel+0x148/0x180

And with the 'Desktop' preemption model and PROVE_LOCKING enabled I also get:

[ 4.937147] ================================
[ 4.948024] WARNING: inconsistent lock state
[ 4.958927] 5.4.51-v8+ #46 Not tainted
[ 4.969265] --------------------------------
[ 4.980101] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[ 4.992707] kworker/1:1/31 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 5.004697] ffffff80347eb698 (&((spinlock_t *)hcd->lock)->rlock){?.-.}, at: DWC_SPINLOCK_IRQSAVE+0x24/0x38
[ 5.021160] {IN-HARDIRQ-W} state was registered at:
[ 5.032747] lock_acquire+0xd0/0x210
[ 5.043016] _raw_spin_lock+0x3c/0x50
[ 5.053313] DWC_SPINLOCK+0x20/0x30
[ 5.063365] dwc_otg_handle_common_intr+0x5c/0xdc0
[ 5.074726] dwc_otg_common_irq+0x20/0x38
[ 5.085257] __handle_irq_event_percpu+0xb8/0x368
[ 5.096493] handle_irq_event_percpu+0x3c/0x98
[ 5.107430] handle_irq_event+0x4c/0xd0
[ 5.117690] handle_level_irq+0xcc/0x148
[ 5.127971] generic_handle_irq+0x34/0x50
[ 5.138310] bcm2836_chained_handle_irq+0x50/0x70
[ 5.149383] generic_handle_irq+0x34/0x50
[ 5.159733] __handle_domain_irq+0x6c/0xc0
[ 5.170168] bcm2836_arm_irqchip_handle_irq+0x74/0xe0
[ 5.181599] el1_irq+0xbc/0x140
[ 5.191072] arch_cpu_idle+0x40/0x1c8
[ 5.201063] default_idle_call+0x24/0x48
[ 5.211280] do_idle+0x208/0x250
[ 5.220668] cpu_startup_entry+0x28/0x48
[ 5.230691] secondary_start_kernel+0x140/0x178
[ 5.241245] irq event stamp: 3526
[ 5.250414] hardirqs last enabled at (3525): [] slab_free_freelist_hook+0xb8/0x180
[ 5.265816] hardirqs last disabled at (3526): [] _raw_spin_lock_irqsave+0x30/0x78
[ 5.281021] softirqs last enabled at (3264): [] __do_softirq+0x400/0x488
[ 5.295475] softirqs last disabled at (3253): [] irq_exit+0xfc/0x130
[ 5.309396]
other info that might help us debug this:
[ 5.327579] Possible unsafe locking scenario:

[ 5.344862] CPU0
[ 5.352849] ----
[ 5.360759] lock(&((spinlock_t *)hcd->lock)->rlock);
[ 5.371445]
[ 5.379516] lock(&((spinlock_t *)hcd->lock)->rlock);
[ 5.390345]
*** DEADLOCK ***

[ 5.412152] 4 locks held by kworker/1:1/31:
[ 5.421568] #0: ffffff80351a4328 ((wq_completion)usb_hub_wq){+.+.}, at: process_one_work+0x1d8/0x5f8
[ 5.436265] #1: ffffffc011c53dd8 ((work_completion)(&hub->events)){+.+.}, at: process_one_work+0x1d8/0x5f8
[ 5.451503] #2: ffffff803401b200 (&dev->mutex){....}, at: hub_event+0x60/0x12d8
[ 5.464383] #3: ffffff80347eb698 (&((spinlock_t *)hcd->lock)->rlock){?.-.}, at: DWC_SPINLOCK_IRQSAVE+0x24/0x38
[ 5.480092]
stack backtrace:
[ 5.495155] CPU: 1 PID: 31 Comm: kworker/1:1 Not tainted 5.4.51-v8+ #46
[ 5.507317] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
[ 5.519171] Workqueue: usb_hub_wq hub_event
[ 5.528838] Call trace:
[ 5.536652] dump_backtrace+0x0/0x148
[ 5.545699] show_stack+0x24/0x30
[ 5.554391] dump_stack+0xe0/0x14c
[ 5.563139] print_usage_bug+0x1d8/0x1e0
[ 5.572405] mark_lock+0x580/0x740
[ 5.581129] mark_held_locks+0x54/0x88
[ 5.590184] lockdep_hardirqs_on+0x58/0x160
[ 5.599659] trace_hardirqs_on+0x58/0x138
[ 5.608900] _raw_spin_unlock_irq+0x38/0x50
[ 5.618221] finish_task_switch+0x90/0x238
[ 5.627328] __schedule+0x380/0x8b8
[ 5.635790] preempt_schedule_common+0x18/0x30
[ 5.645263] _cond_resched+0x30/0x40
[ 5.653845] synchronize_irq+0x44/0xc0
[ 5.662587] disable_irq+0x38/0x48
[ 5.670956] local_fiq_disable+0x20/0x28
[ 5.679856] fiq_fsm_queue_split_transaction+0x30c/0x7b8
[ 5.690186] dwc_otg_hcd_queue_transactions+0x378/0x468
[ 5.700483] dwc_otg_hcd_urb_enqueue+0xf4/0x228
[ 5.710094] dwc_otg_urb_enqueue+0x194/0x370
[ 5.719430] usb_hcd_submit_urb+0x238/0x940
[ 5.728655] usb_submit_urb+0x194/0x520
[ 5.737508] usb_start_wait_urb+0x6c/0x108
[ 5.746625] usb_control_msg+0xc0/0x138
[ 5.755429] usb_get_string+0x78/0xe0
[ 5.764033] usb_string_sub+0x90/0x158
[ 5.772706] usb_string+0xd0/0x190
[ 5.781009] usb_cache_string+0x58/0xd0
[ 5.789740] usb_new_device+0xac/0x4a8
[ 5.798387] hub_event+0x814/0x12d8
[ 5.806739] process_one_work+0x284/0x5f8
[ 5.815630] worker_thread+0x50/0x428
[ 5.824032] kthread+0x104/0x130
[ 5.831959] ret_from_fork+0x10/0x18

This is probably a fallout of the above atomic sleep problem and eventually kills the machine.

The above are all reported during boot before any real tests are exercised. To reproduce the hang: Connect a USB/serial adapter to the Pi, run 'minicom -D /dev/ttyUSB0' and send data from another device. With 'Desktop' preemption, the Pi hangs within seconds.

@juergh
Copy link
Contributor

juergh commented Jul 22, 2020

FWIW, I can get rid of the above warnings when using dwc_otg.fiq_enable=0 and dwc_otg.fiq_fsm_enable=0 and the machine stays up even with 'Desktop' preemption.

@pelwell
Copy link
Contributor

pelwell commented Jul 22, 2020

It's probably worth trying dtoverlay=dwc2 - this selects the upstream (non-FIQ-accelerated) USB driver.

@juergh
Copy link
Contributor

juergh commented Jul 24, 2020

dwc2 works as well. What's the difference between upstream dwc2 and Raspberry Pi's dwcotg? FIQ acceleration only?

@pelwell
Copy link
Contributor

pelwell commented Jul 24, 2020

dwc2 has support for gadget mode and benefits from upstream maintenance and bug fixing, dwc_otg has FIQ acceleration.

@Tangerino
Copy link
Author

After:

apt update
apt upgrade

and add to the file /boot/firmware/usercfg.txt

dtoverlay=dwc2

The tests are running for over 12 hours

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants