Fixed latency issue caused by disabling preemption for up to 5 seconds #72

Himmele · 2012-08-01T11:40:46Z

Setting the timeout to 5000 in sdhci_send_command (sdhci.c), as the RPi patch does, is not a good idea.
This blocks everything on the Raspberry Pi for up to 200ms in real-world scenarios.
I don't know if my supplied patch is perfectly correct, but it works for my low-latency streaming app.

Change-Id: Iab555157c4255d64865a3c7ef2dffee986954d56

grigorig · 2012-08-01T12:24:22Z

Good catch, this is another instance of busy waiting in the SDHCI stack that sometimes eats up notable amounts of CPU time. :( I don't know too much about Linux kernel task scheduling, but doesn't it make more sense to use msleep() instead of schedule() and mdelay()?

popcornmix · 2012-08-01T14:14:14Z

The preempt_enable does seem like a hack. I assume it is only needed because we are holding the spinlock:
spin_lock_irqsave(&host->lock, flags);

I think the real solution is to release the specific spinlock (if that is determined to be safe) before sleeping.

Himmele · 2012-08-01T14:50:25Z

I also tried spin_unlock_irqrestore(&host->lock, flags) and spin_lock_irqsave(&host->lock, flags) before. It also worked fine. But I don't know the SDHCI driver well enough to be sure which solution is the best.

popcornmix · 2012-08-01T14:57:09Z

Hopefully grigorig can comment.
The spinlock is there to prevent the ISR and the core parts of the driver from accessing data structures and sdhost peripheral registers at the same time.
My guess is that neither solution is safe without some careful thought.

There is some evidence that this part of the code is responsible for the widespread USB packet loss issues (interrupts being disabled for more than 1ms can result in USB packet loss), so a correct fix could be very valuable.
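The "release the lock before waiting" idea raised above can be sketched in user space. This is only a model of the control flow, not driver code: `fake_host`, `fake_lock`, `fake_sleep_1ms` and `wait_ready_dropping_lock` are hypothetical names, the lock is a plain flag, and the sketch says nothing about whether dropping `host->lock` at that point is actually safe in sdhci.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these names come from sdhci.c. */
struct fake_host {
    bool lock_held;        /* models host->lock */
    int  busy_polls_left;  /* polls until the fake controller clears its inhibit bit */
    int  sleeps_unlocked;  /* how many waits happened with the lock released */
};

static void fake_lock(struct fake_host *h)   { assert(!h->lock_held); h->lock_held = true; }
static void fake_unlock(struct fake_host *h) { assert(h->lock_held);  h->lock_held = false; }

/* Stand-in for reading the inhibit bits; must be called with the lock held. */
static bool fake_inhibited(struct fake_host *h)
{
    assert(h->lock_held);
    if (h->busy_polls_left > 0) {
        h->busy_polls_left--;
        return true;
    }
    return false;
}

/* Stand-in for a sleeping wait; only legal while the lock is released. */
static void fake_sleep_1ms(struct fake_host *h)
{
    assert(!h->lock_held);
    h->sleeps_unlocked++;
}

/*
 * Called with the lock held; returns with the lock held.
 * Returns true if the controller became ready before timeout_ms waits elapsed.
 */
static bool wait_ready_dropping_lock(struct fake_host *h, int timeout_ms)
{
    while (fake_inhibited(h)) {
        if (timeout_ms-- == 0)
            return false;
        fake_unlock(h);      /* let other contexts run while we wait */
        fake_sleep_1ms(h);
        fake_lock(h);        /* state may have changed, so the loop re-checks */
    }
    return true;
}
```

The point of the model is the invariant checked by the asserts: registers are only read with the lock held, and every wait happens with it released. Anything the ISR may change while the lock is dropped has to be re-validated after re-acquiring it, which is the "careful thought" part.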

Himmele · 2012-08-01T14:59:49Z

Another question: why is this timeout adjustment to 5000 necessary at all? It is not included in the vanilla kernel but has been added by the Raspberry Pi developers. Can't this be solved differently?

grigorig · 2012-08-01T15:36:51Z

While a read or write is in progress, the SD card is polled to check for progress and end of the transaction. According to the comment in the source code, that is buggy with the Arasan controller. So it sets a large timeout and waits for the end of the transfer, in case the status is polled. That's of course a really terrible solution.

grigorig · 2012-08-01T16:36:22Z

I disabled the missing_status quirk for testing, and this indeed seems to help with both USB packet loss and SD performance. I kept iozone running in the background, and typing was unaffected by this, no repeating or lost keys at all.

So far I haven't seen any issues with SD access without the quirk.

Himmele · 2012-08-01T17:09:19Z

I also disabled the whole
if(host->ops->missing_status && (cmd->opcode == MMC_SEND_STATUS)) {
    timeout = 5000; // Really obscenely large delay to send the status, due to bug in controller
                    // which might cause the STATUS command to get stuck when a data operation is in flow
    mask |= SDHCI_DATA_INHIBIT;
}
block once and had no problems at all.
Does anybody know if the Raspberry Pi hardware is really affected by this? Or is it only for certain SD cards?
OK, I think I will then also use this solution of disabling the missing_status quirk.
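To make the effect of that block concrete, here is a small stand-alone model of how the quirk changes the wait parameters. It is a sketch under assumptions: `plan_command_wait` and `struct wait_plan` are hypothetical names, the 10 ms default is what the vanilla `sdhci_send_command` is assumed to use, and the wait is assumed to poll in 1 ms busy-wait steps so that the timeout value is also the worst-case stall in milliseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MMC_SEND_STATUS     13           /* CMD13 */
#define SDHCI_CMD_INHIBIT   0x00000001u
#define SDHCI_DATA_INHIBIT  0x00000002u

#define DEFAULT_TIMEOUT_MS  10u    /* assumption: vanilla default */
#define QUIRK_TIMEOUT_MS    5000u  /* value from the snippet quoted above */

/* Hypothetical helper type: which inhibit bits to wait on, and for how long. */
struct wait_plan {
    uint32_t mask;
    unsigned timeout_ms;
};

/*
 * Model of the decision made before a command is issued.
 * uses_data_line: the command carries data or expects a busy response.
 * missing_status: the RPi quirk is enabled.
 */
static struct wait_plan plan_command_wait(uint32_t opcode, bool uses_data_line,
                                          bool missing_status)
{
    struct wait_plan p = { SDHCI_CMD_INHIBIT, DEFAULT_TIMEOUT_MS };

    if (uses_data_line)
        p.mask |= SDHCI_DATA_INHIBIT;

    /* The quirk: a status poll also waits for the data line, for up to 5 s. */
    if (missing_status && opcode == MMC_SEND_STATUS) {
        p.timeout_ms = QUIRK_TIMEOUT_MS;
        p.mask |= SDHCI_DATA_INHIBIT;
    }

    assert(p.timeout_ms > 0);
    return p;
}
```

With the quirk on, a CMD13 issued during a transfer can spin for the remainder of that transfer (bounded by 5000 ms) instead of at most 10 ms, which is far longer than the roughly 1 ms of disabled interrupts that was mentioned earlier as enough to lose USB packets.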

popcornmix · 2012-08-01T17:40:51Z

I've asked.
"sync_after_dma" is believed to be needed, but myself, lb and others have found it works just fine without.
I think "missing_status" could be the same (I'm seeing no issues).
The workarounds were added more than two years ago, when the testing was on FPGA.
It's possible the real chip did get a newer release of the Arasan core, but at the time it wasn't tested with the workarounds removed.

I might push out an update with the workarounds disabled by default (as there is clearly a benefit to most), and see if there are any problems. There will be command line options to revert to the previous behaviour.

popcornmix · 2012-08-01T19:20:56Z

Firmware has been pushed with grigorig's pull request and new command line parameters.
Please test with:
sdhci-bcm2708.missing_status=0 sdhci-bcm2708.sync_after_dma=0
added to command line. I haven't made these a default yet, but I will soon if no one complains.

popcornmix · 2012-08-02T09:16:44Z

Unfortunately missing_status doesn't seem safe for everyone. I got this report:
Aug 2 04:34:19 raspi kernel: [ 78.096112] mmc0: Controller never released inhibit bit(s).
Aug 2 04:34:19 raspi kernel: [ 78.167406] mmcblk0: unknown error -5 sending read/write command, card status 0x900
Aug 2 04:34:19 raspi kernel: [ 78.167481] end_request: I/O error, dev mmcblk0, sector 212720
Aug 2 04:34:19 raspi kernel: [ 78.167512] Buffer I/O error on device mmcblk0p2, logical block 2270
Aug 2 04:34:19 raspi kernel: [ 78.167528] lost page write due to I/O error on mmcblk0p2
Aug 2 04:39:05 raspi kernel: [ 364.178196] mmc0: Controller never released inhibit bit(s).
Aug 2 04:39:05 raspi kernel: [ 364.221899] mmcblk0: unknown error -5 sending read/write command, card status 0x900
Aug 2 04:39:05 raspi kernel: [ 364.221976] end_request: I/O error, dev mmcblk0, sector 4394776
Aug 2 04:39:05 raspi kernel: [ 364.222007] Buffer I/O error on device mmcblk0p2, logical block 525027
Aug 2 04:39:05 raspi kernel: [ 364.222023] lost page write due to I/O error on mmcblk0p2

@grigorig any thoughts about not doing the timeout wait here, but periodically checking before the next op?

grigorig · 2012-08-02T10:55:16Z

I think the best solution would be to use the DATA_DONE interrupt, but that requires a lot of changes to the SDHCI code.

Himmele · 2012-08-02T11:42:56Z

But maybe the DATA_DONE interrupt is the only correct solution for this controller :-).

popcornmix · 2012-08-07T20:39:34Z

This was a comment from the hardware guy who integrated the Arasan IP:
Arasan have chosen to implement their IP with several internal clocks, so they have to do some internal synchronisation.
This results in status registers not being immediately updated and therefore they are not reliable to check if an event has occurred.
It is preferable to use only the interrupt registers when polling whether the Arasan SDhost has finished doing something and then check the status registers to gather the details.
Interrupt registers use a proper handshake but status registers don’t.
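The advice above (detect the event from the interrupt register, then read the status register for details) can be illustrated with a tiny model. The bit values are the standard SDHCI ones; `struct fake_regs` and both helper functions are hypothetical and stand in for real MMIO reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SDHCI_INT_DATA_END  0x00000002u  /* "transfer complete" bit, interrupt status register */
#define SDHCI_DATA_INHIBIT  0x00000002u  /* "data line busy" bit, present-state register */

/* Hypothetical snapshot of the two registers; a real driver reads them via MMIO. */
struct fake_regs {
    uint32_t int_status;     /* handshaked: reliable for "did it happen?" */
    uint32_t present_state;  /* may lag because of clock-domain synchronisation */
};

/* Decide completion from the interrupt register only. */
static bool data_transfer_done(const struct fake_regs *r)
{
    assert(r != NULL);
    return (r->int_status & SDHCI_INT_DATA_END) != 0;
}

/* Polling the status register instead: can still say "busy" after the event. */
static bool data_line_idle_by_status(const struct fake_regs *r)
{
    assert(r != NULL);
    return (r->present_state & SDHCI_DATA_INHIBIT) == 0;
}
```

In a snapshot where the interrupt bit is already set but the present-state register has not caught up, `data_transfer_done` reports completion while `data_line_idle_by_status` still reports busy. A wait loop built on the second check is the kind that needs a long timeout.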

sulge · 2012-08-11T19:40:58Z

What do you think about removing this code to fix the problem with USB and then moving the root and swap partitions to a USB pen drive? Then only the initial boot would be done from SD. Would this solution prevent problems with missing_status?

popcornmix · 2012-08-12T09:55:50Z

@sulge
Yes. If you are not using sdcard, then you won't have latency problems caused by sdcard.
Booting with rootfs and swap on USB will avoid the problem.

sulge · 2012-08-12T10:22:01Z

Great news!

Is it confirmed that with these options the USB packet loss goes away?

Thank you for the answer :)

popcornmix · 2012-08-12T10:24:10Z

@sulge
You'll have to test. Some users have found their problems fixed by this.

sulge · 2012-08-12T13:17:59Z

Unfortunately, I get the following errors when testing USB:
BUG: scheduling while atomic: testUSB.sh/1155/0x00000002
Modules linked in: ipv6 pl2303 ftdi_sio usbserial
Backtrace:
Function entered at [] from []
r6:cd860000 r5:ceb75440 r4:00000000
Function entered at [] from []
Function entered at [] from []
r4:c0427be0
Function entered at [] from []
Function entered at [] from []
Function entered at [] from []
Function entered at [] from []
Function entered at [] from []
Function entered at [] from []
r8:000000a1 r7:00000000 r6:00000000 r5:00000007 r4:cea707c0
Function entered at [] from []
Function entered at [] from []
Function entered at [] from []
Function entered at [] from []
r8:ceadf840 r7:cd9cdc00 r6:ceaaa634 r5:ceaaa604 r4:cd860000
Function entered at [] from []
r8:ceadf840 r7:0bc00000 r6:ceadf840 r5:ceaaa600 r4:cd9cdc00
Function entered at [] from []
r6:cd9cdc00 r5:ce9d18e0 r4:ce9d18e4
Function entered at [] from []
Function entered at [] from []
Function entered at [] from []
Function entered at [] from []
r7:00000000 r6:00000000 r5:cd861ed0 r4:ce958798
Function entered at [] from []
Function entered at [] from []
Function entered at [] from []
r8:00000003 r7:ceb16000 r6:000001b6 r5:00020241 r4:00000001
Function entered at [] from []
Function entered at [] from []

popcornmix · 2012-08-12T13:58:47Z

@sulge
What kernel are you running? Is it the cutdown kernel, or a very old one?
The default kernel gives a proper backtrace so we might get some useful information.
Can you switch to latest (non-cutdown) kernel and get the backtrace again?

sulge · 2012-08-12T16:08:35Z

@popcornmix
I use a kernel compiled from current git (I compile it myself). I will try to use the non-cutdown kernel and reproduce the problem (it happens from time to time, and when it happens it breaks all USB devices, including LAN).

sulge · 2012-08-12T18:27:33Z

@popcornmix

Please also note that for my own compiled kernel I applied the patch https://dl.dropbox.com/u/3669512/temp/0001-added-microframe-schedule-from-the-linux-denx-tree.dc4.patch. So it seems that this patch does not work correctly.
How can I help you to improve it?

Errors with kernel from https://github.com/raspberrypi/firmware/:

Jan 1 01:14:26 raspberry-pi kernel: [ 866.365772] INFO:: periodic_channel_available: Total channels: 8, Periodic: 4, Non-periodic: 4
Jan 1 01:14:26 raspberry-pi kernel: [ 866.365788]
Jan 1 01:14:26 raspberry-pi kernel: [ 866.385320] INFO:: schedule_periodic: No host channel available for periodic transfer.
Jan 1 01:14:26 raspberry-pi kernel: [ 866.385338]
Jan 1 01:14:26 raspberry-pi kernel: [ 866.416409] ERROR::dwc_otg_hcd_urb_enqueue:518: DWC OTG HCD URB Enqueue failed adding QTD. Error status -4008
Jan 1 01:14:26 raspberry-pi kernel: [ 866.416427]
Jan 1 01:14:26 raspberry-pi kernel: [ 866.785237] INFO:: periodic_channel_available: Total channels: 8, Periodic: 4, Non-periodic: 4
Jan 1 01:14:26 raspberry-pi kernel: [ 866.785253]
Jan 1 01:14:26 raspberry-pi kernel: [ 866.804780] INFO:: schedule_periodic: No host channel available for periodic transfer.
Jan 1 01:14:26 raspberry-pi kernel: [ 866.804800]
Jan 1 01:14:26 raspberry-pi kernel: [ 866.836987] ERROR::dwc_otg_hcd_urb_enqueue:518: DWC OTG HCD URB Enqueue failed adding QTD. Error status -4008
Jan 1 01:14:26 raspberry-pi kernel: [ 866.837005]
Jan 1 01:14:26 raspberry-pi kernel: [ 867.225173] INFO:: periodic_channel_available: Total channels: 8, Periodic: 4, Non-periodic: 4
Jan 1 01:14:26 raspberry-pi kernel: [ 867.225188]
Jan 1 01:14:26 raspberry-pi kernel: [ 867.244667] INFO:: schedule_periodic: No host channel available for periodic transfer.

popcornmix · 2012-08-12T19:10:57Z

@sulge
I don't think your issues are related to this one.
Can you catch the "scheduling while atomic" panic with the latest default (non-cutdown) kernel so we get a valid backtrace, and create a new issue?

sulge · 2012-08-12T19:27:02Z

@popcornmix
I tried many times but only get " periodic_channel_available: Total channels: 8, Periodic: 4, Non-periodic: 4" :( How can I get the default (non-cutdown) kernel with the mentioned patch (to eliminate the suspicion)?

Ok I know :) I should use bcmrpi_defconfig ;)

popcornmix · 2012-08-12T20:38:07Z

@sulge
Yes. Start with a clean linux tree. Apply the patch. Use bcmrpi_defconfig. Report the backtrace of the panic.

sulge · 2012-08-12T21:09:54Z

@popcornmix
Something like that:
[ 139.859072] BUG: scheduling while atomic: testUSB1.sh/800/0x00000002
[ 139.870116] Modules linked in: ipv6 spidev ftdi_sio pl2303 usbserial spi_bcm2708 i2c_bcm2708
[ 139.888419] from
[ 139.905623] from
[ 139.922730] from
[ 139.939990] from
[ 139.956664] from
[ 139.973877] from
[ 139.991898] from
[ 140.010753] from
[ 140.030236] from
[ 140.048165] from [](ftdi_open+0x80/0xe8 [ftdi_sio])
[ 140.066307] [](ftdi_open+0x80/0xe8 [ftdi_sio]) from [](serial_activate+0x58/0x6c [usbserial])
[ 140.085492] [](serial_activate+0x58/0x6c [usbserial]) from
[ 140.104236] from [](serial_open+0x40/0x6c [usbserial])
[ 140.122569] [](serial_open+0x40/0x6c [usbserial]) from
[ 140.140441] from
[ 140.157426] from
[ 140.174855] from
[ 140.192618] from
[ 140.209857] from
[ 140.226631] from
[ 140.243675] from
[ 140.260844] from
[ 140.277684] from
[ 140.329483] BUG: scheduling while atomic: testUSB1.sh/800/0x00000002
[ 140.340380] Modules linked in: ipv6 spidev ftdi_sio pl2303 usbserial spi_bcm2708 i2c_bcm2708
[ 140.358095] from
[ 140.375210] from
[ 140.392267] from
[ 140.409470] from
[ 140.426147] from
[ 140.443397] from
[ 140.461425] from
[ 140.480225] from
[ 140.499139] from
[ 140.517029] from [](ftdi_set_termios+0x370/0x528 [ftdi_sio])
[ 140.535909] [](ftdi_set_termios+0x370/0x528 [ftdi_sio]) from [](ftdi_open+0x98/0xe8 [ftdi_sio])
[ 140.555678] [](ftdi_open+0x98/0xe8 [ftdi_sio]) from [](serial_activate+0x58/0x6c [usbserial])
[ 140.574875] [](serial_activate+0x58/0x6c [usbserial]) from
[ 140.593835] from [](serial_open+0x40/0x6c [usbserial])
[ 140.612187] [](serial_open+0x40/0x6c [usbserial]) from
[ 140.630088] from
[ 140.647063] from
[ 140.664512] from
[ 140.682335] from
[ 140.699602] from
[ 140.716422] from
[ 140.733557] from
[ 140.750775] from
[ 140.767666] from
[ 140.846628] BUG: scheduling while atomic: testUSB1.sh/800/0x00000002
[ 140.857229] Modules linked in: ipv6 spidev ftdi_sio pl2303 usbserial spi_bcm2708 i2c_bcm2708
[ 140.874157] from
[ 140.890945] from
[ 140.907636] from
[ 140.924480] from
[ 140.940798] from
[ 140.973150] note: testUSB1.sh[800] exited with preempt_count 1
[ 140.985802] BUG: scheduling while atomic: testUSB1.sh/800/0x40000002
[ 140.996890] Modules linked in: ipv6 spidev ftdi_sio pl2303 usbserial spi_bcm2708 i2c_bcm2708
[ 141.016007] from
[ 141.035130] from
[ 141.052360] from
[ 141.069713] from
[ 141.087153] from
[ 141.104876] from
[ 141.122206] from
[ 141.139446] from
[ 141.156064] from
[ 141.172612] from
[ 141.190648] from
[ 141.207730] from
[ 141.226224] from
[ 141.245054] from
[ 141.292721] BUG: scheduling while atomic: testUSB1.sh/800/0x00000002
[ 141.303887] Modules linked in: ipv6 spidev ftdi_sio pl2303 usbserial spi_bcm2708 i2c_bcm2708
[ 141.322705] from
[ 141.340125] from
[ 141.357588] from
[ 141.375268] from
[ 141.392236] from
[ 141.409918] from
[ 141.428278] from
[ 141.447457] from
[ 141.466674] from
[ 141.484869] from [](update_mctrl+0xdc/0x224 [ftdi_sio])
[ 141.503589] [](update_mctrl+0xdc/0x224 [ftdi_sio]) from [](ftdi_dtr_rts+0xc4/0xe4 [ftdi_sio])
[ 141.523215] [](ftdi_dtr_rts+0xc4/0xe4 [ftdi_sio]) from [](serial_dtr_rts+0x30/0x34 [usbserial])
[ 141.542955] [](serial_dtr_rts+0x30/0x34 [usbserial]) from
[ 141.562491] from
[ 141.581777] from
[ 141.600372] from [](serial_close+0x40/0x6c [usbserial])
[ 141.619406] [](serial_close+0x40/0x6c [usbserial]) from
[ 141.638073] from
[ 141.654892] from
[ 141.671591] from
[ 141.689515] from
[ 141.707269] from
[ 141.724145] from
[ 141.741334] from
[ 141.759898] from
[ 141.778706] from

sulge · 2012-08-12T21:15:28Z

How do I correctly add this report to this thread? When I edit, all the info is visible, but after saving only "[ 141.741334] from" remains.

popcornmix · 2012-08-12T21:36:40Z

Use pastebin and just post a link here?

sulge · 2012-08-12T21:46:37Z

@popcornmix
http://mm.pl/~kozek/backtrace

sulge · 2012-08-24T20:20:15Z

Hello,

Does it make sense to test this patch if only my boot partition is located on the SD card?

ddv2005 · 2012-08-24T21:36:08Z

Yes. If your root partition is not on the SD card then it doesn't make sense.

popcornmix · 2012-08-25T11:53:00Z

Thanks @ddv2005.
I've applied the patch, and nothing's immediately broken.
Any chance you could add a module parameter to enable/disable the new behaviour? That would make it easier to push out for wider testing.

ddv2005 · 2012-08-25T12:40:20Z

I'll add a parameter on Monday

ddv2005 · 2012-08-27T18:52:09Z

I have updated my patch. To enable low-latency mode just set sdhci-bcm2708.enable_llm=1 in cmdline.txt

popcornmix · 2012-08-27T20:09:50Z

Thanks. I'd not seen any problem with the earlier one, so I'll try pushing this out and suggesting people test it.

popcornmix · 2012-08-28T17:32:42Z

sdhci-bcm2708.enable_llm=1 is now in the latest rpi-update firmware. Please test.

popcornmix · 2012-08-31T15:38:12Z

sdhci-bcm2708.enable_llm is now enabled by default.
Use sdhci-bcm2708.enable_llm=0 if you don't want the low latency mode.
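When testing, it helps to confirm which of these options a given boot actually used. The following is a minimal user-space sketch that looks up a `key=value` token in a command line string such as the contents of /proc/cmdline. `cmdline_get` is a hypothetical helper, not part of any kernel or firmware tool, and letting the last occurrence win is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up "key=value" in a space-separated kernel command line.
 * Copies the value into out (NUL-terminated, truncated to out_len - 1 chars)
 * and returns true if the key is present; returns false otherwise.
 * If the key appears more than once, the last occurrence wins.
 */
static bool cmdline_get(const char *cmdline, const char *key,
                        char *out, size_t out_len)
{
    assert(cmdline && key && out && out_len > 0);

    size_t key_len = strlen(key);
    bool found = false;
    const char *p = cmdline;

    while (*p) {
        /* Skip separators, then mark the start and end of the next token. */
        while (*p == ' ' || *p == '\n')
            p++;
        const char *tok = p;
        while (*p && *p != ' ' && *p != '\n')
            p++;
        size_t tok_len = (size_t)(p - tok);

        /* Match "key=" exactly, so "foo" does not match "foobar=1". */
        if (tok_len > key_len && strncmp(tok, key, key_len) == 0 &&
            tok[key_len] == '=') {
            size_t val_len = tok_len - key_len - 1;
            if (val_len >= out_len)
                val_len = out_len - 1;
            memcpy(out, tok + key_len + 1, val_len);
            out[val_len] = '\0';
            found = true;
        }
    }
    return found;
}
```

For a command line containing `sdhci-bcm2708.enable_llm=1`, looking up the key `sdhci-bcm2708.enable_llm` yields the string `1`. A key that is absent returns false, which on firmware from 2012-08-31 onwards would mean the default (low-latency mode enabled) applies.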

popcornmix · 2012-09-01T10:36:04Z

Closing the pull request as an alternative solution is now checked in.
Thanks to @Himmele for highlighting the problem and @ddv2005 for improving it.

commit 9e5c6e5 upstream. pci_get_slot() is called with hold of PCI bus semaphore and it's not safe to be called in interrupt context. However, we possibly checks EEH error and calls the function in interrupt context. To avoid using pci_get_slot(), we turn into device tree for fetching location code. Otherwise, we might run into WARN_ON() as following messages indicate: WARNING: at drivers/pci/search.c:223 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc3+ #72 task: c000000001367af0 ti: c000000001444000 task.ti: c000000001444000 NIP: c000000000497b70 LR: c000000000037530 CTR: 000000003003d114 REGS: c000000001446fa0 TRAP: 0700 Not tainted (3.16.0-rc3+) MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 48002422 XER: 20000000 CFAR: c00000000003752c SOFTE: 0 : NIP [c000000000497b70] .pci_get_slot+0x40/0x110 LR [c000000000037530] .eeh_pe_loc_get+0x150/0x190 Call Trace: .of_get_property+0x30/0x60 (unreliable) .eeh_pe_loc_get+0x150/0x190 .eeh_dev_check_failure+0x1b4/0x550 .eeh_check_failure+0x90/0xf0 .lpfc_sli_check_eratt+0x504/0x7c0 [lpfc] .lpfc_poll_eratt+0x64/0x100 [lpfc] .call_timer_fn+0x64/0x190 .run_timer_softirq+0x2cc/0x3e0 Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pci_get_slot() is called with hold of PCI bus semaphore and it's not safe to be called in interrupt context. However, we possibly checks EEH error and calls the function in interrupt context. To avoid using pci_get_slot(), we turn into device tree for fetching location code. Otherwise, we might run into WARN_ON() as following messages indicate: WARNING: at drivers/pci/search.c:223 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc3+ #72 task: c000000001367af0 ti: c000000001444000 task.ti: c000000001444000 NIP: c000000000497b70 LR: c000000000037530 CTR: 000000003003d114 REGS: c000000001446fa0 TRAP: 0700 Not tainted (3.16.0-rc3+) MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 48002422 XER: 20000000 CFAR: c00000000003752c SOFTE: 0 : NIP [c000000000497b70] .pci_get_slot+0x40/0x110 LR [c000000000037530] .eeh_pe_loc_get+0x150/0x190 Call Trace: .of_get_property+0x30/0x60 (unreliable) .eeh_pe_loc_get+0x150/0x190 .eeh_dev_check_failure+0x1b4/0x550 .eeh_check_failure+0x90/0xf0 .lpfc_sli_check_eratt+0x504/0x7c0 [lpfc] .lpfc_poll_eratt+0x64/0x100 [lpfc] .call_timer_fn+0x64/0x190 .run_timer_softirq+0x2cc/0x3e0 Cc: stable@vger.kernel.org Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

objdump's raw insn output can vary across architectures on the number of bytes per chunk (bpc) displayed and their endianness. The code-reading test relied on reading objdump output as 1 bpc. Kaixu Xia reported test failure on ARM64, where objdump displays 4 bpc: 70c48: f90027bf str xzr, [x29,#72] 70c4c: 91224000 add x0, x0, #0x890 70c50: f90023a0 str x0, [x29,#64] This patch adds support to read raw insn output for any bpc length. In case of 2+ bpc it also guesses objdump's display endian. Reported-and-Tested-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: Jan Stancek <jstancek@redhat.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/07f0f7bcbda78deb423298708ef9b6a54d6b92bd.1452592712.git.jstancek@redhat.com [ Fix up pr_fmt() call to use %zd for size_t variables, fixing the build on Ubuntu cross-compiling to armhf and ppc64 ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

On lubbock board, the probe of the driver crashes by dereferencing very early a platform_data structure which is not set, in pxa2xx_configure_sockets(). The stack fixed is : [ 0.244353] SA1111 Microprocessor Companion Chip: silicon revision 1, metal revision 1 [ 0.256321] sa1111 sa1111: Providing IRQ336-390 [ 0.340899] clocksource: Switched to clocksource oscr0 [ 0.472263] Unable to handle kernel NULL pointer dereference at virtual address 00000004 [ 0.480469] pgd = c0004000 [ 0.483432] [00000004] *pgd=00000000 [ 0.487105] Internal error: Oops: f5 [#1] ARM [ 0.491497] Modules linked in: [ 0.494650] CPU: 0 PID: 1 Comm: swapper Not tainted 4.8.0-rc3-00080-g1aaa68426f0c-dirty #2068 [ 0.503229] Hardware name: Intel DBPXA250 Development Platform (aka Lubbock) [ 0.510344] task: c3e42000 task.stack: c3e44000 [ 0.514984] PC is at pxa2xx_configure_sockets+0x4/0x24 (drivers/pcmcia/pxa2xx_base.c:227) [ 0.520193] LR is at pcmcia_lubbock_init+0x1c/0x38 [ 0.525079] pc : [<c0247c30>] lr : [<c02479b0>] psr: a0000053 [ 0.525079] sp : c3e45e70 ip : 100019ff fp : 00000000 [ 0.536651] r10: c0828900 r9 : c0434838 r8 : 00000000 [ 0.541953] r7 : c0820700 r6 : c0857b30 r5 : c3ec1400 r4 : c0820758 [ 0.548549] r3 : 00000000 r2 : 0000000c r1 : c3c09c40 r0 : c3ec1400 [ 0.555154] Flags: NzCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none [ 0.562450] Control: 0000397f Table: a0004000 DAC: 00000053 [ 0.568257] Process swapper (pid: 1, stack limit = 0xc3e44190) [ 0.574154] Stack: (0xc3e45e70 to 0xc3e46000) [ 0.578610] 5e60: c4849800 00000000 c3ec1400 c024769c [ 0.586928] 5e80: 00000000 c3ec140c c3c0ee0c c3ec1400 c3ec1434 c020c410 c3ec1400 c3ec1434 [ 0.595244] 5ea0: c0820700 c080b408 c0828900 c020c5f8 00000000 c0820700 c020c578 c020ac5c [ 0.603560] 5ec0: c3e687cc c3e71e10 c0820700 00000000 c3c02de0 c020bae4 c03c62f7 c03c62f7 [ 0.611872] 5ee0: c3e68780 c0820700 c042e034 00000000 c043c440 c020cdec c080b408 00000005 [ 0.620188] 5f00: c042e034 c00096c0 c0034440 c01c730c 20000053 ffffffff 00000000 
00000000 [ 0.628502] 5f20: 00000000 c3ffcb87 c3ffcb90 c00346ac c3e66ba0 c03f7914 00000092 00000005 [ 0.636811] 5f40: 00000005 c03f847c 00000091 c03f847c 00000000 00000005 c0434828 00000005 [ 0.645125] 5f60: c043482c 00000092 c043c440 c0828900 c0434838 c0418d2c 00000005 00000005 [ 0.653430] 5f80: 00000000 c041858c 00000000 c032e9f0 00000000 00000000 00000000 00000000 [ 0.661729] 5fa0: 00000000 c032e9f8 00000000 c000f0f0 00000000 00000000 00000000 00000000 [ 0.670020] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 0.678311] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000 [ 0.686673] (pxa2xx_configure_sockets) from pcmcia_lubbock_init (/drivers/pcmcia/sa1111_lubbock.c:161) [ 0.696026] (pcmcia_lubbock_init) from pcmcia_probe (/drivers/pcmcia/sa1111_generic.c:213) [ 0.704358] (pcmcia_probe) from driver_probe_device (/drivers/base/dd.c:378 /drivers/base/dd.c:499) [ 0.712848] (driver_probe_device) from __driver_attach (/./include/linux/device.h:983 /drivers/base/dd.c:733) [ 0.721414] (__driver_attach) from bus_for_each_dev (/drivers/base/bus.c:313) [ 0.729723] (bus_for_each_dev) from bus_add_driver (/drivers/base/bus.c:708) [ 0.738036] (bus_add_driver) from driver_register (/drivers/base/driver.c:169) [ 0.746185] (driver_register) from do_one_initcall (/init/main.c:778) [ 0.754561] (do_one_initcall) from kernel_init_freeable (/init/main.c:843 /init/main.c:851 /init/main.c:869 /init/main.c:1016) [ 0.763409] (kernel_init_freeable) from kernel_init (/init/main.c:944) [ 0.771660] (kernel_init) from ret_from_fork (/arch/arm/kernel/entry-common.S:119) [ 0.779347] Code: c03c6305 c03c631e c03c632e e5903048 (e993000c) All code ======== 0: c03c6305 eorsgt r6, ip, r5, lsl #6 4: c03c631e eorsgt r6, ip, lr, lsl r3 8: c03c632e eorsgt r6, ip, lr, lsr #6 c: e5903048 ldr r3, [r0, #72] ; 0x48 10:* e993000c ldmib r3, {r2, r3} <-- trapping instruction Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Russell 
King <rmk+kernel@armlinux.org.uk>

kvmppc_gpa_to_ua() accesses KVM memory slot array via srcu_dereference_check() and this produces warnings from RCU like below. This extends the existing srcu_read_lock/unlock to cover that kvmppc_gpa_to_ua() as well. We did not hit this before as this lock is not needed for the realmode handlers and hash guests would use the realmode path all the time; however the radix guests are always redirected to the virtual mode handlers and hence the warning. [ 68.253798] ./include/linux/kvm_host.h:575 suspicious rcu_dereference_check() usage! [ 68.253799] other info that might help us debug this: [ 68.253802] rcu_scheduler_active = 2, debug_locks = 1 [ 68.253804] 1 lock held by qemu-system-ppc/6413: [ 68.253806] #0: (&vcpu->mutex){+.+.}, at: [<c00800000e3c22f4>] vcpu_load+0x3c/0xc0 [kvm] [ 68.253826] stack backtrace: [ 68.253830] CPU: 92 PID: 6413 Comm: qemu-system-ppc Tainted: G W 4.14.0-rc3-00553-g432dcba58e9c-dirty #72 [ 68.253833] Call Trace: [ 68.253839] [c000000fd3d9f790] [c000000000b7fcc8] dump_stack+0xe8/0x160 (unreliable) [ 68.253845] [c000000fd3d9f7d0] [c0000000001924c0] lockdep_rcu_suspicious+0x110/0x180 [ 68.253851] [c000000fd3d9f850] [c0000000000e825c] kvmppc_gpa_to_ua+0x26c/0x2b0 [ 68.253858] [c000000fd3d9f8b0] [c00800000e3e1984] kvmppc_h_put_tce+0x12c/0x2a0 [kvm] Fixes: 121f80b ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

commit 8f6a9f0 upstream. kvmppc_gpa_to_ua() accesses KVM memory slot array via srcu_dereference_check() and this produces warnings from RCU like below. This extends the existing srcu_read_lock/unlock to cover that kvmppc_gpa_to_ua() as well. We did not hit this before as this lock is not needed for the realmode handlers and hash guests would use the realmode path all the time; however the radix guests are always redirected to the virtual mode handlers and hence the warning. [ 68.253798] ./include/linux/kvm_host.h:575 suspicious rcu_dereference_check() usage! [ 68.253799] other info that might help us debug this: [ 68.253802] rcu_scheduler_active = 2, debug_locks = 1 [ 68.253804] 1 lock held by qemu-system-ppc/6413: [ 68.253806] #0: (&vcpu->mutex){+.+.}, at: [<c00800000e3c22f4>] vcpu_load+0x3c/0xc0 [kvm] [ 68.253826] stack backtrace: [ 68.253830] CPU: 92 PID: 6413 Comm: qemu-system-ppc Tainted: G W 4.14.0-rc3-00553-g432dcba58e9c-dirty #72 [ 68.253833] Call Trace: [ 68.253839] [c000000fd3d9f790] [c000000000b7fcc8] dump_stack+0xe8/0x160 (unreliable) [ 68.253845] [c000000fd3d9f7d0] [c0000000001924c0] lockdep_rcu_suspicious+0x110/0x180 [ 68.253851] [c000000fd3d9f850] [c0000000000e825c] kvmppc_gpa_to_ua+0x26c/0x2b0 [ 68.253858] [c000000fd3d9f8b0] [c00800000e3e1984] kvmppc_h_put_tce+0x12c/0x2a0 [kvm] Fixes: 121f80b ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 506a66f upstream Dave Hansen reported, that it's outright dangerous to keep SMT siblings disabled completely so they are stuck in the BIOS and wait for SIPI. The reason is that Machine Check Exceptions are broadcasted to siblings and the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or reboots the machine. The MCE chapter in the SDM contains the following blurb: Because the logical processors within a physical package are tightly coupled with respect to shared hardware resources, both logical processors are notified of machine check errors that occur within a given physical processor. If machine-check exceptions are enabled when a fatal error is reported, all the logical processors within a physical package are dispatched to the machine-check exception handler. If machine-check exceptions are disabled, the logical processors enter the shutdown state and assert the IERR# signal. When enabling machine-check exceptions, the MCE flag in control register CR4 should be set for each logical processor. Reverting the commit which ignores siblings at enumeration time solves only half of the problem. The core cpuhotplug logic needs to be adjusted as well. This thoughtful engineered mechanism also turns the boot process on all Intel HT enabled systems into a MCE lottery. MCE is enabled on the boot CPU before the secondary CPUs are brought up. Depending on the number of physical cores the window in which this situation can happen is smaller or larger. On a HSW-EX it's about 750ms: MCE is enabled on the boot CPU: [ 0.244017] mce: CPU supports 22 MCE banks The corresponding sibling #72 boots: [ 1.008005] .... node #0, CPUs: #72 That means if an MCE hits on physical core 0 (logical CPUs 0 and 72) between these two points the machine is going to shutdown. At least it's a known safe state. 
It's obvious that the early boot can be hit by an MCE as well and then runs into the same situation because MCEs are not yet enabled on the boot CPU. But after enabling them on the boot CPU, it does not make any sense to prevent the kernel from recovering. Adjust the nosmt kernel parameter documentation as well. Reverts: 2207def ("x86/apic: Ignore secondary threads if nosmt=force") Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Hammering the "bank enable" (PBKEN) bit on and off between every command crashes the Nomadik NHK15 with this message: Scanning device for bad blocks Unhandled fault: external abort on non-linefetch (0x008) at 0xcc95e000 pgd = (ptrval) [cc95e000] *pgd=0b808811, *pte=40000653, *ppte=40000552 Internal error: : 8 [#1] PREEMPT ARM Modules linked in: CPU: 0 PID: 1 Comm: swapper Not tainted 4.20.0-rc2+ #72 Hardware name: Nomadik STn8815 PC is at fsmc_exec_op+0x194/0x204 (...) After a discussion we (me and Boris Brezillon) start to suspect that this bit does not immediately control the chip select line at all, it rather enables access to the bank and the hardware will drive the CS autonomously. If there is a NAND chip connected, we should keep this enabled. As fsmc_nand_setup() sets this bit, we can simply remove the offending code. Fixes: 550b9fc ("mtd: rawnand: fsmc: Stop implementing ->select_chip()") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>

commit baef1c9 upstream.

Using the batch API from the interconnect driver sometimes leads to a KASAN error due to an access to freed memory. This is easier to trigger with threadirqs on the kernel commandline.

    BUG: KASAN: use-after-free in rpmh_tx_done+0x114/0x12c
    Read of size 1 at addr fffffff51414ad84 by task irq/110-apps_rs/57

    CPU: 0 PID: 57 Comm: irq/110-apps_rs Tainted: G W 4.19.10 #72
    Call trace:
     dump_backtrace+0x0/0x2f8
     show_stack+0x20/0x2c
     __dump_stack+0x20/0x28
     dump_stack+0xcc/0x10c
     print_address_description+0x74/0x240
     kasan_report+0x250/0x26c
     __asan_report_load1_noabort+0x20/0x2c
     rpmh_tx_done+0x114/0x12c
     tcs_tx_done+0x450/0x768
     irq_forced_thread_fn+0x58/0x9c
     irq_thread+0x120/0x1dc
     kthread+0x248/0x260
     ret_from_fork+0x10/0x18

    Allocated by task 385:
     kasan_kmalloc+0xac/0x148
     __kmalloc+0x170/0x1e4
     rpmh_write_batch+0x174/0x540
     qcom_icc_set+0x8dc/0x9ac
     icc_set+0x288/0x2e8
     a6xx_gmu_stop+0x320/0x3c0
     a6xx_pm_suspend+0x108/0x124
     adreno_suspend+0x50/0x60
     pm_generic_runtime_suspend+0x60/0x78
     __rpm_callback+0x214/0x32c
     rpm_callback+0x54/0x184
     rpm_suspend+0x3f8/0xa90
     pm_runtime_work+0xb4/0x178
     process_one_work+0x544/0xbc0
     worker_thread+0x514/0x7d0
     kthread+0x248/0x260
     ret_from_fork+0x10/0x18

    Freed by task 385:
     __kasan_slab_free+0x12c/0x1e0
     kasan_slab_free+0x10/0x1c
     kfree+0x134/0x588
     rpmh_write_batch+0x49c/0x540
     qcom_icc_set+0x8dc/0x9ac
     icc_set+0x288/0x2e8
     a6xx_gmu_stop+0x320/0x3c0
     a6xx_pm_suspend+0x108/0x124
     adreno_suspend+0x50/0x60
    cr50_spi spi5.0: SPI transfer timed out
     pm_generic_runtime_suspend+0x60/0x78
     __rpm_callback+0x214/0x32c
     rpm_callback+0x54/0x184
     rpm_suspend+0x3f8/0xa90
     pm_runtime_work+0xb4/0x178
     process_one_work+0x544/0xbc0
     worker_thread+0x514/0x7d0
     kthread+0x248/0x260
     ret_from_fork+0x10/0x18

    The buggy address belongs to the object at fffffff51414ac80
     which belongs to the cache kmalloc-512 of size 512
    The buggy address is located 260 bytes inside of
     512-byte region [fffffff51414ac80, fffffff51414ae80)
    The buggy address belongs to the page:
    page:ffffffbfd4505200 count:1 mapcount:0 mapping:fffffff51e00c680 index:0x0 compound_mapcount: 0
    flags: 0x4000000000008100(slab|head)
    raw: 4000000000008100 ffffffbfd4529008 ffffffbfd44f9208 fffffff51e00c680
    raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
     fffffff51414ac80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
     fffffff51414ad00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    >fffffff51414ad80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                       ^
     fffffff51414ae00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
     fffffff51414ae80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

The batch API sets the same completion for each rpmh message that's sent and then loops through all the messages and waits for that single completion declared on the stack to be completed before returning from the function and freeing the message structures. Unfortunately, some messages may still be in process and 'stuck' in the TCS. At some later point, the tcs_tx_done() interrupt will run and try to process messages that have already been freed at the end of rpmh_write_batch(). This will in turn access the 'needs_free' member of the rpmh_request structure and cause KASAN to complain. Furthermore, if there's a message that's completed in rpmh_tx_done() and freed immediately after the complete() call is made we'll be racing with potentially freed memory when accessing the 'needs_free' member:

    CPU0                         CPU1
    ----                         ----
    rpmh_tx_done()
     complete(&compl)
                                 wait_for_completion(&compl)
                                 kfree(rpm_msg)
     if (rpm_msg->needs_free)
     <KASAN warning splat>

Let's fix this by allocating a chunk of completions for each message and waiting for all of them to be completed before returning from the batch API. Alternatively, we could wait for the last message in the batch, but that may be a more complicated change because it looks like tcs_tx_done() just iterates through the indices of the queue and completes each message instead of tracking the last inserted message and completing that first.

Fixes: c8790cb ("drivers: qcom: rpmh: add support for batch RPMH request")
Cc: Lina Iyer <ilina@codeaurora.org>
Cc: "Raju P.L.S.S.S.N" <rplsssn@codeaurora.org>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: stable@vger.kernel.org
Reviewed-by: Lina Iyer <ilina@codeaurora.org>
Reviewed-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 23377c2 ]

When the device is disconnected while passing traffic it is possible to receive out of order urbs causing a memory leak since the skb linked to the current tx urb is not removed. Fix the issue deallocating the skb cleaning up the tx ring. Moreover this patch fixes the following kernel warning

    [ 57.480771] usb 1-1: USB disconnect, device number 2
    [ 57.483451] ------------[ cut here ]------------
    [ 57.483462] TX urb mismatch
    [ 57.483481] WARNING: CPU: 1 PID: 32 at drivers/net/wireless/mediatek/mt7601u/dma.c:245 mt7601u_complete_tx+0x165/00
    [ 57.483483] Modules linked in:
    [ 57.483496] CPU: 1 PID: 32 Comm: kworker/1:1 Not tainted 5.2.0-rc1+ #72
    [ 57.483498] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
    [ 57.483502] Workqueue: usb_hub_wq hub_event
    [ 57.483507] RIP: 0010:mt7601u_complete_tx+0x165/0x1e0
    [ 57.483510] Code: 8b b5 10 04 00 00 8b 8d 14 04 00 00 eb 8b 80 3d b1 cb e1 00 00 75 9e 48 c7 c7 a4 ea 05 82 c6 05 f
    [ 57.483513] RSP: 0000:ffffc900000a0d28 EFLAGS: 00010092
    [ 57.483516] RAX: 000000000000000f RBX: ffff88802c0a62c0 RCX: ffffc900000a0c2c
    [ 57.483518] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff810a8371
    [ 57.483520] RBP: ffff88803ced6858 R08: 0000000000000000 R09: 0000000000000001
    [ 57.483540] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000046
    [ 57.483542] R13: ffff88802c0a6c88 R14: ffff88803baab540 R15: ffff88803a0cc078
    [ 57.483548] FS: 0000000000000000(0000) GS:ffff88803eb00000(0000) knlGS:0000000000000000
    [ 57.483550] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 57.483552] CR2: 000055e7f6780100 CR3: 0000000028c86000 CR4: 00000000000006a0
    [ 57.483554] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 57.483556] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 57.483559] Call Trace:
    [ 57.483561] <IRQ>
    [ 57.483565] __usb_hcd_giveback_urb+0x77/0xe0
    [ 57.483570] xhci_giveback_urb_in_irq.isra.0+0x8b/0x140
    [ 57.483574] handle_cmd_completion+0xf5b/0x12c0
    [ 57.483577] xhci_irq+0x1f6/0x1810
    [ 57.483581] ? lockdep_hardirqs_on+0x9e/0x180
    [ 57.483584] ? _raw_spin_unlock_irq+0x24/0x30
    [ 57.483588] __handle_irq_event_percpu+0x3a/0x260
    [ 57.483592] handle_irq_event_percpu+0x1c/0x60
    [ 57.483595] handle_irq_event+0x2f/0x4c
    [ 57.483599] handle_edge_irq+0x7e/0x1a0
    [ 57.483603] handle_irq+0x17/0x20
    [ 57.483607] do_IRQ+0x54/0x110
    [ 57.483610] common_interrupt+0xf/0xf
    [ 57.483612] </IRQ>

Acked-by: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

Commit 6605fd2 upstream.

The test case btrfs/238 reports the warning below:

    WARNING: CPU: 3 PID: 481 at fs/btrfs/super.c:2509 btrfs_show_devname+0x104/0x1e8 [btrfs]
    CPU: 2 PID: 1 Comm: systemd Tainted: G W O 5.14.0-rc1-custom #72
    Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
    Call trace:
     btrfs_show_devname+0x108/0x1b4 [btrfs]
     show_mountinfo+0x234/0x2c4
     m_show+0x28/0x34
     seq_read_iter+0x12c/0x3c4
     vfs_read+0x29c/0x2c8
     ksys_read+0x80/0xec
     __arm64_sys_read+0x28/0x34
     invoke_syscall+0x50/0xf8
     do_el0_svc+0x88/0x138
     el0_svc+0x2c/0x8c
     el0t_64_sync_handler+0x84/0xe4
     el0t_64_sync+0x198/0x19c

Reason:
While btrfs_prepare_sprout() moves the fs_devices::devices into fs_devices::seed_list, the btrfs_show_devname() searches for the devices and found none, leading to the warning as in above.

Fix:
latest_dev is updated according to the changes to the device list. That means we could use the latest_dev->name to show the device name in /proc/self/mounts, the pointer will be always valid as it's assigned before the device is deleted from the list in remove or replace. The RCU protection is sufficient as the device structure is freed after synchronization.

Reported-by: Su Yue <l@damenly.su>
Tested-by: Su Yue <l@damenly.su>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

The BPF STX/LDX instruction uses offset relative to the FP to address stack space. Since the BPF_FP locates at the top of the frame, the offset is usually a negative number. However, arm64 str/ldr immediate instruction requires that offset be a positive number. Therefore, this patch tries to convert the offsets.

The method is to find the negative offset furthest from the FP firstly. Then add it to the FP, calculate a bottom position, called FPB, and then adjust the offsets in other STR/LDX instructions relative to FPB.

FPB is saved using the callee-saved register x27 of arm64 which is not used yet.

Before adjusting the offset, the patch checks every instruction to ensure that the FP does not change in run-time. If the FP may change, no offset is adjusted.

For example, for the following bpftrace command:

    bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'

Without this patch, jited code(fragment):

     0: bti c
     4: stp x29, x30, [sp, #-16]!
     8: mov x29, sp
     c: stp x19, x20, [sp, #-16]!
    10: stp x21, x22, [sp, #-16]!
    14: stp x25, x26, [sp, #-16]!
    18: mov x25, sp
    1c: mov x26, #0x0 // #0
    20: bti j
    24: sub sp, sp, #0x90
    28: add x19, x0, #0x0
    2c: mov x0, #0x0 // #0
    30: mov x10, #0xffffffffffffff78 // #-136
    34: str x0, [x25, x10]
    38: mov x10, #0xffffffffffffff80 // #-128
    3c: str x0, [x25, x10]
    40: mov x10, #0xffffffffffffff88 // #-120
    44: str x0, [x25, x10]
    48: mov x10, #0xffffffffffffff90 // #-112
    4c: str x0, [x25, x10]
    50: mov x10, #0xffffffffffffff98 // #-104
    54: str x0, [x25, x10]
    58: mov x10, #0xffffffffffffffa0 // #-96
    5c: str x0, [x25, x10]
    60: mov x10, #0xffffffffffffffa8 // #-88
    64: str x0, [x25, x10]
    68: mov x10, #0xffffffffffffffb0 // #-80
    6c: str x0, [x25, x10]
    70: mov x10, #0xffffffffffffffb8 // #-72
    74: str x0, [x25, x10]
    78: mov x10, #0xffffffffffffffc0 // #-64
    7c: str x0, [x25, x10]
    80: mov x10, #0xffffffffffffffc8 // #-56
    84: str x0, [x25, x10]
    88: mov x10, #0xffffffffffffffd0 // #-48
    8c: str x0, [x25, x10]
    90: mov x10, #0xffffffffffffffd8 // #-40
    94: str x0, [x25, x10]
    98: mov x10, #0xffffffffffffffe0 // #-32
    9c: str x0, [x25, x10]
    a0: mov x10, #0xffffffffffffffe8 // #-24
    a4: str x0, [x25, x10]
    a8: mov x10, #0xfffffffffffffff0 // #-16
    ac: str x0, [x25, x10]
    b0: mov x10, #0xfffffffffffffff8 // #-8
    b4: str x0, [x25, x10]
    b8: mov x10, #0x8 // #8
    bc: ldr x2, [x19, x10]
    [...]

With this patch, jited code(fragment):

     0: bti c
     4: stp x29, x30, [sp, #-16]!
     8: mov x29, sp
     c: stp x19, x20, [sp, #-16]!
    10: stp x21, x22, [sp, #-16]!
    14: stp x25, x26, [sp, #-16]!
    18: stp x27, x28, [sp, #-16]!
    1c: mov x25, sp
    20: sub x27, x25, #0x88
    24: mov x26, #0x0 // #0
    28: bti j
    2c: sub sp, sp, #0x90
    30: add x19, x0, #0x0
    34: mov x0, #0x0 // #0
    38: str x0, [x27]
    3c: str x0, [x27, #8]
    40: str x0, [x27, #16]
    44: str x0, [x27, #24]
    48: str x0, [x27, #32]
    4c: str x0, [x27, #40]
    50: str x0, [x27, #48]
    54: str x0, [x27, #56]
    58: str x0, [x27, #64]
    5c: str x0, [x27, #72]
    60: str x0, [x27, #80]
    64: str x0, [x27, #88]
    68: str x0, [x27, #96]
    6c: str x0, [x27, #104]
    70: str x0, [x27, #112]
    74: str x0, [x27, #120]
    78: str x0, [x27, #128]
    7c: ldr x2, [x19, #8]
    [...]

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220321152852.2334294-4-xukuohai@huawei.com
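
The offset conversion described in this commit message can be illustrated with a small standalone C sketch. This is not the JIT's code: the function names `furthest_negative_offset` and `rebase_on_fpb` are hypothetical, and the sketch ignores everything the real patch has to deal with beyond the arithmetic (access sizes, immediate alignment and range limits, and the check that FP is never modified at run time). It only shows how the FP-relative negative offsets turn into non-negative FPB-relative ones.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: scan the FP-relative offsets used by the program
 * and return the one furthest below FP (the most negative value).
 * Returns 0 when no offset is negative, i.e. FPB would equal FP.
 */
static int furthest_negative_offset(const int *offs, size_t n)
{
    int min = 0;

    for (size_t i = 0; i < n; i++) {
        if (offs[i] < min)
            min = offs[i];
    }
    return min;
}

/*
 * Hypothetical helper: FPB = FP + fpb_off, so an access at FP + off
 * is the same address as FPB + (off - fpb_off).
 */
static int rebase_on_fpb(int off, int fpb_off)
{
    /* The rebased offset is only usable as an immediate if it is >= 0. */
    assert(off >= fpb_off);
    return off - fpb_off;
}
```

With the stack offsets from the disassembly above (-136 through -8), the furthest offset is -136, so FPB = FP - 0x88, which matches `sub x27, x25, #0x88`, and the stores land at `[x27]` through `[x27, #128]`.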

[ Upstream commit 3004081 ]

In case runtime PM is enabled, do runtime PM clean up to remove cpu latency qos request, otherwise driver removal may have below kernel dump:

    [ 19.463299] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000048
    [ 19.472161] Mem abort info:
    [ 19.474985] ESR = 0x0000000096000004
    [ 19.478754] EC = 0x25: DABT (current EL), IL = 32 bits
    [ 19.484081] SET = 0, FnV = 0
    [ 19.487149] EA = 0, S1PTW = 0
    [ 19.490361] FSC = 0x04: level 0 translation fault
    [ 19.495256] Data abort info:
    [ 19.498149] ISV = 0, ISS = 0x00000004
    [ 19.501997] CM = 0, WnR = 0
    [ 19.504977] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000049f81000
    [ 19.511432] [0000000000000048] pgd=0000000000000000, p4d=0000000000000000
    [ 19.518245] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
    [ 19.524520] Modules linked in: gpio_ir_recv(+) rc_core [last unloaded: rc_core]
    [ 19.531845] CPU: 0 PID: 445 Comm: insmod Not tainted 6.2.0-rc1-00028-g2c397a46d47c #72
    [ 19.531854] Hardware name: FSL i.MX8MM EVK board (DT)
    [ 19.531859] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [ 19.551777] pc : cpu_latency_qos_remove_request+0x20/0x110
    [ 19.557277] lr : gpio_ir_recv_runtime_suspend+0x18/0x30 [gpio_ir_recv]
    [ 19.557294] sp : ffff800008ce3740
    [ 19.557297] x29: ffff800008ce3740 x28: 0000000000000000 x27: ffff800008ce3d50
    [ 19.574270] x26: ffffc7e3e9cea100 x25: 00000000000f4240 x24: ffffc7e3f9ef0e30
    [ 19.574284] x23: 0000000000000000 x22: ffff0061803820f4 x21: 0000000000000008
    [ 19.574296] x20: ffffc7e3fa75df30 x19: 0000000000000020 x18: ffffffffffffffff
    [ 19.588570] x17: 0000000000000000 x16: ffffc7e3f9efab70 x15: ffffffffffffffff
    [ 19.595712] x14: ffff800008ce37b8 x13: ffff800008ce37aa x12: 0000000000000001
    [ 19.602853] x11: 0000000000000001 x10: ffffcbe3ec0dff87 x9 : 0000000000000008
    [ 19.609991] x8 : 0101010101010101 x7 : 0000000000000000 x6 : 000000000f0bfe9f
    [ 19.624261] x5 : 00ffffffffffffff x4 : 0025ab8e00000000 x3 : ffff006180382010
    [ 19.631405] x2 : ffffc7e3e9ce8030 x1 : ffffc7e3fc3eb810 x0 : 0000000000000020
    [ 19.638548] Call trace:
    [ 19.640995] cpu_latency_qos_remove_request+0x20/0x110
    [ 19.646142] gpio_ir_recv_runtime_suspend+0x18/0x30 [gpio_ir_recv]
    [ 19.652339] pm_generic_runtime_suspend+0x2c/0x44
    [ 19.657055] __rpm_callback+0x48/0x1dc
    [ 19.660807] rpm_callback+0x6c/0x80
    [ 19.664301] rpm_suspend+0x10c/0x640
    [ 19.667880] rpm_idle+0x250/0x2d0
    [ 19.671198] update_autosuspend+0x38/0xe0
    [ 19.675213] pm_runtime_set_autosuspend_delay+0x40/0x60
    [ 19.680442] gpio_ir_recv_probe+0x1b4/0x21c [gpio_ir_recv]
    [ 19.685941] platform_probe+0x68/0xc0
    [ 19.689610] really_probe+0xc0/0x3dc
    [ 19.693189] __driver_probe_device+0x7c/0x190
    [ 19.697550] driver_probe_device+0x3c/0x110
    [ 19.701739] __driver_attach+0xf4/0x200
    [ 19.705578] bus_for_each_dev+0x70/0xd0
    [ 19.709417] driver_attach+0x24/0x30
    [ 19.712998] bus_add_driver+0x17c/0x240
    [ 19.716834] driver_register+0x78/0x130
    [ 19.720676] __platform_driver_register+0x28/0x34
    [ 19.725386] gpio_ir_recv_driver_init+0x20/0x1000 [gpio_ir_recv]
    [ 19.731404] do_one_initcall+0x44/0x2ac
    [ 19.735243] do_init_module+0x48/0x1d0
    [ 19.739003] load_module+0x19fc/0x2034
    [ 19.742759] __do_sys_finit_module+0xac/0x12c
    [ 19.747124] __arm64_sys_finit_module+0x20/0x30
    [ 19.751664] invoke_syscall+0x48/0x114
    [ 19.755420] el0_svc_common.constprop.0+0xcc/0xec
    [ 19.760132] do_el0_svc+0x38/0xb0
    [ 19.763456] el0_svc+0x2c/0x84
    [ 19.766516] el0t_64_sync_handler+0xf4/0x120
    [ 19.770789] el0t_64_sync+0x190/0x194
    [ 19.774460] Code: 910003fd a90153f3 aa0003f3 91204021 (f9401400)
    [ 19.780556] ---[ end trace 0000000000000000 ]---

Signed-off-by: Li Jun <jun.li@nxp.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

fixed latency issue caused by disabling preemption for up to 5 seconds

ee67eee

Change-Id: Iab555157c4255d64865a3c7ef2dffee986954d56

popcornmix closed this Sep 1, 2012

ewheelerinc mentioned this pull request May 29, 2013

mmcblk0 error -110 errors (regression?) #280

Closed

jeanlemotan mentioned this pull request Feb 1, 2015

Small optimizations #781

Closed
