Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Devel/revpi 6.1 handle irq storm #149

Draft
wants to merge 185 commits into
base: devel/revpi-6.1
Choose a base branch
from

Conversation

linosanfilippo-kunbus
Copy link

No description provided.

l1k and others added 30 commits May 1, 2023 09:26
commit 18b8460 upstream.

pwrseq_sd8787 is forced to be built as a module if its dependencies are.

That's unnecessary, it's perfectly fine for it to be built-in even
though the wireless drivers that need it are modules.

Relax the depends definition in Kconfig accordingly.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Matt Ranostay <matt@ranostay.consulting>
Cc: Lubomir Rintel <lkundrak@v3.sk>
Cc: Claudiu Beznea <claudiu.beznea@microchip.com>
commit bba047f upstream.

The Marvell SD8978 (aka NXP IW416) uses identical registers as SD8987,
so reuse the existing mwifiex_reg_sd8987 definition.

Note that mwifiex_reg_sd8977 and mwifiex_reg_sd8997 are likewise
identical, save for the fw_dump_ctrl register:  They define it as 0xf0
whereas mwifiex_reg_sd8987 defines it as 0xf9.  I've verified that
0xf9 is the correct value on SD8978.  NXP's out-of-tree driver uses
0xf9 for all of them, so there's a chance that 0xf0 is not correct
in the mwifiex_reg_sd8977 and mwifiex_reg_sd8997 definitions.  I cannot
test that for lack of hardware, hence am leaving it as is.

NXP has only released a firmware which runs Bluetooth over UART.
Perhaps Bluetooth over SDIO is unsupported by this chipset.
Consequently, only an "sdiouart" firmware image is referenced, not an
alternative "sdsd" image.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/536b4f17a72ca460ad1b07045757043fb0778988.1674827105.git.lukas@wunner.de
commit 7715d79 upstream.

Support the firmware hotfix version in GET_HW_SPEC responses to avoid an
irritating "Unknown api_id: 5" message on probe.

Based on this commit in NXP's GPLv2-licensed out-of-tree driver:
nxp-imx/mwifiex@27fd8ecca504

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Sherry Sun <sherry.sun@nxp.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/111c7ee895f12d951e95a2edcd06d87ca26a7d0f.1674827105.git.lukas@wunner.de
commit 858e8b7 upstream.

The interrupt handler that sets the boolean variable irq_tested may run on
another CPU as the thread that checks irq_tested as part of the irq test in
tpm_tis_send().

Since nothing guarantees cache coherency between CPUs for unsynchronized
accesses to boolean variables the testing thread might not perceive the
value change done in the interrupt handler.

Avoid this issue by setting the bit TPM_TIS_IRQ_TESTED in the flags field
of the tpm_tis_data struct and by accessing this field with the bit
manipulating functions that provide cache coherency.

Also convert all other existing sites to use the proper macros when
accessing this bitfield.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Tested-by: Michael Niewöhner <linux@mniewoehner.de>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
commit 282657a upstream.

In disable_interrupts() the TPM_GLOBAL_INT_ENABLE bit is unset in the
TPM_INT_ENABLE register to shut the interrupts off. However modifying the
register is only possible with a held locality. So claim the locality
before disable_interrupts() is called.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Tested-by: Michael Niewöhner <linux@mniewoehner.de>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
commit 6d789ad upstream.

Both functions tpm_tis_probe_irq_single() and tpm_tis_probe_irq() may setup
the interrupts and then return with an error. This case is indicated by a
missing TPM_CHIP_FLAG_IRQ flag in chip->flags.
Currently the interrupt setup is only undone if tpm_tis_probe_irq_single()
fails. Undo the setup also if tpm_tis_probe_irq() fails.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Tested-by: Michael Niewöhner <linux@mniewoehner.de>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
commit ed9be0e upstream.

If in tpm_tis_probe_irq_single() an error occurs after the original
interrupt vector has been read, restore the interrupts before the error is
returned.

Since the caller does not check the error value, return -1 in any case that
the TPM_CHIP_FLAG_IRQ flag is not set. Since the return value of function
tpm_tis_gen_interrupt() is not longer used, make it a void function.

Fixes: 1107d06 ("tpm_tis: Introduce intermediate layer for TPM access")
Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
commit 15d7aa4 upstream.

In tpm_tis_probe_single_irq() interrupt registers TPM_INT_VECTOR,
TPM_INT_STATUS and TPM_INT_ENABLE are modified to setup the interrupts.
Currently these modifications are done without holding a locality thus they
have no effect. Fix this by claiming the (default) locality before the
registers are written.

Since now tpm_tis_gen_interrupt() is called with the locality already
claimed remove locality request and release from this function.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
commit e87fcf0 upstream.

According to the TPM Interface Specification (TIS) support for "stsValid"
and "commandReady" interrupts is only optional.
This has to be taken into account when handling the interrupts in functions
like wait_for_tpm_stat(). To determine the supported interrupts use the
capability query.

Also adjust wait_for_tpm_stat() to only wait for interrupt reported status
changes. After that process all the remaining status changes by polling
the status register.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Tested-by: Michael Niewöhner <linux@mniewoehner.de>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
commit 4303553 upstream.

Clean up wait_for_tpm_stat() by moving multiple similar interrupt mask
checks into an own function.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Suggested-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
commit 35f6212 upstream.

After driver initialization tpm_tis_data->locality may only be modified in
case of a LOCALITY CHANGE interrupt. In this case the interrupt handler
iterates over all localities only to assign the active one to
tpm_tis_data->locality.

However this information is never used any more, so the assignment is not
needed.
Furthermore without the assignment tpm_tis_data->locality cannot change any
more at driver runtime, and thus no protection against concurrent
modification is required when the variable is read at other places.

So remove this iteration entirely.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
commit 7a2f55d upstream.

Implement a usage counter for the (default) locality used by the TPM TIS
driver:
Request the locality from the TPM if it has not been claimed yet, otherwise
only increment the counter. Also release the locality if the counter is 0
otherwise only decrement the counter. Since in case of SPI the register
accesses are locked by means of the SPI bus mutex use a sleepable lock
(i.e. also a mutex) to ensure thread-safety of the counter which may be
accessed by both a userspace thread and the interrupt handler.

By doing this refactor the names of the amended functions to use a more
appropriate prefix.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Tested-by: Michael Niewöhner <linux@mniewoehner.de>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
commit 0c7e66e upstream.

The TIS interrupt handler at least has to read and write the interrupt
status register. In case of SPI both operations result in a call to
tpm_tis_spi_transfer() which uses the bus_lock_mutex of the spi device
and thus must only be called from a sleepable context.

To ensure this request a threaded interrupt handler.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Tested-by: Michael Niewöhner <linux@mniewoehner.de>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
commit 0e06926 upstream.

Writing the TPM_INT_STATUS register in the interrupt handler to clear the
interrupts only has effect if a locality is held. Since this is not
guaranteed at the time the interrupt is fired, claim the locality
explicitly in the handler.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Tested-by: Michael Niewöhner <linux@mniewoehner.de>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
commit 955df4f upstream.

In tpm_tis_resume() make sure that the locality has been claimed when
tpm_tis_reenable_interrupts() is called. Otherwise the writings to the
register might not have any effect.

Fixes: 45baa1d ("tpm_tis: Re-enable interrupts upon (S3) resume")
Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
commit 548eb51 upstream.

In tpm_tis_gen_interrupt() a request for a property value is sent to the
TPM to test if interrupts are generated. However after a power cycle the
TPM responds with TPM_RC_INITIALIZE which indicates that the TPM is not
yet properly initialized.
Fix this by first starting the TPM up before the request is sent. For this
the startup implementation is removed from tpm_chip_register() and put
into the new function tpm_chip_startup() which is called before the
interrupts are tested.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
commit e644b2f upstream.

The test for interrupts in tpm_tis_send() is skipped if the flag
TPM_CHIP_FLAG_IRQ is not set. Since the current code never sets the flag
initially the test is never executed.

Fix this by setting the flag in tpm_tis_gen_interrupt() right after
interrupts have been enabled and before the test is executed.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Tested-by: Michael Niewöhner <linux@mniewoehner.de>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
commit 0c8862d upstream.

TPM chip bootstrapping was removed from tpm_chip_register(), and it
was relocated to tpm_tis_core. This breaks all drivers which are not
based on tpm_tis because the chip will not get properly initialized.

Take the corrective steps:
1. Rename tpm_chip_startup() as tpm_chip_bootstrap() and make it one-shot.
2. Call tpm_chip_bootstrap() in tpm_chip_register(), which reverts the
   things  as tehy used to be.

Cc: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Fixes: 548eb51 ("tpm, tpm_tis: startup chip before testing for interrupts")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Link: https://lore.kernel.org/all/ZEjqhwHWBnxcaRV5@xpf.sh.intel.com/
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
This reverts commit 39ec3b8.

Signed-off-by: Philipp Rosenberger <p.rosenberger@kunbus.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
This reverts commit 5915ff6.

Signed-off-by: Philipp Rosenberger <p.rosenberger@kunbus.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Fix lockup in dwc_otg driver that leads to a system freeze of the
4-core Raspberry Pi board when RT Preempt kernel is in use or when
interrupts are threaded in general.
The lockup occurs when the irq handler thread gets preempted while it
holds the fiq spin lock.
The patch makes sure to disable local irq while fiq spin lock is held
irrespective of whether the interrupt is threaded or not.
The patch also unifies the use of the fiq spin lock outside the fiq
handler by introducing two function-like macros fiq_fsm_spin_lock_irqsave
and fiq_fsm_spin_unlock_irqrestore.

Under RT kernel, the bug can be reproduced in a few minutes by running
hackbench and cyclictest in this way
$ ( while true; do nice hackbench 30 >/dev/null; done )&
$ echo "run 'kill $!' to stop hackbench"
$ cyclictest -a -t -n -p 80

Signed-off-by: Oussama Ghorbel <ghorbel@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
What I did - no rocket science [1]:

* Applied RT-patches 4.19.72-rt26 on top of rpi-4.19.y / 4.19.79 (most recent
  version applying rt-patch properly)
* Applied a slightly rebased version of the original (4.14) fiq-patch [2]
* grepped for 'fiq_fsm_spin_lock(' and 'fiq_fsm_spin_unlock(' and added missing
  rt-specific replacements
* rebased changes back to rpi-4.19.y-rt

What this patch does:

* add one missing pair of fiq_fsm_spin_lock/fiq_fsm_spin_unlock replacements

With builds of [1] Rapsi3 is running without a singe issue for two weeks now
and it was stressed by

* moving gigabytes from USB-Stick to SDCard
* several usb-midi-keyboard jam sessions

Addresses [3]

[1] https://github.com/schnitzeltony/meta-raspi-light/tree/master/recipes-kernel/linux
[2] raspberrypi/linux@05dd5c4
[3] raspberrypi/linux#2943

Signed-off-by: Andreas Müller <schnitzeltony@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
To build external modules, one needs executables such as fixdep and
modpost.  But when cross-compiling the kernel, those executables are
only generated for the host arch, not the target arch.

Distributions ship packages which users need to install in order to
build external modules.  These packages contain headers and Makefiles
from the kernel source tree as well as fixdep & friends.  Usually the
packages are cross-built on a single host arch for the multitude of
arches supported by the distribution.

But because fixdep & friends are only compiled for the host arch,
distributions are forced to duplicate these executables' Makefiles
in order to cross-compile them.  For instance, these are Debian's
duplicated Makefiles:

https://salsa.debian.org/kernel-team/linux/-/tree/master/debian/rules.d/scripts

Make distribution maintainers' lives easier by providing a new target
"make kbuild_install" which cross-compiles the required executables for
the target arch and installs them below $(INSTALL_KBUILD_PATH).

To avoid wasting compile time and disk space, compile only programs
which are essential for building external modules.  They are declared
essential by adding them to extmodprogs.  For now, these are:
recordmcount sign-file fixdep genksyms modpost

The rules to cross-compile the executables reside in Makefile.kbuildinst.
They are similar to the rules in Makefile.host, but I've left out rules
for C++ executables and shared libraries as they're not necessary.

The host versions of the executables are generated in $(obj) whereas
the cross-compiled versions are generated in $(obj)/.cross/ .  This
necessitates a modification of Makefile.lib because the multi_depend
macro creates rules which assume that the target and its dependencies
always reside in $(obj).  Refactor the macro to use $(dir $1) in lieu
of $(obj).  Likewise, the target-stem macro strips $(obj)/ instead
of $(obj)/.cross/.  Refactor it to use $(notdir $@).

In a future step, Makefile.kbuildinst may be extended to also install
headers and other bits necessary for building external modules.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Muxes never switch from one stable state to another instantaneously.
If a driver switches the mux, it needs to delay until the mux has
settled or else the mux will be in an undefined state.

The delay is specific to the hardware, so support setting a per-mux
delay in the core and amend the gpio-mux driver to allow specifying the
delay in the devicetree.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
The code comment preceding register_console() claims that:

   "There are two types of consoles - bootconsoles (early_printk) and
    "real" consoles (everything which is not a bootconsole) which are
    handled differently. [...]
    As soon as a "real" console is registered, all bootconsoles
    will be unregistered automatically."

But that's not what the code does:  The code unregisters bootconsoles
only when the *preferred* console registers, i.e. the last one on the
command line.  If that console's driver never registers (e.g. because
it is disabled in the kernel config), bootconsoles stay around
indefinitely.  Should the command line contain both a bootconsole as
well as a real console on the same serial port, all messages are logged
twice once the real console registers.

Moreover, the log buffer is replayed once the real console registers
even though the messages were already emitted by the bootconsole.

Amend the code to be congruent with the above-quoted code comment and
thereby avoid these issues.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
If a TPM 2.0 chip has not been initialized by the platform firmware,
a message "A TPM error (256) occurred attempting the self test" is
emitted with KERN_ERR severity in addition to a KERN_INFO message
"starting up the TPM manually".  The former message is caused by
tpm_transmit_cmd() misinterpreting the (expected) TPM2_RC_INITIALIZE
response code as an error.

The error message is particularly inappropriate on TPMs which are not
used for integrity measurements but merely to store key material and
which are hence not touched by the firmware.  (In our case, a TPM
attached to a Raspberry Pi.)

Silence the error by adding the response code in question to the
existing whitelist of allowed response codes.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
On the first revision of the "RevPi Core" open source PLC, the ks8851
interrupt pin is not connected.  Work around by polling the interrupt
once per millisecond.

Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: Tristram Ha <tristram.ha@micrel.com>
Cc: David J. Choi <david.choi@micrel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
linosanfilippo-kunbus and others added 18 commits May 12, 2023 07:08
There is no need to use a u16 data type for variables that hold the length
of IO or GATEWAY packets (this is since the length of an IO packet can be
at max 256 and the length of a GATEWAY at max 32 bytes).
So adjust the data types of local variables as well as variables in the
function parameter lists.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Both functions pibridge_req_gate_tmt() and pibridge_req_io() do similar tasks
but the implementation differs even for the code parts with the same
functionality. Bring the implementation of those code parts in
pibridge_req_gate_tmt() into line with the implementation in pibridge_req_io().

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Both functions pibridge_recv_timeout() and pibridge_discard_timeout() are
waiting uninterruptibly for events. However in the waker function
pibridge_receive_buf() only interruptible tasks are woken up. As a
consequence the waiters are sleeping until the timeout expires which leads
to long delays in the communcation with the RevPi hardware modules.

Fix this by calling wake_up() instead of wake_up_interruptible() and thus
waking up uninterruptible instead of interruptible tasks.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
arm/configs/revpi-v6: Enable PiBridge driver
arm/configs/revpi-v7: Enable PiBridge driver
arm/configs/revpi-v7l: Enable PiBridge driver
arm64/configs/revpi-v8: Enable PiBridge driver

Signed-off-by: Nicolai Buchwitz <n.buchwitz@kunbus.com>
In functions pibridge_req_io() and pibridge_req_gate_tmt() return an error
if the size of the received packet does not match the expected size (as
specified by the passed parameter "rcv_len").

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Silence compiler warning for implicit casts on arm64 builds, by prefix
them with z.

Signed-off-by: Nicolai Buchwitz <n.buchwitz@kunbus.com>
Add missing space between text and format specifiers in log output.

Signed-off-by: Nicolai Buchwitz <n.buchwitz@kunbus.com>
Forcing callers to pass u8 * pointers to the pibridge API is cumbersome
as it results in plenty of type casts.  Avoid by switching to void *
pointers.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Using a MIO module with a connect produces lots of error messages like the
following in the log file.

piControl: recv len from pibridge err(got:0, exp:20)
piControl: talk with mio for aio data error(addr:31, ret:-70)
piControl: crc for dio data err(got:0, expect:103)
piControl: recv len from pibridge err(got:0, exp:20)
piControl: talk with mio for aio data error(addr:31, ret:-70)
piControl: recv len from pibridge err(got:0, exp:20)
piControl: talk with mio for dio data error(addr:31, ret:-70)

The reason for these messages are repeated incomplete datagram receptions
for sent data requests. Turning the tracing on in the pibridge module
showed that the receiving process does not wait long enough to receive all
data:

piControl I/O-1579    [003] ....... 81000.509144: pibridge_send_begin: datalen=12 data:{0x1f,0x9,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x16}
piControl I/O-1579    [003] ....... 81000.510308: pibridge_send_end: datalen=12
piControl I/O-1579    [003] ....... 81000.510309: pibridge_receive_begin: datalen=20
piControl I/O-1579    [003] ....... 81000.512038: pibridge_receive_timeout: len=16 timeout=10
piControl I/O-1579    [003] ....... 81000.512038: pibridge_receive_end: datalen=16 data:{0x9f,0x11,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0}
kworker/u8:0-507     [003] ....... 81000.512059: pibridge_wakeup_receive_buffer: Wakeup 0

In the case shown above the process starts waiting for the response packet
at 81000.510309 but at 81000.512038 (i.e. only 1,73 ms later) the wait
already ends with a timeout. This is odd since the implementation uses
wait_event_timeout() with a 10 ms timeout.

However the timeout is expected as jiffies by this function and for a
kernel that uses 100 jiffies per HZ 10 ms are converted into only 1 jiffy.

So the timeout happens after the next timer tick, which may be
significantly shorter than the full jiffy period (10 ms) in case that the
waiting starts in the middle of a jiffy period.

Fix this issue by waiting at least one additional jiffy to make sure that
the timeout does not happen before at least one full jiffy period of 10 ms.

(See also this thread in which a similar issue for the use of
wait_event_timeout() with a 1 jiffy timeout has been reported:
https://lore.kernel.org/lkml/529CA42A.3040504@freebox.fr/)

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
In pibridge_comm.h add missing include file for linux data types.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Move structure definitions for GATE and IO packet headers as well as the
macro for the fifo size into the new header file pibridge.h.
By doing this extend the fifo size to a power of 2 since this is what the
kfifo API will extend the fifo to, anyway. Also rename the macro from
PIBRIDGE_RECV_BUFFER_SIZE to the more appropriate PIBRIDGE_RECV_FIFO_SIZE.

This is in preparation of implementing tracing macros which are defined in
a seperate header file but need access to the definitions mentioned above.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Add pibridge_trace.h with macros that provide tracing information about
- IO and GATEWAY packet headers and data
- CRC
- expired timeouts
- wakeup callback

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Use the pibridge tracing macros to trace
- packet headers
- packet data
- CRC
- expired timeouts
- wakeup

For this reason slightly modify the implemenation in
pibridge_discard_timeout() and pibridge_recv_timeout(). Furthermore use a
wrapper function around serdev_device_write_wakeup() that allows placing
a tracepoint for a wakeup-write event.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Add comments to the tracing macros that describe which information each event
provides and when the event is triggered.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Use wait_event_hrtimeout() instead of wait_event_timeout() to wait for more
data. The latter uses jiffies to specify the timeout which requires a
minimum of 2 jiffies (which equals 20 msecs) to avoid a timeout before the
the IO timeout interval used by the pibridge module (10 msecs) have passed.

With wait_event_hrtimeout() it is possible to specify the exact 10 msecs
as a timeout so we do not have to wait up to 20 msecs in case that no more
data arrives.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
In function pibridge_recv_timeout() do not return 0 if the number of
received bytes is smaller than the size of the provided receive buffer.
Instead return the number of received bytes.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
In function pibridge_req_io() do not return 0 in case of success but
instead return the number of received bytes.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
In function pibridge_req_gate_tmt() do not return 0 in case of success but
instead return the number of received bytes.

Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
@linosanfilippo-kunbus linosanfilippo-kunbus force-pushed the devel/revpi-6.1_handle_irq_storm branch from 7fc33e4 to 56cda95 Compare May 18, 2023 13:05
@l1k l1k force-pushed the devel/revpi-6.1 branch from 760c1bc to f9b9a0b Compare May 24, 2023 10:17
After activation of interrupts for TPM TIS drivers 0-day reports an
interrupt storm on an Intel(R) Xeon(R) Gold 6346.

Fix this by detecting the storm and falling back to polling:
Count the number of unhandled interrupts within a 10 ms time interval. In
case that more than 1000 were unhandled deactivate interrupts entirely,
deregister the handler and use polling instead.

The storm detection logic equals the implementation in note_interrupt()
which uses timestamps and counters stored in struct irq_desc. Since this
structure is private to the generic interrupt core the TPM TIS core uses
its own timestamps and counters. Furthermore the TPM interrupt handler
always returns IRQ_HANDLED to prevent the generic interrupt core from
processing the interrupt storm.

Since the interrupt deregistration function devm_free_irq() waits for all
interrupt handlers to finish, only trigger a worker in the interrupt
handler and do the unregistration in the worker to avoid a deadlock.

Reported-by: kernel test robot <yujie.liu@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202305041325.ae8b0c43-yujie.liu@intel.com/
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
@linosanfilippo-kunbus linosanfilippo-kunbus force-pushed the devel/revpi-6.1_handle_irq_storm branch from 56cda95 to 8a2d7ee Compare May 30, 2023 13:31
@l1k l1k force-pushed the devel/revpi-6.1 branch 2 times, most recently from ad16fae to f798b77 Compare July 3, 2023 13:58
@l1k l1k force-pushed the devel/revpi-6.1 branch 2 times, most recently from 2870926 to c30195e Compare July 11, 2023 08:08
@l1k l1k force-pushed the devel/revpi-6.1 branch from c30195e to 1210ab7 Compare July 18, 2023 07:30
@l1k l1k force-pushed the devel/revpi-6.1 branch from 08b168f to 9284b47 Compare August 22, 2023 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants