Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ftgmac is still not detecting link if cable plugged in after BMC is booted #61

Closed
nkskjames opened this issue Mar 15, 2016 · 23 comments
Closed
Labels

Comments

@nkskjames
Copy link

Here are steps to fail:

  1. LAN cable plugged
  2. BMC booted
  3. Setup network
    [network works fine]
  4. turn off AC
  5. unplug LAN cable
  6. turn on AC
  7. BMC boots
  8. Plug in cable
  9. LAN is not enabled (ftgmac100: NCSI dev is down)
@nkskjames nkskjames added the bug label Mar 15, 2016
@gwshan
Copy link

gwshan commented Mar 17, 2016

Did we issue below commands after step (8) ? By the way, who from IRC channel I can get more info?

ifconfig eth0 down && ifconfig eth0 up

@nkskjames
Copy link
Author

ifconfig eth0 down && ifconfig eth0 up does make network work. The expectation from end user is that when the cable is plugged the link should come up on its own though. Is that not correct? I will see if we can get person who reported this on IRC.

@gwshan
Copy link

gwshan commented Mar 17, 2016

No, when the cable is plugged, the network interface won't be up automatically. When MAC has NCSI enabled, we don't have a PHY attached. That means we don't have a thread monitoring the link status and wake up the MAC when link is ready. With NCSI, we have to issue "ifconifg ethx down && ifconfig eth0 up" after the cable is plugged.

The problem here is the above command doesn't help after cable is plugged.

@nkskjames
Copy link
Author

"The problem here is the above command doesn't help after cable is plugged."
not sure what you mean. ifconfig down/up does work after cable is plugged.

NC-SI specifies an async event notification for things like link state change. Please see section 6.7 in: http://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.0.1.pdf. If that doesn't work, it is not unusual to poll for link status from what I understand.

@gwshan
Copy link

gwshan commented Mar 21, 2016

Yeah, you're correct that the AEN packet will be sent upon link-down event. The NCSI stack will try to choose another available channel if existing. Otherwise, the NCSI interface is down and I guess it's the case. When NCSI stack receives AEN on link-up event, it tries to choose an available channel for communication if we don't have one yet.

I'll try to reproduce this issue on local BMC tomorrow or a bit later...

@nkskjames
Copy link
Author

What is the status of this?

@gwshan
Copy link

gwshan commented Apr 12, 2016

I don't have a chance to reproduce it locally. I will try it later of this week or early of next week.

@anoo1
Copy link

anoo1 commented Apr 15, 2016

Customer reported same issue today. Any progress on recreating?

@gwshan
Copy link

gwshan commented Apr 16, 2016

Not yet, I'll continue on this next Monday when sitting around the BMC :-)

@gwshan
Copy link

gwshan commented Apr 18, 2016

I'm using "dev-4.3" branch as the base source code to debug the issue. Also the hardware platform is Palmetto. The experiment I did is as below and it should match with the experiment that James did.

  • Unplug the cable;
  • Load the kernel and boot it;
  • ifconfig eth0 xxx.xxx.xxx.xxx, and we should see the interface is down;
  • Plug the cable and issue command "ifcofig eth0 down && ifconfig eth0 up". The network is back to service.

One thing I observed from James saw is: Reconfiguring the network interface after plugging the network cable will bring the network back. I also added some code to print the AEN packet received on plugging or unplugging the network cable. However, I never see a AEN packet received.

@nkskjames
Copy link
Author

So what next?

@apopple
Copy link

apopple commented Apr 20, 2016

@gwshan it sounds like you are expecting to receive an AEN packet but aren't? Does the device driver rely on getting this packet? If so we should try and debug why you aren't getting them.

@gwshan
Copy link

gwshan commented Apr 21, 2016

Yes, I expect to receive AEN indicating link change on the active channel and I don't receive it when unplugging the ethernet cable. I added some printk() in the NCSI stack and print the received AEN packet. I don't see it.

@mdmillerii
Copy link

Do you get an AEN when the link drops?
Do you have a poller to make sure the NCSI selection is still active? (see 6.8.3)

@Kenthliu
Copy link

@gwshan Did you try wireshark to check the packet?

@gwshan
Copy link

gwshan commented Apr 27, 2016

@kenliu when the NCSI interface is down, no frames will be received on the
interface. So there're no packets seen on wireshark. As I mentioned before,
I add some printk() in the source code of the NCSI stack that dumps the
packets on receiving AEN packet. Nothing was dumped.

In current design, the NCSI and other (like ARP/IP) packets are transmitted
or received through same channel. When opening the NCSI, the channel is
enabled and NCSI packets exchanged with remote end to setup the NCSI
interface. the channel is going to be disabled if that fails. Otherwise,
the channel is enabled and monitor on AEN packets. In order to support
polling mechanism, we need separate channels for NCSI and other packets. It
will introduce huge changes to current farady driver implementation and I
need evaluate after finishing work at hand. On the other hand, there is one
additional question: when the cable is unplugged, can we probe the
situation by NCSI (Get Link Status packet) response without issue? I need
check it out later as well.

Another thing I'm not satisfied with current NCSI stack is: all available
package/channel are probed when opening up the ethernet interface. However,
once the NCSI topology is finalized for once and it's not going to change
dynamically. I need improve it later.

2016-04-27 12:29 GMT+10:00 KenLiu notifications@github.com:

@gwshan https://github.com/gwshan Did you try wireshark to check the
packet?


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#61 (comment)

@jk-ozlabs
Copy link
Member

@nkskjames If we can get some info from Broadcom about the absence of that AEN message, that'd be helpful for us.

@Kenthliu
Copy link

Yes. I already sent an issue to Broadcom. Waiting for the response.
case

@Kenthliu
Copy link

Kenthliu commented May 6, 2016

Hi @gwshan and @jk-ozlabs
Response from Broadcom. Do you agree this?
aen

@gwshan
Copy link

gwshan commented May 6, 2016

Hi Ken, Thanks for checking with broadcom on this. It sounds the case. I
will enable that explicitly with AEN enable
command packet. Prior to that, I have to introduce some changes to farady
driver so that we can always receive
ingress NCSI packets.

However, the statement you had isn't always true according to the
experiment I did. Initially, I never send a AEN enable
packet to target NCSI channel. With a PCI slot reset through GPIO PINs, I
still can receive AEN packet from far end.

2016-05-06 22:37 GMT+10:00 KenLiu notifications@github.com:

Hi @gwshan https://github.com/gwshan and @jk-ozlabs
https://github.com/jk-ozlabs
Response from Broadcom. Do you agree this?
[image: aen]
https://cloud.githubusercontent.com/assets/15260220/15073126/571a8354-13ca-11e6-9aea-c3316b9e34ca.jpg


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#61 (comment)

@gwshan
Copy link

gwshan commented May 7, 2016

Thanks again for Ken's feedback from Broadcom. I added some code in the
NCSI stack to enable AEN explicitly by transmitting a AEN enablement packet
on the active channel. I can receive AEN packet from far end (broadcom)
when unplugging the cable. However the AEN packet is still missed when
plugging the cable. I assume something is missed in the code. I will
continue the debugging.

2016-05-07 0:12 GMT+10:00 Shan Gavin shan.gavin@gmail.com:

Hi Ken, Thanks for checking with broadcom on this. It sounds the case. I
will enable that explicitly with AEN enable
command packet. Prior to that, I have to introduce some changes to farady
driver so that we can always receive
ingress NCSI packets.

However, the statement you had isn't always true according to the
experiment I did. Initially, I never send a AEN enable
packet to target NCSI channel. With a PCI slot reset through GPIO PINs, I
still can receive AEN packet from far end.

2016-05-06 22:37 GMT+10:00 KenLiu notifications@github.com:

Hi @gwshan https://github.com/gwshan and @jk-ozlabs
https://github.com/jk-ozlabs
Response from Broadcom. Do you agree this?
[image: aen]
https://cloud.githubusercontent.com/assets/15260220/15073126/571a8354-13ca-11e6-9aea-c3316b9e34ca.jpg


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#61 (comment)

@gwshan
Copy link

gwshan commented May 9, 2016

A pachset sent to maillist for review. The issue is fixed by it. I had below testing with the patchset on palm4-bmc. Everything looks good:

  • From uboot, download the kernel image;
  • Unplug the ethernet cable;
  • From uboot, boot the kernel image;
  • Configure the ethernet interface when the kernel boots up completely.
    The NCSI interface should be down;
  • Plug the cable, the NCSI interface is up automatically. The interface
    is pingable from external.

@shenki
Copy link
Member

shenki commented May 24, 2016

I think this is resolved. Please re-open if it happens again.

@shenki shenki closed this as completed May 24, 2016
amboar pushed a commit to amboar/linux that referenced this issue Nov 16, 2016
Race condition between registering an I2C device driver and
deregistering an I2C adapter device which is assumed to manage that
I2C device may lead to a NULL pointer dereference due to the
uninitialized list head of driver clients.

The root cause of the issue is that the I2C bus may know about the
registered device driver and thus it is matched by bus_for_each_drv(),
but the list of clients is not initialized and commonly it is NULL,
because I2C device drivers define struct i2c_driver as static and
clients field is expected to be initialized by I2C core:

  i2c_register_driver()             i2c_del_adapter()
    driver_register()                 ...
      bus_add_driver()                ...
        ...                           bus_for_each_drv(..., __process_removed_adapter)
      ...                               i2c_do_del_adapter()
    ...                                   list_for_each_entry_safe(..., &driver->clients, ...)
    INIT_LIST_HEAD(&driver->clients);

To solve the problem it is sufficient to do clients list head
initialization before calling driver_register().

The problem was found while using an I2C device driver with a sluggish
registration routine on a bus provided by a physically detachable I2C
master controller, but practically the oops may be reproduced under
the race between arbitraty I2C device driver registration and managing
I2C bus device removal e.g. by unbinding the latter over sysfs:

% echo 21a4000.i2c > /sys/bus/platform/drivers/imx-i2c/unbind
  Unable to handle kernel NULL pointer dereference at virtual address 00000000
  Internal error: Oops: 17 [openbmc#1] SMP ARM
  CPU: 2 PID: 533 Comm: sh Not tainted 4.9.0-rc3+ openbmc#61
  Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
  task: e5ada400 task.stack: e4936000
  PC is at i2c_do_del_adapter+0x20/0xcc
  LR is at __process_removed_adapter+0x14/0x1c
  Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
  Control: 10c5387d  Table: 35bd004a  DAC: 00000051
  Process sh (pid: 533, stack limit = 0xe4936210)
  Stack: (0xe4937d28 to 0xe4938000)
  Backtrace:
  [<c0667be0>] (i2c_do_del_adapter) from [<c0667cc0>] (__process_removed_adapter+0x14/0x1c)
  [<c0667cac>] (__process_removed_adapter) from [<c0516998>] (bus_for_each_drv+0x6c/0xa0)
  [<c051692c>] (bus_for_each_drv) from [<c06685ec>] (i2c_del_adapter+0xbc/0x284)
  [<c0668530>] (i2c_del_adapter) from [<bf0110ec>] (i2c_imx_remove+0x44/0x164 [i2c_imx])
  [<bf0110a8>] (i2c_imx_remove [i2c_imx]) from [<c051a838>] (platform_drv_remove+0x2c/0x44)
  [<c051a80c>] (platform_drv_remove) from [<c05183d8>] (__device_release_driver+0x90/0x12c)
  [<c0518348>] (__device_release_driver) from [<c051849c>] (device_release_driver+0x28/0x34)
  [<c0518474>] (device_release_driver) from [<c0517150>] (unbind_store+0x80/0x104)
  [<c05170d0>] (unbind_store) from [<c0516520>] (drv_attr_store+0x28/0x34)
  [<c05164f8>] (drv_attr_store) from [<c0298acc>] (sysfs_kf_write+0x50/0x54)
  [<c0298a7c>] (sysfs_kf_write) from [<c029801c>] (kernfs_fop_write+0x100/0x214)
  [<c0297f1c>] (kernfs_fop_write) from [<c0220130>] (__vfs_write+0x34/0x120)
  [<c02200fc>] (__vfs_write) from [<c0221088>] (vfs_write+0xa8/0x170)
  [<c0220fe0>] (vfs_write) from [<c0221e74>] (SyS_write+0x4c/0xa8)
  [<c0221e28>] (SyS_write) from [<c0108a20>] (ret_fast_syscall+0x0/0x1c)

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
shenki pushed a commit that referenced this issue Nov 21, 2016
commit 147b36d upstream.

Race condition between registering an I2C device driver and
deregistering an I2C adapter device which is assumed to manage that
I2C device may lead to a NULL pointer dereference due to the
uninitialized list head of driver clients.

The root cause of the issue is that the I2C bus may know about the
registered device driver and thus it is matched by bus_for_each_drv(),
but the list of clients is not initialized and commonly it is NULL,
because I2C device drivers define struct i2c_driver as static and
clients field is expected to be initialized by I2C core:

  i2c_register_driver()             i2c_del_adapter()
    driver_register()                 ...
      bus_add_driver()                ...
        ...                           bus_for_each_drv(..., __process_removed_adapter)
      ...                               i2c_do_del_adapter()
    ...                                   list_for_each_entry_safe(..., &driver->clients, ...)
    INIT_LIST_HEAD(&driver->clients);

To solve the problem it is sufficient to do clients list head
initialization before calling driver_register().

The problem was found while using an I2C device driver with a sluggish
registration routine on a bus provided by a physically detachable I2C
master controller, but practically the oops may be reproduced under
the race between arbitraty I2C device driver registration and managing
I2C bus device removal e.g. by unbinding the latter over sysfs:

% echo 21a4000.i2c > /sys/bus/platform/drivers/imx-i2c/unbind
  Unable to handle kernel NULL pointer dereference at virtual address 00000000
  Internal error: Oops: 17 [#1] SMP ARM
  CPU: 2 PID: 533 Comm: sh Not tainted 4.9.0-rc3+ #61
  Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
  task: e5ada400 task.stack: e4936000
  PC is at i2c_do_del_adapter+0x20/0xcc
  LR is at __process_removed_adapter+0x14/0x1c
  Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
  Control: 10c5387d  Table: 35bd004a  DAC: 00000051
  Process sh (pid: 533, stack limit = 0xe4936210)
  Stack: (0xe4937d28 to 0xe4938000)
  Backtrace:
  [<c0667be0>] (i2c_do_del_adapter) from [<c0667cc0>] (__process_removed_adapter+0x14/0x1c)
  [<c0667cac>] (__process_removed_adapter) from [<c0516998>] (bus_for_each_drv+0x6c/0xa0)
  [<c051692c>] (bus_for_each_drv) from [<c06685ec>] (i2c_del_adapter+0xbc/0x284)
  [<c0668530>] (i2c_del_adapter) from [<bf0110ec>] (i2c_imx_remove+0x44/0x164 [i2c_imx])
  [<bf0110a8>] (i2c_imx_remove [i2c_imx]) from [<c051a838>] (platform_drv_remove+0x2c/0x44)
  [<c051a80c>] (platform_drv_remove) from [<c05183d8>] (__device_release_driver+0x90/0x12c)
  [<c0518348>] (__device_release_driver) from [<c051849c>] (device_release_driver+0x28/0x34)
  [<c0518474>] (device_release_driver) from [<c0517150>] (unbind_store+0x80/0x104)
  [<c05170d0>] (unbind_store) from [<c0516520>] (drv_attr_store+0x28/0x34)
  [<c05164f8>] (drv_attr_store) from [<c0298acc>] (sysfs_kf_write+0x50/0x54)
  [<c0298a7c>] (sysfs_kf_write) from [<c029801c>] (kernfs_fop_write+0x100/0x214)
  [<c0297f1c>] (kernfs_fop_write) from [<c0220130>] (__vfs_write+0x34/0x120)
  [<c02200fc>] (__vfs_write) from [<c0221088>] (vfs_write+0xa8/0x170)
  [<c0220fe0>] (vfs_write) from [<c0221e74>] (SyS_write+0x4c/0xa8)
  [<c0221e28>] (SyS_write) from [<c0108a20>] (ret_fast_syscall+0x0/0x1c)

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
eddiejames pushed a commit to eddiejames/linux that referenced this issue Jul 10, 2018
When the controller supports the Read LE Resolv List size feature, the
maximum list size are read and now stored.

Before patch:
< HCI Command: LE Read White List... (0x08|0x000f) plen 0  openbmc#55 [hci0] 17.979791
> HCI Event: Command Complete (0x0e) plen 5                openbmc#56 [hci0] 17.980629
      LE Read White List Size (0x08|0x000f) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Clear White List (0x08|0x0010) plen 0    openbmc#57 [hci0] 17.980786
> HCI Event: Command Complete (0x0e) plen 4                openbmc#58 [hci0] 17.981627
      LE Clear White List (0x08|0x0010) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Read Maximum Dat.. (0x08|0x002f) plen 0  openbmc#59 [hci0] 17.981786
> HCI Event: Command Complete (0x0e) plen 12               openbmc#60 [hci0] 17.982636
      LE Read Maximum Data Length (0x08|0x002f) ncmd 1
        Status: Success (0x00)
        Max TX octets: 251
        Max TX time: 17040
        Max RX octets: 251
        Max RX time: 17040

After patch:
< HCI Command: LE Read White List... (0x08|0x000f) plen 0  openbmc#55 [hci0] 13.338168
> HCI Event: Command Complete (0x0e) plen 5                openbmc#56 [hci0] 13.338842
      LE Read White List Size (0x08|0x000f) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Clear White List (0x08|0x0010) plen 0    openbmc#57 [hci0] 13.339029
> HCI Event: Command Complete (0x0e) plen 4                openbmc#58 [hci0] 13.339939
      LE Clear White List (0x08|0x0010) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Read Resolving L.. (0x08|0x002a) plen 0  openbmc#59 [hci0] 13.340152
> HCI Event: Command Complete (0x0e) plen 5                openbmc#60 [hci0] 13.340952
      LE Read Resolving List Size (0x08|0x002a) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Read Maximum Dat.. (0x08|0x002f) plen 0  openbmc#61 [hci0] 13.341180
> HCI Event: Command Complete (0x0e) plen 12               openbmc#62 [hci0] 13.341898
      LE Read Maximum Data Length (0x08|0x002f) ncmd 1
        Status: Success (0x00)
        Max TX octets: 251
        Max TX time: 17040
        Max RX octets: 251
        Max RX time: 17040

Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
eddiejames pushed a commit to eddiejames/linux that referenced this issue Jul 10, 2018
Check for Resolv list supported by controller. So check the supported
commmand first before issuing this command i.e.,HCI_OP_LE_CLEAR_RESOLV_LIST

Before patch:
< HCI Command: LE Read White List... (0x08|0x000f) plen 0  openbmc#55 [hci0] 13.338168
> HCI Event: Command Complete (0x0e) plen 5                openbmc#56 [hci0] 13.338842
      LE Read White List Size (0x08|0x000f) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Clear White List (0x08|0x0010) plen 0    openbmc#57 [hci0] 13.339029
> HCI Event: Command Complete (0x0e) plen 4                openbmc#58 [hci0] 13.339939
      LE Clear White List (0x08|0x0010) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Read Resolving L.. (0x08|0x002a) plen 0  openbmc#59 [hci0] 13.340152
> HCI Event: Command Complete (0x0e) plen 5                openbmc#60 [hci0] 13.340952
      LE Read Resolving List Size (0x08|0x002a) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Read Maximum Dat.. (0x08|0x002f) plen 0  openbmc#61 [hci0] 13.341180
> HCI Event: Command Complete (0x0e) plen 12               openbmc#62 [hci0] 13.341898
      LE Read Maximum Data Length (0x08|0x002f) ncmd 1
        Status: Success (0x00)
        Max TX octets: 251
        Max TX time: 17040
        Max RX octets: 251
        Max RX time: 17040

After patch:
< HCI Command: LE Read White List... (0x08|0x000f) plen 0  openbmc#55 [hci0] 28.919131
> HCI Event: Command Complete (0x0e) plen 5                openbmc#56 [hci0] 28.920016
      LE Read White List Size (0x08|0x000f) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Clear White List (0x08|0x0010) plen 0    openbmc#57 [hci0] 28.920164
> HCI Event: Command Complete (0x0e) plen 4                openbmc#58 [hci0] 28.920873
      LE Clear White List (0x08|0x0010) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Read Resolving L.. (0x08|0x002a) plen 0  openbmc#59 [hci0] 28.921109
> HCI Event: Command Complete (0x0e) plen 5                openbmc#60 [hci0] 28.922016
      LE Read Resolving List Size (0x08|0x002a) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Clear Resolving... (0x08|0x0029) plen 0  openbmc#61 [hci0] 28.922166
> HCI Event: Command Complete (0x0e) plen 4                openbmc#62 [hci0] 28.922872
      LE Clear Resolving List (0x08|0x0029) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Read Maximum Dat.. (0x08|0x002f) plen 0  openbmc#63 [hci0] 28.923117
> HCI Event: Command Complete (0x0e) plen 12               openbmc#64 [hci0] 28.924030
      LE Read Maximum Data Length (0x08|0x002f) ncmd 1
        Status: Success (0x00)
        Max TX octets: 251
        Max TX time: 17040
        Max RX octets: 251
        Max RX time: 17040

Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
amboar pushed a commit to amboar/linux that referenced this issue Jun 28, 2019
When calling kmalloc with GFP_KERNEL in case CONFIG_SLOB is unset,
kmem_cache_alloc_trace is called.

In case CONFIG_TRACING is set, kmem_cache_alloc_trace will ball
slab_alloc, which will call slab_pre_alloc_hook which might_sleep_if.

The context in which it is called in this case, the
intel_sst_interrupt_mrfld, calling a sleeping kmalloc generates a BUG():

Fixes: 972b0d4 ("ASoC: Intel: remove GFP_ATOMIC, use GFP_KERNEL")

[   20.250671] BUG: sleeping function called from invalid context at mm/slab.h:422
[   20.250683] in_atomic(): 1, irqs_disabled(): 1, pid: 1791, name: Chrome_IOThread
[   20.250690] CPU: 0 PID: 1791 Comm: Chrome_IOThread Tainted: G        W         4.19.43 openbmc#61
[   20.250693] Hardware name: GOOGLE Kefka, BIOS Google_Kefka.7287.337.0 03/02/2017
[   20.250697] Call Trace:
[   20.250704]  <IRQ>
[   20.250716]  dump_stack+0x7e/0xc3
[   20.250725]  ___might_sleep+0x12a/0x140
[   20.250731]  kmem_cache_alloc_trace+0x53/0x1c5
[   20.250736]  ? update_cfs_rq_load_avg+0x17e/0x1aa
[   20.250740]  ? cpu_load_update+0x6c/0xc2
[   20.250746]  sst_create_ipc_msg+0x2d/0x88
[   20.250752]  intel_sst_interrupt_mrfld+0x12a/0x22c
[   20.250758]  __handle_irq_event_percpu+0x133/0x228
[   20.250764]  handle_irq_event_percpu+0x35/0x7a
[   20.250768]  handle_irq_event+0x36/0x55
[   20.250773]  handle_fasteoi_irq+0xab/0x16c
[   20.250779]  handle_irq+0xd9/0x11e
[   20.250785]  do_IRQ+0x54/0xe0
[   20.250791]  common_interrupt+0xf/0xf
[   20.250795]  </IRQ>
[   20.250800] RIP: 0010:__lru_cache_add+0x4e/0xad
[   20.250806] Code: 00 01 48 c7 c7 b8 df 01 00 65 48 03 3c 25 28 f1 00 00 48 8b 48 08 48 89 ca 48 ff ca f6 c1 01 48 0f 44 d0 f0 ff 42 34 0f b6 0f <89> ca fe c2 88 17 48 89 44 cf 08 80 fa 0f 74 0e 48 8b 08 66 85 c9
[   20.250809] RSP: 0000:ffffa568810bfd98 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffd6
[   20.250814] RAX: ffffd3b904eb1940 RBX: ffffd3b904eb1940 RCX: 0000000000000004
[   20.250817] RDX: ffffd3b904eb1940 RSI: ffffa10ee5c47450 RDI: ffffa10efba1dfb8
[   20.250821] RBP: ffffa568810bfda8 R08: ffffa10ef9c741c1 R09: dead000000000100
[   20.250824] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa10ee8d52a40
[   20.250827] R13: ffffa10ee8d52000 R14: ffffa10ee5c47450 R15: 800000013ac65067
[   20.250835]  lru_cache_add_active_or_unevictable+0x4e/0xb8
[   20.250841]  handle_mm_fault+0xd98/0x10c4
[   20.250848]  __do_page_fault+0x235/0x42d
[   20.250853]  ? page_fault+0x8/0x30
[   20.250858]  do_page_fault+0x3d/0x17a
[   20.250862]  ? page_fault+0x8/0x30
[   20.250866]  page_fault+0x1e/0x30
[   20.250872] RIP: 0033:0x7962fdea9304
[   20.250875] Code: 0f 11 4c 17 f0 c3 48 3b 15 f1 26 31 00 0f 83 e2 00 00 00 48 39 f7 72 0f 74 12 4c 8d 0c 16 4c 39 cf 0f 82 63 01 00 00 48 89 d1 <f3> a4 c3 80 fa 08 73 12 80 fa 04 73 1e 80 fa 01 77 26 72 05 0f b6
[   20.250879] RSP: 002b:00007962f4db5468 EFLAGS: 00010206
[   20.250883] RAX: 00003c8cc9d47008 RBX: 0000000000000000 RCX: 0000000000001b48
[   20.250886] RDX: 0000000000002b40 RSI: 00003c8cc9551000 RDI: 00003c8cc9d48000
[   20.250890] RBP: 00007962f4db5820 R08: 0000000000000000 R09: 00003c8cc9552b48
[   20.250893] R10: 0000562dd1064d30 R11: 00003c8cc825b908 R12: 00003c8cc966d3c0
[   20.250896] R13: 00003c8cc9e280c0 R14: 0000000000000000 R15: 0000000000000000

Signed-off-by: Alex Levin <levinale@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
shenki pushed a commit that referenced this issue Jan 31, 2020
…y section

commit 8068df3 upstream.

When we remove an early section, we don't free the usage map, as the
usage maps of other sections are placed into the same page.  Once the
section is removed, it is no longer an early section (especially, the
memmap is freed).  When we re-add that section, the usage map is reused,
however, it is no longer an early section.  When removing that section
again, we try to kfree() a usage map that was allocated during early
boot - bad.

Let's check against PageReserved() to see if we are dealing with an
usage map that was allocated during boot.  We could also check against
!(PageSlab(usage_page) || PageCompound(usage_page)), but PageReserved() is
cleaner.

Can be triggered using memtrace under ppc64/powernv:

  $ mount -t debugfs none /sys/kernel/debug/
  $ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable
  $ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable
   ------------[ cut here ]------------
   kernel BUG at mm/slub.c:3969!
   Oops: Exception in kernel mode, sig: 5 [#1]
   LE PAGE_SIZE=3D64K MMU=3DHash SMP NR_CPUS=3D2048 NUMA PowerNV
   Modules linked in:
   CPU: 0 PID: 154 Comm: sh Not tainted 5.5.0-rc2-next-20191216-00005-g0be1dba7b7c0 #61
   NIP kfree+0x338/0x3b0
   LR section_deactivate+0x138/0x200
   Call Trace:
     section_deactivate+0x138/0x200
     __remove_pages+0x114/0x150
     arch_remove_memory+0x3c/0x160
     try_remove_memory+0x114/0x1a0
     __remove_memory+0x20/0x40
     memtrace_enable_set+0x254/0x850
     simple_attr_write+0x138/0x160
     full_proxy_write+0x8c/0x110
     __vfs_write+0x38/0x70
     vfs_write+0x11c/0x2a0
     ksys_write+0x84/0x140
     system_call+0x5c/0x68
   ---[ end trace 4b053cbd84e0db62 ]---

The first invocation will offline+remove memory blocks.  The second
invocation will first add+online them again, in order to offline+remove
them again (usually we are lucky and the exact same memory blocks will
get "reallocated").

Tested on powernv with boot memory: The usage map will not get freed.
Tested on x86-64 with DIMMs: The usage map will get freed.

Using Dynamic Memory under a Power DLAPR can trigger it easily.

Triggering removal (I assume after previously removed+re-added) of
memory from the HMC GUI can crash the kernel with the same call trace
and is fixed by this patch.

Link: http://lkml.kernel.org/r/20191217104637.5509-1-david@redhat.com
Fixes: 326e1b8 ("mm/sparsemem: introduce a SECTION_IS_EARLY flag")
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Pingfan Liu <piliu@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
shenki pushed a commit that referenced this issue Jul 21, 2020
commit 7733306 upstream.

The "inline" keyword is a hint for the compiler to inline a function.  The
functions system_uses_irq_prio_masking() and gic_write_pmr() are used by
the code running at EL2 on a non-VHE system, so mark them as
__always_inline to make sure they'll always be part of the .hyp.text
section.

This fixes the following splat when trying to run a VM:

[   47.625273] Kernel panic - not syncing: HYP panic:
[   47.625273] PS:a00003c9 PC:0000ca0b42049fc4 ESR:86000006
[   47.625273] FAR:0000ca0b42049fc4 HPFAR:0000000010001000 PAR:0000000000000000
[   47.625273] VCPU:0000000000000000
[   47.647261] CPU: 1 PID: 217 Comm: kvm-vcpu-0 Not tainted 5.8.0-rc1-ARCH+ #61
[   47.654508] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
[   47.661139] Call trace:
[   47.663659]  dump_backtrace+0x0/0x1cc
[   47.667413]  show_stack+0x18/0x24
[   47.670822]  dump_stack+0xb8/0x108
[   47.674312]  panic+0x124/0x2f4
[   47.677446]  panic+0x0/0x2f4
[   47.680407] SMP: stopping secondary CPUs
[   47.684439] Kernel Offset: disabled
[   47.688018] CPU features: 0x240402,20002008
[   47.692318] Memory Limit: none
[   47.695465] ---[ end Kernel panic - not syncing: HYP panic:
[   47.695465] PS:a00003c9 PC:0000ca0b42049fc4 ESR:86000006
[   47.695465] FAR:0000ca0b42049fc4 HPFAR:0000000010001000 PAR:0000000000000000
[   47.695465] VCPU:0000000000000000 ]---

The instruction abort was caused by the code running at EL2 trying to fetch
an instruction which wasn't mapped in the EL2 translation tables. Using
objdump showed the two functions as separate symbols in the .text section.

Fixes: 85738e0 ("arm64: kvm: Unmask PMR before entering guest")
Cc: stable@vger.kernel.org
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20200618171254.1596055-1-alexandru.elisei@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
shenki pushed a commit that referenced this issue Jul 22, 2020
commit 7733306 upstream.

The "inline" keyword is a hint for the compiler to inline a function.  The
functions system_uses_irq_prio_masking() and gic_write_pmr() are used by
the code running at EL2 on a non-VHE system, so mark them as
__always_inline to make sure they'll always be part of the .hyp.text
section.

This fixes the following splat when trying to run a VM:

[   47.625273] Kernel panic - not syncing: HYP panic:
[   47.625273] PS:a00003c9 PC:0000ca0b42049fc4 ESR:86000006
[   47.625273] FAR:0000ca0b42049fc4 HPFAR:0000000010001000 PAR:0000000000000000
[   47.625273] VCPU:0000000000000000
[   47.647261] CPU: 1 PID: 217 Comm: kvm-vcpu-0 Not tainted 5.8.0-rc1-ARCH+ #61
[   47.654508] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
[   47.661139] Call trace:
[   47.663659]  dump_backtrace+0x0/0x1cc
[   47.667413]  show_stack+0x18/0x24
[   47.670822]  dump_stack+0xb8/0x108
[   47.674312]  panic+0x124/0x2f4
[   47.677446]  panic+0x0/0x2f4
[   47.680407] SMP: stopping secondary CPUs
[   47.684439] Kernel Offset: disabled
[   47.688018] CPU features: 0x240402,20002008
[   47.692318] Memory Limit: none
[   47.695465] ---[ end Kernel panic - not syncing: HYP panic:
[   47.695465] PS:a00003c9 PC:0000ca0b42049fc4 ESR:86000006
[   47.695465] FAR:0000ca0b42049fc4 HPFAR:0000000010001000 PAR:0000000000000000
[   47.695465] VCPU:0000000000000000 ]---

The instruction abort was caused by the code running at EL2 trying to fetch
an instruction which wasn't mapped in the EL2 translation tables. Using
objdump showed the two functions as separate symbols in the .text section.

Fixes: 85738e0 ("arm64: kvm: Unmask PMR before entering guest")
Cc: stable@vger.kernel.org
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20200618171254.1596055-1-alexandru.elisei@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
shenki pushed a commit that referenced this issue Jul 26, 2021
commit 704adfb upstream.

The histogram logic was allowing events with char * pointers to be used as
normal strings. But it was easy to crash the kernel with:

 # echo 'hist:keys=filename' > events/syscalls/sys_enter_openat/trigger

And open some files, and boom!

 BUG: unable to handle page fault for address: 00007f2ced0c3280
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 1173fa067 P4D 1173fa067 PUD 1171b6067 PMD 1171dd067 PTE 0
 Oops: 0000 [#1] PREEMPT SMP
 CPU: 6 PID: 1810 Comm: cat Not tainted 5.13.0-rc5-test+ #61
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
v03.03 07/14/2016
 RIP: 0010:strlen+0x0/0x20
 Code: f6 82 80 2a 0b a9 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 80 2a 0b
a9 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 74
10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3

 RSP: 0018:ffffbdbf81567b50 EFLAGS: 00010246
 RAX: 0000000000000003 RBX: ffff93815cdb3800 RCX: ffff9382401a22d0
 RDX: 0000000000000100 RSI: 0000000000000000 RDI: 00007f2ced0c3280
 RBP: 0000000000000100 R08: ffff9382409ff074 R09: ffffbdbf81567c98
 R10: ffff9382409ff074 R11: 0000000000000000 R12: ffff9382409ff074
 R13: 0000000000000001 R14: ffff93815a744f00 R15: 00007f2ced0c3280
 FS:  00007f2ced0f8580(0000) GS:ffff93825a800000(0000)
knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f2ced0c3280 CR3: 0000000107069005 CR4: 00000000001706e0
 Call Trace:
  event_hist_trigger+0x463/0x5f0
  ? find_held_lock+0x32/0x90
  ? sched_clock_cpu+0xe/0xd0
  ? lock_release+0x155/0x440
  ? kernel_init_free_pages+0x6d/0x90
  ? preempt_count_sub+0x9b/0xd0
  ? kernel_init_free_pages+0x6d/0x90
  ? get_page_from_freelist+0x12c4/0x1680
  ? __rb_reserve_next+0xe5/0x460
  ? ring_buffer_lock_reserve+0x12a/0x3f0
  event_triggers_call+0x52/0xe0
  ftrace_syscall_enter+0x264/0x2c0
  syscall_trace_enter.constprop.0+0x1ee/0x210
  do_syscall_64+0x1c/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Where it triggered a fault on strlen(key) where key was the filename.

The reason is that filename is a char * to user space, and the histogram
code just blindly dereferenced it, with obvious bad results.

I originally tried to use strncpy_from_user/kernel_nofault() but found
that there's other places that its dereferenced and not worth the effort.

Just do not allow "char *" to act like strings.

Link: https://lkml.kernel.org/r/20210715000206.025df9d2@rorschach.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: stable@vger.kernel.org
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Tom Zanussi <zanussi@kernel.org>
Fixes: 79e577c ("tracing: Support string type key properly")
Fixes: 5967bd5 ("tracing: Let filter_assign_type() detect FILTER_PTR_STRING")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants