bonw14 ethernet #24

jackpot51 · 2020-08-13T20:19:32Z

Rebases on upstream and syncs drivers/net/ethernet/realtek with upstream

Needs to be force pushed when approved

jacobgkau · 2020-08-13T23:01:36Z

I'm booted into d2542ef and am not seeing Ethernet. Here is some output:
lspci -vvv: lspci.txt
ip link: ip-link.txt
dmesg: dmesg.txt
uname -a: uname.txt

leviport · 2020-08-18T17:49:49Z

This is looking good on oryp6 and thelio-b2

jacobgkau

Also tested on darp6, addw2, and serw12, not seeing any regressions caused by the linux and linux-firmware changes.

(Ethernet on the bonw14 is working with these branches now, since that hasn't been stated here.)

BugLink: https://bugs.launchpad.net/bugs/1902137 [ Upstream commit 859d510 ] If the specified/hinted sink is not reachable from a subset of the CPUs, we could end up unable to trace the event on those CPUs. This is the best effort we could do until we support 1:1 configurations. Fail gracefully in such cases avoiding a WARN_ON, which can be easily triggered by the user on certain platforms (Arm N1SDP), with the following trace paths : CPU0 \ -- Funnel0 --> ETF0 --> / \ CPU1 \ MainFunnel CPU2 / \ / -- Funnel1 --> ETF1 --> / CPU1 $ perf record --per-thread -e cs_etm/@ETF0/u -- <app> could trigger the following WARNING, when the event is scheduled on CPU2. [10919.513250] ------------[ cut here ]------------ [10919.517861] WARNING: CPU: 2 PID: 24021 at drivers/hwtracing/coresight/coresight-etm-perf.c:316 etm_event_start+0xf8/0x100 ... [10919.564403] CPU: 2 PID: 24021 Comm: perf Not tainted 5.8.0+ #24 [10919.570308] pstate: 80400089 (Nzcv daIf +PAN -UAO BTYPE=--) [10919.575865] pc : etm_event_start+0xf8/0x100 [10919.580034] lr : etm_event_start+0x80/0x100 [10919.584202] sp : fffffe001932f940 [10919.587502] x29: fffffe001932f940 x28: fffffc834995f800 [10919.592799] x27: 0000000000000000 x26: fffffe0011f3ced0 [10919.598095] x25: fffffc837fce244c x24: fffffc837fce2448 [10919.603391] x23: 0000000000000002 x22: fffffc8353529c00 [10919.608688] x21: fffffc835bb31000 x20: 0000000000000000 [10919.613984] x19: fffffc837fcdcc70 x18: 0000000000000000 [10919.619281] x17: 0000000000000000 x16: 0000000000000000 [10919.624577] x15: 0000000000000000 x14: 00000000000009f8 [10919.629874] x13: 00000000000009f8 x12: 0000000000000018 [10919.635170] x11: 0000000000000000 x10: 0000000000000000 [10919.640467] x9 : fffffe00108cd168 x8 : 0000000000000000 [10919.645763] x7 : 0000000000000020 x6 : 0000000000000001 [10919.651059] x5 : 0000000000000002 x4 : 0000000000000001 [10919.656356] x3 : 0000000000000000 x2 : 0000000000000000 [10919.661652] x1 : fffffe836eb40000 x0 : 0000000000000000 [10919.666949] Call trace: [10919.669382] etm_event_start+0xf8/0x100 [10919.673203] etm_event_add+0x40/0x60 [10919.676765] event_sched_in.isra.134+0xcc/0x210 [10919.681281] merge_sched_in+0xb0/0x2a8 [10919.685017] visit_groups_merge.constprop.140+0x15c/0x4b8 [10919.690400] ctx_sched_in+0x15c/0x170 [10919.694048] perf_event_sched_in+0x6c/0xa0 [10919.698130] ctx_resched+0x60/0xa0 [10919.701517] perf_event_exec+0x288/0x2f0 [10919.705425] begin_new_exec+0x4c8/0xf58 [10919.709247] load_elf_binary+0x66c/0xf30 [10919.713155] exec_binprm+0x15c/0x450 [10919.716716] __do_execve_file+0x508/0x748 [10919.720711] __arm64_sys_execve+0x40/0x50 [10919.724707] do_el0_svc+0xf4/0x1b8 [10919.728095] el0_sync_handler+0xf8/0x124 [10919.732003] el0_sync+0x140/0x180 Even though we don't support using separate sinks for the ETMs yet (e.g, for 1:1 configurations), we should at least honor the user's choice and handle the limitations gracefully, by simply skipping the tracing on ETMs which can't reach the requested sink. Fixes: f9d81a6 ("coresight: perf: Allow tracing on hotplugged CPUs") Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Reported-by: Jeremy Linton <jeremy.linton@arm.com> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/20200916191737.4001561-11-mathieu.poirier@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com>

…ULL pointer BugLink: https://bugs.launchpad.net/bugs/1923069 [ Upstream commit 0d96968 ] When function return fail to __ath11k_mac_register after success called ieee80211_register_hw, then it set wiphy->dev.parent to NULL by SET_IEEE80211_DEV(ar->hw, NULL) in end of __ath11k_mac_register, then cfg80211_get_drvinfo will be called by below call stack, but the wiphy->dev.parent is NULL, so kernel crash. Call stack to cfg80211_get_drvinfo: NetworkManager 826 [001] 6696.731371: probe:cfg80211_get_drvinfo: (ffffffffc107d8f0) ffffffffc107d8f1 cfg80211_get_drvinfo+0x1 (/lib/modules/5.10.0-rc1-wt-ath+/kernel/net/wireless-back/cfg80211.ko) ffffffff9d8fc529 ethtool_get_drvinfo+0x99 (vmlinux) ffffffff9d90080e dev_ethtool+0x1dbe (vmlinux) ffffffff9d8b88f7 dev_ioctl+0xb7 (vmlinux) ffffffff9d8668de sock_do_ioctl+0xae (vmlinux) ffffffff9d866d60 sock_ioctl+0x350 (vmlinux) ffffffff9d2ca30e __x64_sys_ioctl+0x8e (vmlinux) ffffffff9da0dda3 do_syscall_64+0x33 (vmlinux) ffffffff9dc0008c entry_SYSCALL_64_after_hwframe+0x44 (vmlinux) 7feb5f673007 __GI___ioctl+0x7 (/lib/x86_64-linux-gnu/libc-2.23.so) 0 [unknown] ([unknown]) Code of cfg80211_get_drvinfo, the pdev which is wiphy->dev.parent is NULL when kernel crash: void cfg80211_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info) { struct wireless_dev *wdev = dev->ieee80211_ptr; struct device *pdev = wiphy_dev(wdev->wiphy); if (pdev->driver) .... kernel crash log: [ 973.619550] ath11k_pci 0000:05:00.0: failed to perform regd update : -16 [ 973.619555] ath11k_pci 0000:05:00.0: ath11k regd update failed: -16 [ 973.619566] ath11k_pci 0000:05:00.0: failed register the radio with mac80211: -16 [ 973.619618] ath11k_pci 0000:05:00.0: failed to create pdev core: -16 [ 973.636035] BUG: kernel NULL pointer dereference, address: 0000000000000068 [ 973.636046] #PF: supervisor read access in kernel mode [ 973.636050] #PF: error_code(0x0000) - not-present page [ 973.636054] PGD 800000012452e067 P4D 800000012452e067 PUD 12452d067 PMD 0 [ 973.636064] Oops: 0000 [#1] SMP PTI [ 973.636072] CPU: 3 PID: 848 Comm: NetworkManager Kdump: loaded Tainted: G W OE 5.10.0-rc1-wt-ath+ #24 [ 973.636076] Hardware name: LENOVO 418065C/418065C, BIOS 83ET63WW (1.33 ) 07/29/2011 [ 973.636161] RIP: 0010:cfg80211_get_drvinfo+0x25/0xd0 [cfg80211] [ 973.636169] Code: e9 c9 fe ff ff 66 66 66 66 90 55 53 ba 20 00 00 00 48 8b af 08 03 00 00 48 89 f3 48 8d 7e 04 48 8b 45 00 48 8b 80 90 01 00 00 <48> 8b 40 68 48 85 c0 0f 84 8d 00 00 00 48 8b 30 e8 a6 cc 72 c7 48 [ 973.636174] RSP: 0018:ffffaafb4040bbe0 EFLAGS: 00010286 [ 973.636180] RAX: 0000000000000000 RBX: ffffaafb4040bbfc RCX: 0000000000000000 [ 973.636184] RDX: 0000000000000020 RSI: ffffaafb4040bbfc RDI: ffffaafb4040bc00 [ 973.636188] RBP: ffff8a84c9568950 R08: 722d302e30312e35 R09: 74612d74772d3163 [ 973.636192] R10: 3163722d302e3031 R11: 2b6874612d74772d R12: ffffaafb4040bbfc [ 973.636196] R13: 00007ffe453707c0 R14: ffff8a84c9568000 R15: 0000000000000000 [ 973.636202] FS: 00007fd3d179b940(0000) GS:ffff8a84fa2c0000(0000) knlGS:0000000000000000 [ 973.636206] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 973.636211] CR2: 0000000000000068 CR3: 00000001153b6002 CR4: 00000000000606e0 [ 973.636215] Call Trace: [ 973.636234] ethtool_get_drvinfo+0x99/0x1f0 [ 973.636246] dev_ethtool+0x1dbe/0x2be0 [ 973.636256] ? mntput_no_expire+0x35/0x220 [ 973.636264] ? inet_ioctl+0x1ce/0x200 [ 973.636274] ? tomoyo_path_number_perm+0x68/0x1d0 [ 973.636282] ? kmem_cache_alloc+0x3cb/0x430 [ 973.636290] ? dev_ioctl+0xb7/0x570 [ 973.636295] dev_ioctl+0xb7/0x570 [ 973.636307] sock_do_ioctl+0xae/0x150 [ 973.636315] ? sock_ioctl+0x350/0x3c0 [ 973.636319] sock_ioctl+0x350/0x3c0 [ 973.636332] ? __x64_sys_ioctl+0x8e/0xd0 [ 973.636339] ? dlci_ioctl_set+0x30/0x30 [ 973.636346] __x64_sys_ioctl+0x8e/0xd0 [ 973.636359] do_syscall_64+0x33/0x80 [ 973.636368] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Sequence of function call when wlan load for success case when function __ath11k_mac_register return 0: kworker/u16:3-e 2922 [001] 6696.729734: probe:ieee80211_register_hw: (ffffffffc116ae60) kworker/u16:3-e 2922 [001] 6696.730210: probe:ieee80211_if_add: (ffffffffc1185cc0) NetworkManager 826 [001] 6696.731345: probe:ethtool_get_drvinfo: (ffffffff9d8fc490) NetworkManager 826 [001] 6696.731371: probe:cfg80211_get_drvinfo: (ffffffffc107d8f0) NetworkManager 826 [001] 6696.731639: probe:ethtool_get_drvinfo: (ffffffff9d8fc490) NetworkManager 826 [001] 6696.731653: probe:cfg80211_get_drvinfo: (ffffffffc107d8f0) NetworkManager 826 [001] 6696.732866: probe:ethtool_get_drvinfo: (ffffffff9d8fc490) NetworkManager 826 [001] 6696.732893: probe:cfg80211_get_drvinfo: (ffffffffc107d8f0) systemd-udevd 3850 [003] 6696.737199: probe:ethtool_get_drvinfo: (ffffffff9d8fc490) systemd-udevd 3850 [003] 6696.737226: probe:cfg80211_get_drvinfo: (ffffffffc107d8f0) NetworkManager 826 [000] 6696.759950: probe:ethtool_get_drvinfo: (ffffffff9d8fc490) NetworkManager 826 [000] 6696.759967: probe:cfg80211_get_drvinfo: (ffffffffc107d8f0) NetworkManager 826 [000] 6696.760057: probe:ethtool_get_drvinfo: (ffffffff9d8fc490) NetworkManager 826 [000] 6696.760062: probe:cfg80211_get_drvinfo: (ffffffffc107d8f0) After apply this patch, kernel crash gone, and below is the test case's sequence of function call and log when wlan load with fail by function ath11k_regd_update, and __ath11k_mac_register return fail: kworker/u16:5-e 192 [001] 215.174388: probe:ieee80211_register_hw: (ffffffffc1131e60) kworker/u16:5-e 192 [000] 215.174973: probe:ieee80211_if_add: (ffffffffc114ccc0) NetworkManager 846 [001] 215.175857: probe:ethtool_get_drvinfo: (ffffffff928fc490) kworker/u16:5-e 192 [000] 215.175867: probe:ieee80211_unregister_hw: (ffffffffc1131970) NetworkManager 846 [001] 215.175880: probe:cfg80211_get_drvinfo: (ffffffffc107f8f0) NetworkManager 846 [001] 215.176105: probe:ethtool_get_drvinfo: (ffffffff928fc490) NetworkManager 846 [001] 215.176118: probe:cfg80211_get_drvinfo: (ffffffffc107f8f0) [ 215.175859] ath11k_pci 0000:05:00.0: ath11k regd update failed: -16 NetworkManager 846 [001] 215.196420: probe:ethtool_get_drvinfo: (ffffffff928fc490) NetworkManager 846 [001] 215.196430: probe:cfg80211_get_drvinfo: (ffffffffc107f8f0) [ 215.258598] ath11k_pci 0000:05:00.0: failed register the radio with mac80211: -16 [ 215.258613] ath11k_pci 0000:05:00.0: failed to create pdev core: -16 When ath11k_regd_update or ath11k_debugfs_register return fail, function ieee80211_unregister_hw of mac80211 will be called, then it will wait untill cfg80211_get_drvinfo finished, the wiphy->dev.parent is not NULL at this moment, after that, it set wiphy->dev.parent to NULL by SET_IEEE80211_DEV(ar->hw, NULL) in end of __ath11k_mac_register, so not happen kernel crash. Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1 Signed-off-by: Wen Gong <wgong@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/1608607824-16067-1-git-send-email-wgong@codeaurora.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>

[ Upstream commit d412137 ] The perf_buffer fails on system with offline cpus: # test_progs -t perf_buffer test_perf_buffer:PASS:nr_cpus 0 nsec test_perf_buffer:PASS:nr_on_cpus 0 nsec test_perf_buffer:PASS:skel_load 0 nsec test_perf_buffer:PASS:attach_kprobe 0 nsec test_perf_buffer:PASS:perf_buf__new 0 nsec test_perf_buffer:PASS:epoll_fd 0 nsec skipping offline CPU #24 skipping offline CPU #25 skipping offline CPU #26 skipping offline CPU #27 skipping offline CPU #28 skipping offline CPU #29 skipping offline CPU #30 skipping offline CPU #31 test_perf_buffer:PASS:perf_buffer__poll 0 nsec test_perf_buffer:PASS:seen_cpu_cnt 0 nsec test_perf_buffer:FAIL:buf_cnt got 24, expected 32 Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED Changing the test to check online cpus instead of possible. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211021114132.8196-2-jolsa@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 4224cfd ] When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit cf90b74 upstream. During stress test with attaching and detaching VF from KVM and simultaneously changing VFs spoofcheck and trust there was a call trace in ice_reset_vf that VF's VSI is null. [145237.352797] WARNING: CPU: 46 PID: 840629 at drivers/net/ethernet/intel/ice/ice_vf_lib.c:508 ice_reset_vf+0x3d6/0x410 [ice] [145237.352851] Modules linked in: ice(E) vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio iavf dm_mod xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink tun bridge stp llc sunrpc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iTCO_wdt iTC O_vendor_support irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl ipmi_si intel_cstate ipmi_devintf joydev intel_uncore m ei_me ipmi_msghandler i2c_i801 pcspkr mei lpc_ich ioatdma i2c_smbus acpi_pad acpi_power_meter ip_tables xfs libcrc32c i2c_algo_bit drm_sh mem_helper drm_kms_helper sd_mod t10_pi crc64_rocksoft syscopyarea crc64 sysfillrect sg sysimgblt fb_sys_fops drm i40e ixgbe ahci libahci libata crc32c_intel mdio dca wmi fuse [last unloaded: ice] [145237.352917] CPU: 46 PID: 840629 Comm: kworker/46:2 Tainted: G S W I E 5.19.0-rc6+ #24 [145237.352921] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015 [145237.352923] Workqueue: ice ice_service_task [ice] [145237.352948] RIP: 0010:ice_reset_vf+0x3d6/0x410 [ice] [145237.352984] Code: 30 ec f3 cc e9 28 fd ff ff 0f b7 4b 50 48 c7 c2 48 19 9c c0 4c 89 ee 48 c7 c7 30 fe 9e c0 e8 d1 21 9d cc 31 c0 e9 a 9 fe ff ff <0f> 0b b8 ea ff ff ff e9 c1 fc ff ff 0f 0b b8 fb ff ff ff e9 91 fe [145237.352987] RSP: 0018:ffffb453e257fdb8 EFLAGS: 00010246 [145237.352990] RAX: ffff8bd0040181c0 RBX: ffff8be68db8f800 RCX: 0000000000000000 [145237.352991] RDX: 000000000000ffff RSI: 0000000000000000 RDI: ffff8be68db8f800 [145237.352993] RBP: ffff8bd0040181c0 R08: 0000000000001000 R09: ffff8bcfd520e000 [145237.352995] R10: 0000000000000000 R11: 00008417b5ab0bc0 R12: 0000000000000005 [145237.352996] R13: ffff8bcee061c0d0 R14: ffff8bd004019640 R15: 0000000000000000 [145237.352998] FS: 0000000000000000(0000) GS:ffff8be5dfb00000(0000) knlGS:0000000000000000 [145237.353000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [145237.353002] CR2: 00007fd81f651d68 CR3: 0000001a0fe10001 CR4: 00000000001726e0 [145237.353003] Call Trace: [145237.353008] <TASK> [145237.353011] ice_process_vflr_event+0x8d/0xb0 [ice] [145237.353049] ice_service_task+0x79f/0xef0 [ice] [145237.353074] process_one_work+0x1c8/0x390 [145237.353081] ? process_one_work+0x390/0x390 [145237.353084] worker_thread+0x30/0x360 [145237.353087] ? process_one_work+0x390/0x390 [145237.353090] kthread+0xe8/0x110 [145237.353094] ? kthread_complete_and_exit+0x20/0x20 [145237.353097] ret_from_fork+0x22/0x30 [145237.353103] </TASK> Remove WARN_ON() from check if VSI is null in ice_reset_vf. Add "VF is already removed\n" in dev_dbg(). This WARN_ON() is unnecessary and causes call trace, despite that call trace, driver still works. There is no need for this warn because this piece of code is responsible for disabling VF's Tx/Rx queues when VF is disabled, but when VF is already removed there is no need to do reset or disable queues. Fixes: efe4186 ("ice: Fix memory corruption in VF driver") Signed-off-by: Michal Jaron <michalx.jaron@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 2fe2a8f ] A null-ptr-deref is triggered when it tries to destroy the workqueue in vkms->output.composer_workq in vkms_release(). KASAN: null-ptr-deref in range [0x0000000000000118-0x000000000000011f] CPU: 5 PID: 17193 Comm: modprobe Not tainted 6.0.0-11331-gd465bff130bf #24 RIP: 0010:destroy_workqueue+0x2f/0x710 ... Call Trace: <TASK> ? vkms_config_debugfs_init+0x50/0x50 [vkms] __devm_drm_dev_alloc+0x15a/0x1c0 [drm] vkms_init+0x245/0x1000 [vkms] do_one_initcall+0xd0/0x4f0 do_init_module+0x1a4/0x680 load_module+0x6249/0x7110 __do_sys_finit_module+0x140/0x200 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 The reason is that an OOM happened which triggers the destroy of the workqueue, however, the workqueue is alloced in the later process, thus a null-ptr-deref happened. A simple call graph is shown as below: vkms_init() vkms_create() devm_drm_dev_alloc() __devm_drm_dev_alloc() devm_drm_dev_init() devm_add_action_or_reset() devm_add_action() # an error happened devm_drm_dev_init_release() drm_dev_put() kref_put() drm_dev_release() vkms_release() destroy_workqueue() # null-ptr-deref happened vkms_modeset_init() vkms_output_init() vkms_crtc_init() # where the workqueue get allocated Fix this by checking if composer_workq is NULL before passing it to the destroy_workqueue() in vkms_release(). Fixes: 6c234fe ("drm/vkms: Implement CRC debugfs API") Signed-off-by: Yuan Can <yuancan@huawei.com> Reviewed-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Melissa Wen <melissa.srw@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221101065156.41584-3-yuancan@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 37c3b9f ] The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit 0b0747d upstream. The following processes run into a deadlock. CPU 41 was waiting for CPU 29 to handle a CSD request while holding spinlock "crashdump_lock", but CPU 29 was hung by that spinlock with IRQs disabled. PID: 17360 TASK: ffff95c1090c5c40 CPU: 41 COMMAND: "mrdiagd" !# 0 [ffffb80edbf37b58] __read_once_size at ffffffff9b871a40 include/linux/compiler.h:185:0 !# 1 [ffffb80edbf37b58] atomic_read at ffffffff9b871a40 arch/x86/include/asm/atomic.h:27:0 !# 2 [ffffb80edbf37b58] dump_stack at ffffffff9b871a40 lib/dump_stack.c:54:0 # 3 [ffffb80edbf37b78] csd_lock_wait_toolong at ffffffff9b131ad5 kernel/smp.c:364:0 # 4 [ffffb80edbf37b78] __csd_lock_wait at ffffffff9b131ad5 kernel/smp.c:384:0 # 5 [ffffb80edbf37bf8] csd_lock_wait at ffffffff9b13267a kernel/smp.c:394:0 # 6 [ffffb80edbf37bf8] smp_call_function_many at ffffffff9b13267a kernel/smp.c:843:0 # 7 [ffffb80edbf37c50] smp_call_function at ffffffff9b13279d kernel/smp.c:867:0 # 8 [ffffb80edbf37c50] on_each_cpu at ffffffff9b13279d kernel/smp.c:976:0 # 9 [ffffb80edbf37c78] flush_tlb_kernel_range at ffffffff9b085c4b arch/x86/mm/tlb.c:742:0 #10 [ffffb80edbf37cb8] __purge_vmap_area_lazy at ffffffff9b23a1e0 mm/vmalloc.c:701:0 #11 [ffffb80edbf37ce0] try_purge_vmap_area_lazy at ffffffff9b23a2cc mm/vmalloc.c:722:0 #12 [ffffb80edbf37ce0] free_vmap_area_noflush at ffffffff9b23a2cc mm/vmalloc.c:754:0 #13 [ffffb80edbf37cf8] free_unmap_vmap_area at ffffffff9b23bb3b mm/vmalloc.c:764:0 #14 [ffffb80edbf37cf8] remove_vm_area at ffffffff9b23bb3b mm/vmalloc.c:1509:0 #15 [ffffb80edbf37d18] __vunmap at ffffffff9b23bb8a mm/vmalloc.c:1537:0 #16 [ffffb80edbf37d40] vfree at ffffffff9b23bc85 mm/vmalloc.c:1612:0 #17 [ffffb80edbf37d58] megasas_free_host_crash_buffer [megaraid_sas] at ffffffffc020b7f2 drivers/scsi/megaraid/megaraid_sas_fusion.c:3932:0 #18 [ffffb80edbf37d80] fw_crash_state_store [megaraid_sas] at ffffffffc01f804d drivers/scsi/megaraid/megaraid_sas_base.c:3291:0 #19 [ffffb80edbf37dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 #20 [ffffb80edbf37dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 #21 [ffffb80edbf37de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 #22 [ffffb80edbf37e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 #23 [ffffb80edbf37ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 #24 [ffffb80edbf37ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 #25 [ffffb80edbf37ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 #26 [ffffb80edbf37f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 #27 [ffffb80edbf37f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 PID: 17355 TASK: ffff95c1090c3d80 CPU: 29 COMMAND: "mrdiagd" !# 0 [ffffb80f2d3c7d30] __read_once_size at ffffffff9b0f2ab0 include/linux/compiler.h:185:0 !# 1 [ffffb80f2d3c7d30] native_queued_spin_lock_slowpath at ffffffff9b0f2ab0 kernel/locking/qspinlock.c:368:0 # 2 [ffffb80f2d3c7d58] pv_queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/paravirt.h:674:0 # 3 [ffffb80f2d3c7d58] queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/qspinlock.h:53:0 # 4 [ffffb80f2d3c7d68] queued_spin_lock at ffffffff9b8961a6 include/asm-generic/qspinlock.h:90:0 # 5 [ffffb80f2d3c7d68] do_raw_spin_lock_flags at ffffffff9b8961a6 include/linux/spinlock.h:173:0 # 6 [ffffb80f2d3c7d68] __raw_spin_lock_irqsave at ffffffff9b8961a6 include/linux/spinlock_api_smp.h:122:0 # 7 [ffffb80f2d3c7d68] _raw_spin_lock_irqsave at ffffffff9b8961a6 kernel/locking/spinlock.c:160:0 # 8 [ffffb80f2d3c7d88] fw_crash_buffer_store [megaraid_sas] at ffffffffc01f8129 drivers/scsi/megaraid/megaraid_sas_base.c:3205:0 # 9 [ffffb80f2d3c7dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 #10 [ffffb80f2d3c7dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 #11 [ffffb80f2d3c7de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 #12 [ffffb80f2d3c7e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 #13 [ffffb80f2d3c7ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 #14 [ffffb80f2d3c7ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 #15 [ffffb80f2d3c7ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 #16 [ffffb80f2d3c7f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 #17 [ffffb80f2d3c7f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 The lock is used to synchronize different sysfs operations, it doesn't protect any resource that will be touched by an interrupt. Consequently it's not required to disable IRQs. Replace the spinlock with a mutex to fix the deadlock. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20230828221018.19471-1-junxiao.bi@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

jackpot51 requested review from a team August 13, 2020 20:19

jackpot51 self-assigned this Aug 13, 2020

jackpot51 mentioned this pull request Aug 14, 2020

Add rtl8125 firmware pop-os/linux-firmware#2

Merged

jackpot51 force-pushed the bonw14_focal branch from d2542ef to 3d085fc Compare August 14, 2020 14:28

jackpot51 changed the title ~~Bonw14 focal~~ bonw14 ethernet Aug 14, 2020

jackpot51 force-pushed the bonw14_focal branch from f5b28e4 to e78f762 Compare August 14, 2020 16:28

jacobgkau approved these changes Aug 18, 2020

View reviewed changes

jackpot51 merged commit e78f762 into master_focal Aug 19, 2020

jackpot51 force-pushed the master_focal branch from 7719dbd to e78f762 Compare August 19, 2020 13:49

jackpot51 deleted the bonw14_focal branch August 19, 2020 13:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bonw14 ethernet #24

bonw14 ethernet #24

jackpot51 commented Aug 13, 2020 •

edited

Loading

jacobgkau commented Aug 13, 2020

leviport commented Aug 18, 2020

jacobgkau left a comment •

edited

Loading

bonw14 ethernet #24

bonw14 ethernet #24

Conversation

jackpot51 commented Aug 13, 2020 • edited Loading

jacobgkau commented Aug 13, 2020

leviport commented Aug 18, 2020

jacobgkau left a comment • edited Loading

Choose a reason for hiding this comment

jackpot51 commented Aug 13, 2020 •

edited

Loading

jacobgkau left a comment •

edited

Loading