STI DRM kernel panic with glmark2-drm #11

waby38b · 2025-01-06T22:02:51Z

Linux 4kOpen 6.12.6 #1 SMP Sat Dec 21 16:26:54 CET 2024 armv7l GNU/Linux /
bootargs= (...) cma=256m
glmark2 2023.01

glmark2-es2-wayland / weston

=> 2560x1600: OK
=> 1920x1080: OK
=> 2560x1440: OK

glmark2-es2-drm / DRM

=> 2560x1600: OK
=> 1920x1080: OK
=> 2560x1440: Dead !

Here kernel backstrace (loop)

(...)
[  155.987679] Division by zero in kernel.
[  155.994000] Division by zero in kernel.
[  156.000354] Division by zero in kernel.
[  156.006625] Division by zero in kernel.
(...)
Jan  6 22:23:11 4kOpen kern.err kernel: [  294.792802] Division by zero in kernel.
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.796644] CPU: 3 UID: 0 PID: 317 Comm: glmark2-es2-drm Tainted: G        W          6.12.6 #1
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.796677] Tainted: [W]=WARN
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.796688] Hardware name: STi SoC with Flattened Device Tree
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.796702] Call trace: 
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.796712]  unwind_backtrace from show_stack+0x10/0x14
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.796754]  show_stack from dump_stack_lvl+0x54/0x68
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.796784]  dump_stack_lvl from Ldiv0_64+0x8/0x18
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.796827]  Ldiv0_64 from div_u64_rem+0x30/0x44
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.796880]  div_u64_rem from div64_u64+0x28/0xcc
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.796923]  div64_u64 from clk_fs660c32_dig_get_params+0x1f0/0x3ac
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.796966]  clk_fs660c32_dig_get_params from quadfs_round_rate+0x4c/0x8c
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.796998]  quadfs_round_rate from clk_core_determine_round_nolock+0x98/0xd0
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.797030]  clk_core_determine_round_nolock from clk_calc_new_rates+0xa0/0x2b0
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.797069]  clk_calc_new_rates from clk_calc_new_rates+0x134/0x2b0
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.797106]  clk_calc_new_rates from clk_core_set_rate_nolock+0xdc/0x324
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.797148]  clk_core_set_rate_nolock from clk_set_rate+0x30/0x154
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.797190]  clk_set_rate from sti_hdmi_set_mode+0x5c/0x90 [sti_drm]
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.797349]  sti_hdmi_set_mode [sti_drm] from drm_bridge_chain_mode_set+0x40/0x54 [drm]
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.798030]  drm_bridge_chain_mode_set [drm] from crtc_set_mode+0x190/0x1cc [drm_kms_helper]
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.798774]  crtc_set_mode [drm_kms_helper] from drm_atomic_helper_commit_tail+0x38/0x8c [drm_kms_helper]
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.799070]  drm_atomic_helper_commit_tail [drm_kms_helper] from commit_tail+0x15c/0x180 [drm_kms_helper]
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.799337]  commit_tail [drm_kms_helper] from drm_atomic_helper_commit+0x138/0x15c [drm_kms_helper]
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.799599]  drm_atomic_helper_commit [drm_kms_helper] from drm_atomic_commit+0xc0/0xf4 [drm]
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.800291]  drm_atomic_commit [drm] from drm_atomic_helper_set_config+0x88/0xc0 [drm_kms_helper]
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.801011]  drm_atomic_helper_set_config [drm_kms_helper] from drm_mode_setcrtc+0x37c/0x848 [drm]
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.801724]  drm_mode_setcrtc [drm] from drm_ioctl+0x2a0/0x490 [drm]
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.802746]  drm_ioctl [drm] from sys_ioctl+0x2d8/0xd10
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.803290]  sys_ioctl from ret_fast_syscall+0x0/0x54
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.803328] Exception stack(0xf1005fa8 to 0xf1005ff0)
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.803352] 5fa0:                   00000003 bee14960 00000003 c06864a2 bee14960 bee14940
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.803376] 5fc0: 00000003 bee14960 c06864a2 00000036 0059adac 0058e120 bee14a4c 004b617c
Jan  6 22:23:11 4kOpen kern.warn kernel: [  294.803395] 5fe0: b6f16ef4 bee14930 b6f0ac70 b6b900e4

The text was updated successfully, but these errors were encountered:

waby38b · 2025-01-06T22:41:26Z

# modetest -M  sti
Encoders:
id	crtc	type	possible crtcs	possible clones	
52	35	TMDS	0x00000003	0x00000007
53	0	DAC	0x00000003	0x00000007
54	0	LVDS	0x00000003	0x00000007

Connectors:
id	encoder	status		name		size (mm)	modes	encoders
55	52	connected	HDMI-A-1       	600x330		100	52
  modes:
	index name refresh (Hz) hdisp hss hse htot vdisp vss vse vtot
  #0 2560x1440 59.97 2560 2608 2640 2720 1440 1443 1448 1481 241570 flags: phsync, pvsync; type: preferred, driver
  #1 3840x2160 60.00 3840 4016 4104 4400 2160 2168 2178 2250 594000 flags: phsync, pvsync; type: driver
  #2 3840x2160 59.94 3840 4016 4104 4400 2160 2168 2178 2250 593407 flags: phsync, pvsync; type: driver
  #3 3840x2160 50.00 3840 4896 4984 5280 2160 2168 2178 2250 594000 flags: phsync, pvsync; type: driver
  #4 3840x2160 30.00 3840 4016 4104 4400 2160 2168 2178 2250 297000 flags: phsync, pvsync; type: driver

1 -
# modetest -M sti -s 55@35:2560x1440-59.97 -d setting mode 2560x1440-59.97Hz on connectors 55, crtc 35
=> no crash here ! (picture ok)

2 -
# modetest -M sti -s 55@35:1920x1080-60 -d setting mode 2560x1440-59.97Hz on connectors 55, crtc 35
=> no crash here ! (picture ok)

3 -
# modetest -M sti -s 55@35:2560x1440-59.97 -d setting mode 2560x1440-59.97Hz on connectors 55, crtc 35
=> nothing on screen....

# dmesg 
[  567.138010] sti-hdmi a804000.sti-hdmi: sti_hdmi_tx6g0c28phy_start: hdmi phy pll not locked
[  567.146327] [drm:drm_atomic_bridge_chain_pre_enable [drm]] *ERROR* Unable to start hdmi phy

4 -

# modetest -M sti -s 55@35:3840x2160-60 -d 
setting mode 3840x2160-60.00Hz on connectors 55, crtc 35

[  325.195200] Division by zero in kernel.
[  325.199026] CPU: 3 UID: 0 PID: 316 Comm: modetest Tainted: G        W          6.12.6 #1
[  325.199039] Tainted: [W]=WARN
[  325.199044] Hardware name: STi SoC with Flattened Device Tree
[  325.199050] Call trace: 
[  325.199055]  unwind_backtrace from show_stack+0x10/0x14
[  325.199072]  show_stack from dump_stack_lvl+0x54/0x68
[  325.199085]  dump_stack_lvl from Ldiv0_64+0x8/0x18
[  325.199103]  Ldiv0_64 from div_u64_rem+0x30/0x44
[  325.199125]  div_u64_rem from div64_u64+0x28/0xcc
[  325.199144]  div64_u64 from clk_fs660c32_dig_get_params+0x1f0/0x3ac
[  325.199161]  clk_fs660c32_dig_get_params from quadfs_round_rate+0x4c/0x8c
[  325.199175]  quadfs_round_rate from clk_core_determine_round_nolock+0x98/0xd0
[  325.199189]  clk_core_determine_round_nolock from clk_calc_new_rates+0xa0/0x2b0
[  325.199204]  clk_calc_new_rates from clk_calc_new_rates+0x134/0x2b0
[  325.199221]  clk_calc_new_rates from clk_core_set_rate_nolock+0xdc/0x324
[  325.199238]  clk_core_set_rate_nolock from clk_set_rate+0x30/0x154
[  325.199256]  clk_set_rate from sti_hdmi_set_mode+0x5c/0x90 [sti_drm]
[  325.199326]  sti_hdmi_set_mode [sti_drm] from drm_bridge_chain_mode_set+0x40/0x54 [drm]
[  325.199609]  drm_bridge_chain_mode_set [drm] from crtc_set_mode+0x190/0x1cc [drm_kms_helper]
[  325.199921]  crtc_set_mode [drm_kms_helper] from drm_atomic_helper_commit_tail+0x38/0x8c [drm_kms_helper]
[  325.200047]  drm_atomic_helper_commit_tail [drm_kms_helper] from commit_tail+0x15c/0x180 [drm_kms_helper]
[  325.200159]  commit_tail [drm_kms_helper] from drm_atomic_helper_commit+0x138/0x15c [drm_kms_helper]
[  325.200275]  drm_atomic_helper_commit [drm_kms_helper] from drm_atomic_commit+0xc0/0xf4 [drm]
[  325.200574]  drm_atomic_commit [drm] from drm_client_modeset_commit_atomic+0x208/0x24c [drm]
[  325.201008]  drm_client_modeset_commit_atomic [drm] from drm_client_modeset_commit_locked+0x5c/0x184 [drm]
[  325.201439]  drm_client_modeset_commit_locked [drm] from drm_client_modeset_commit+0x24/0x40 [drm]
[  325.201871]  drm_client_modeset_commit [drm] from drm_fb_helper_lastclose+0x48/0x74 [drm_kms_helper]
[  325.202177]  drm_fb_helper_lastclose [drm_kms_helper] from drm_fbdev_dma_client_restore+0xc/0x14 [drm_dma_helper]
[  325.202262]  drm_fbdev_dma_client_restore [drm_dma_helper] from drm_client_dev_restore+0x6c/0xc0 [drm]
[  325.202499]  drm_client_dev_restore [drm] from drm_release+0x120/0x124 [drm]
[  325.202931]  drm_release [drm] from __fput+0xcc/0x2d0
[  325.203156]  __fput from task_work_run+0x8c/0xbc
[  325.203171]  task_work_run from do_exit+0x320/0x96c
[  325.203192]  do_exit from do_group_exit+0x34/0x84
[  325.203212]  do_group_exit from get_signal+0x95c/0x9a4
[  325.203229]  get_signal from do_work_pending+0xf8/0x4b8
[  325.203240]  do_work_pending from slow_work_pending+0xc/0x24
[  325.203251] Exception stack(0xf0a3dfb0 to 0xf0a3dff8)
[  325.203261] dfa0:                                     00000000 c06864a2 becf3900 becf38e0
[  325.203271] dfc0: 00000003 becf3900 c06864a2 00000036 0041b1fc 00419220 00000001 00000000
[  325.203280] dfe0: b6f22ef4 becf38d0 b6f16c70 b6e450e4 40000010 00000003
[  325.211326] sti-hdmi a804000.sti-hdmi: sti_hdmi_tx6g0c28phy_start: hdmi phy pll not locked
[  325.219595] [drm:drm_atomic_bridge_chain_pre_enable [drm]] *ERROR* Unable to start hdmi phy

During the update procedure, when overwrite element in a pre-allocated htab, the freeing of old_element is protected by the bucket lock. The reason why the bucket lock is necessary is that the old_element has already been stashed in htab->extra_elems after alloc_htab_elem() returns. If freeing the old_element after the bucket lock is unlocked, the stashed element may be reused by concurrent update procedure and the freeing of old_element will run concurrently with the reuse of the old_element. However, the invocation of check_and_free_fields() may acquire a spin-lock which violates the lockdep rule because its caller has already held a raw-spin-lock (bucket lock). The following warning will be reported when such race happens: BUG: scheduling while atomic: test_progs/676/0x00000003 3 locks held by test_progs/676: #0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830 #1: ffff88810e961188 (&htab->lockdep_key){....}-{2:2}, at: htab_map_update_elem+0x306/0x1500 #2: ffff8881f4eac1b8 (&base->softirq_expiry_lock){....}-{2:2}, at: hrtimer_cancel_wait_running+0xe9/0x1b0 Modules linked in: bpf_testmod(O) Preemption disabled at: [<ffffffff817837a3>] htab_map_update_elem+0x293/0x1500 CPU: 0 UID: 0 PID: 676 Comm: test_progs Tainted: G ... 6.12.0+ #11 Tainted: [W]=WARN, [O]=OOT_MODULE Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)... Call Trace: <TASK> dump_stack_lvl+0x57/0x70 dump_stack+0x10/0x20 __schedule_bug+0x120/0x170 __schedule+0x300c/0x4800 schedule_rtlock+0x37/0x60 rtlock_slowlock_locked+0x6d9/0x54c0 rt_spin_lock+0x168/0x230 hrtimer_cancel_wait_running+0xe9/0x1b0 hrtimer_cancel+0x24/0x30 bpf_timer_delete_work+0x1d/0x40 bpf_timer_cancel_and_free+0x5e/0x80 bpf_obj_free_fields+0x262/0x4a0 check_and_free_fields+0x1d0/0x280 htab_map_update_elem+0x7fc/0x1500 bpf_prog_9f90bc20768e0cb9_overwrite_cb+0x3f/0x43 bpf_prog_ea601c4649694dbd_overwrite_timer+0x5d/0x7e bpf_prog_test_run_syscall+0x322/0x830 __sys_bpf+0x135d/0x3ca0 __x64_sys_bpf+0x75/0xb0 x64_sys_call+0x1b5/0xa10 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 ... </TASK> It seems feasible to break the reuse and refill of per-cpu extra_elems into two independent parts: reuse the per-cpu extra_elems with bucket lock being held and refill the old_element as per-cpu extra_elems after the bucket lock is unlocked. However, it will make the concurrent overwrite procedures on the same CPU return unexpected -E2BIG error when the map is full. Therefore, the patch fixes the lock problem by breaking the cancelling of bpf_timer into two steps for PREEMPT_RT: 1) use hrtimer_try_to_cancel() and check its return value 2) if the timer is running, use hrtimer_cancel() through a kworker to cancel it again Considering that the current implementation of hrtimer_cancel() will try to acquire a being held softirq_expiry_lock when the current timer is running, these steps above are reasonable. However, it also has downside. When the timer is running, the cancelling of the timer is delayed when releasing the last map uref. The delay is also fixable (e.g., break the cancelling of bpf timer into two parts: one part in locked scope, another one in unlocked scope), it can be revised later if necessary. It is a bit hard to decide the right fix tag. One reason is that the problem depends on PREEMPT_RT which is enabled in v6.12. Considering the softirq_expiry_lock lock exists since v5.4 and bpf_timer is introduced in v5.15, the bpf_timer commit is used in the fixes tag and an extra depends-on tag is added to state the dependency on PREEMPT_RT. Fixes: b00628b ("bpf: Introduce bpf timers.") Depends-on: v6.12+ with PREEMPT_RT enabled Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Closes: https://lore.kernel.org/bpf/20241106084527.4gPrMnHt@linutronix.de Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org> Link: https://lore.kernel.org/r/20250117101816.2101857-5-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libtraceevent parses and returns an array of argument fields, sometimes larger than RAW_SYSCALL_ARGS_NUM (6) because it includes "__syscall_nr", idx will traverse to index 6 (7th element) whereas sc->fmt->arg holds 6 elements max, creating an out-of-bounds access. This runtime error is found by UBsan. The error message: $ sudo UBSAN_OPTIONS=print_stacktrace=1 ./perf trace -a --max-events=1 builtin-trace.c:1966:35: runtime error: index 6 out of bounds for type 'syscall_arg_fmt [6]' #0 0x5c04956be5fe in syscall__alloc_arg_fmts /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:1966 #1 0x5c04956c0510 in trace__read_syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2110 #2 0x5c04956c372b in trace__syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2436 #3 0x5c04956d2f39 in trace__init_syscalls_bpf_prog_array_maps /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:3897 #4 0x5c04956d6d25 in trace__run /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:4335 #5 0x5c04956e112e in cmd_trace /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:5502 #6 0x5c04956eda7d in run_builtin /home/howard/hw/linux-perf/tools/perf/perf.c:351 #7 0x5c04956ee0a8 in handle_internal_command /home/howard/hw/linux-perf/tools/perf/perf.c:404 #8 0x5c04956ee37f in run_argv /home/howard/hw/linux-perf/tools/perf/perf.c:448 #9 0x5c04956ee8e9 in main /home/howard/hw/linux-perf/tools/perf/perf.c:556 #10 0x79eb3622a3b7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #11 0x79eb3622a47a in __libc_start_main_impl ../csu/libc-start.c:360 torvalds#12 0x5c04955422d4 in _start (/home/howard/hw/linux-perf/tools/perf/perf+0x4e02d4) (BuildId: 5b6cab2d59e96a4341741765ad6914a4d784dbc6) 0.000 ( 0.014 ms): Chrome_ChildIO/117244 write(fd: 238, buf: !, count: 1) = 1 Fixes: 5e58fcf ("perf trace: Allow allocating sc->arg_fmt even without the syscall tracepoint") Signed-off-by: Howard Chu <howardchu95@gmail.com> Link: https://lore.kernel.org/r/20250122025519.361873-1-howardchu95@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

STI DRM kernel panic with glmark2-drm #11

STI DRM kernel panic with glmark2-drm #11

waby38b commented Jan 6, 2025

waby38b commented Jan 6, 2025

STI DRM kernel panic with glmark2-drm #11

STI DRM kernel panic with glmark2-drm #11

Comments

waby38b commented Jan 6, 2025

waby38b commented Jan 6, 2025