-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lkl: Direct irq and fix direct syscall degration #257
Conversation
Direct irq needs this function to handle idle loop by itself Signed-off-by: Yuan Liu <liuyuan@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I think we can remove the test_atomic_... function and use test_bit, see the inline comments.
return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1))); | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_bit is already non-atomic in our case, see asm-generic/bitops/atomic.h
Thanks for the quick review. I was thinking to avoid future change of test_bit breaking LKL but it's fine to me to keep it clean. Removed the test_bit... |
There are two major issues in current direct syscall implementation: 1. When there is already a thread pending in syscall, direct syscall degrages to wakeup idle and the performance is bad in that case. This is actually common in applications that have a trafficless control connection. 2. IRQ is not direct if LKL is in idle. Both issuses are actually because of the same thing: LKL can't reschedule when LKL is in idle. The patch adds such support. There are two downside of this patch: 1. need to change kernel/sched/idle.c to expose cpu_idle_loop. 2. lkl_idle_tail_schedule must be in sync with idle.c Those downsides seem OK given the performance we achieve from this patch. For common case, it saves one context switch (direct irq) and I can observe 10% TCP_RR improvement on my desktop. Signed-off-by: Yuan Liu <liuyuan@google.com>
hello @liuyuan10 before this patch, the idle thread had |
Add a new test case which performs double query of the bpf_mprog through libbpf API, but also via raw bpf(2) syscall. This is testing to gather first the count and then in a subsequent probe the full information with the program array without clearing passed structs in between. # ./vmtest.sh -- ./test_progs -t tc_opts [...] ./test_progs -t tc_opts [ 1.398818] tsc: Refined TSC clocksource calibration: 3407.999 MHz [ 1.400263] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fd336761, max_idle_ns: 440795243819 ns [ 1.402734] clocksource: Switched to clocksource tsc [ 1.426639] bpf_testmod: loading out-of-tree module taints kernel. [ 1.428112] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel lkl#252 tc_opts_after:OK lkl#253 tc_opts_append:OK lkl#254 tc_opts_basic:OK lkl#255 tc_opts_before:OK lkl#256 tc_opts_chain_classic:OK lkl#257 tc_opts_chain_mixed:OK lkl#258 tc_opts_delete_empty:OK lkl#259 tc_opts_demixed:OK lkl#260 tc_opts_detach:OK lkl#261 tc_opts_detach_after:OK lkl#262 tc_opts_detach_before:OK lkl#263 tc_opts_dev_cleanup:OK lkl#264 tc_opts_invalid:OK lkl#265 tc_opts_max:OK lkl#266 tc_opts_mixed:OK lkl#267 tc_opts_prepend:OK lkl#268 tc_opts_query:OK <--- (new test) lkl#269 tc_opts_replace:OK lkl#270 tc_opts_revision:OK Summary: 19/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20231006220655.1653-4-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Add a new test case to query on an empty bpf_mprog and pass the revision directly into expected_revision for attachment to assert that this does succeed. ./test_progs -t tc_opts [ 1.406778] tsc: Refined TSC clocksource calibration: 3407.990 MHz [ 1.408863] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fcaf6eb0, max_idle_ns: 440795321766 ns [ 1.412419] clocksource: Switched to clocksource tsc [ 1.428671] bpf_testmod: loading out-of-tree module taints kernel. [ 1.430260] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel lkl#252 tc_opts_after:OK lkl#253 tc_opts_append:OK lkl#254 tc_opts_basic:OK lkl#255 tc_opts_before:OK lkl#256 tc_opts_chain_classic:OK lkl#257 tc_opts_chain_mixed:OK lkl#258 tc_opts_delete_empty:OK lkl#259 tc_opts_demixed:OK lkl#260 tc_opts_detach:OK lkl#261 tc_opts_detach_after:OK lkl#262 tc_opts_detach_before:OK lkl#263 tc_opts_dev_cleanup:OK lkl#264 tc_opts_invalid:OK lkl#265 tc_opts_max:OK lkl#266 tc_opts_mixed:OK lkl#267 tc_opts_prepend:OK lkl#268 tc_opts_query:OK lkl#269 tc_opts_query_attach:OK <--- (new test) lkl#270 tc_opts_replace:OK lkl#271 tc_opts_revision:OK Summary: 20/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20231006220655.1653-6-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
There is two major issues in current direct syscall implementation:
degrages to wakeup idle and the performance is bad in that case. This is
actually common in applications that have a trafficless control
connection.
Both issuses are actually because of the same thing: LKL can't
reschedule when LKL is in idle. The patch adds such support.
There are two downside of this patch:
There downsides seem OK given the performance we achieve from this
patch.
For common case, it saves one context switch (direct irq) and I can
observe 10% TCP_RR improvement on my desktop.
This change is![Reviewable](https://camo.githubusercontent.com/1541c4039185914e83657d3683ec25920c672c6c5c7ab4240ee7bff601adec0b/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)