Persist AOF file by io_uring #750
base: unstable
Signed-off-by: Wenwen Chen <wenwen.chen@samsung.com>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## unstable #750 +/- ##
============================================
- Coverage 70.40% 70.34% -0.06%
============================================
Files 112 113 +1
Lines 61467 61487 +20
============================================
- Hits 43275 43253 -22
- Misses 18192 18234 +42
Hello. Are you working with @lipzhu? If we do the write with io_uring, could we also do the fsync in the same ring, without an extra syscall?
A 29% throughput improvement is impressive. I wonder how it is achieved, because we still wait for the write and then do the fsync before we process the next command. I guess it is just making fewer syscalls?
Without io_uring we do write in a while loop. I wonder if the same performance improvement could be achieved with writev instead of the loop. Have you tried that?
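To make the writev question concrete, here is a minimal sketch of the two approaches being compared. It is not Valkey's actual AOF code: the helper names flush_with_loop, flush_with_writev and append_and_sync are hypothetical, and the sketch assumes a blocking file descriptor opened for appending.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Baseline: one write(2) per loop iteration until the buffer is drained. */
static ssize_t flush_with_loop(int fd, const char *buf, size_t len) {
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR) continue; /* interrupted, retry */
            return -1;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Alternative: hand several buffers to the kernel in one writev(2) call,
 * looping again only after a short write. The iov array is modified. */
static ssize_t flush_with_writev(int fd, struct iovec *iov, int iovcnt) {
    assert(iov != NULL && iovcnt > 0);
    size_t total = 0;
    while (iovcnt > 0) {
        ssize_t n = writev(fd, iov, iovcnt);
        if (n < 0) {
            if (errno == EINTR) continue; /* interrupted, retry */
            return -1;
        }
        total += (size_t)n;
        /* Skip fully written segments, then trim the partial one. */
        size_t left = (size_t)n;
        while (iovcnt > 0 && left >= iov->iov_len) {
            left -= iov->iov_len;
            iov++;
            iovcnt--;
        }
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + left;
            iov->iov_len -= left;
        }
    }
    return (ssize_t)total;
}

/* "appendfsync always" style: the data counts as persisted only after
 * fdatasync(2) returns, so the caller blocks for write plus sync. */
static int append_and_sync(int fd, struct iovec *iov, int iovcnt) {
    if (flush_with_writev(fd, iov, iovcnt) < 0) return -1;
    return fdatasync(fd);
}
```

In the common case flush_with_writev costs one syscall regardless of how many buffers are queued, whereas the loop costs at least one per buffer. Either way the fdatasync remains a separate blocking call, which is the part an io_uring submission could batch with the write.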
No, I am not working with @lipzhu, but I have been following #599 for a long time.
Yes, the performance improvement comes from io_uring making fewer syscalls.
Echoing @zuiderkwast, I am also curious why io_uring would boost performance in this kind of case.
OK, I will do these tests ASAP.
I am sorry, I don't know how to disable the rewrite process.
Through config
Thank you very much. I did some extra experiments
1. Performance comparison
2. CPU utilization comparison
With io_uring, can the kernel use kernel threads? Maybe that's why it's faster but uses more CPU. Are these cycle and instruction counts for the full benchmark or for a fixed duration, like one second? With higher throughput we can handle more traffic. It's OK to use more CPU for more traffic, but for the same traffic I hope we don't use much more CPU.
It is determined by IO traffic.
Yes.
For a fixed duration (10 s).
@lipzhu
Hi @Wenwen-Chen, sorry for the late response. As I understand it, io_uring should not benefit this kind of case; your result of
This is also my question. Can you help analyze why
c26c4f2 to 0c6ed8c
io_uring does not bring a performance improvement in the "rewrite disabled" scenario compared with the write syscall (io_uring: 60336.46 vs write: 61722.51, #750 (comment)).
I'm quite mediocre on io_uring internals, and on fsync as well. But did you try setting CPU affinity for the background processes and the AOF rewrite while running your benchmark? Do you get the same boost with correctly configured affinity (I mean different physical cores for the main and background threads, not virtual ones like HT or SMT)? # server-cpulist 0-7:2
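One way to try the affinity suggestion, sketched as a config fragment. The four option names are the CPU-affinity settings listed in the stock valkey.conf. The CPU numbers are only an assumption based on the NUMA layout in the PR description (even-numbered CPUs on node 0); which logical CPUs are SMT siblings on that machine is not stated, so the values would need checking against lscpu before use.

```
# Hypothetical pinning: main thread, bio threads and the AOF rewrite child
# each get a different logical CPU on NUMA node 0.
server-cpulist 0
bio-cpulist 2
aof-rewrite-cpulist 4
bgsave-cpulist 6
```

If the speedup disappears with the rewrite child pinned away from the main thread, that would point at scheduling interference rather than io_uring itself.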
@Wenwen-Chen do you plan to work on this?
Hi @xbasel, I really want to move this patch forward. However, I am not a Valkey expert, and I have not found the root cause of why enabling io_uring together with rewrite reduces the time spent in fdatasync. Do you have any suggestions?
Hi @egbaydarov, thank you very much for your suggestion.
Description
Persisting write commands to the AOF file is how Valkey ensures high reliability. When a user turns on AOF and sets appendfsync always, the speed of writing data to disk is critical to Valkey, because the write operation is synchronous and the Valkey server does not respond to other client requests while it is in progress.
io_uring is a powerful asynchronous I/O API for Linux. This patch optimizes Valkey's performance by replacing the traditional write interface with io_uring when persisting the AOF file to disk.
We tested the performance with the valkey-benchmark tool. The patch improves performance by 29.24%.
Baseline: 48,847.20 QPS -> Optimized: 63,130.57 QPS
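As a quick check, the 29.24% figure follows from the two throughput numbers as a relative gain over the baseline:

```latex
\frac{63130.57 - 48847.20}{48847.20} = \frac{14283.37}{48847.20} \approx 0.2924 = 29.24\%
```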
Test Environment
OPERATING SYSTEM: Ubuntu
Kernel: 6.5.0
DISK: SATA SSD
PROCESSOR: Intel(R) Xeon(R) Gold 6152 CPU (total 88 Threads, 2 Sockets, 22 Cores per socket, 2 Threads per Core)
NUMA info of the processor
NUMA node(s): 2
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,...,86
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,...,87
Base: #741
Server and Valkey-benchmark in same socket.
Server config
port 9876
bind 127.0.0.1
appendonly yes
appendfsync always
no-appendfsync-on-rewrite no
aof-use-rdb-preamble no
daemonize no
protected-mode no
databases 16
latency-monitor-threshold 1
repl-diskless-sync-delay 0
save
io-uring-enabled yes
Test step
For both the single-threaded and multi-threaded cases, I ran each test 3 times. The average performance is summarized in the following table: