diff --git a/README.md b/README.md
index b79ebed..85d813a 100644
--- a/README.md
+++ b/README.md
@@ -5,3 +5,7 @@ Extract some insightful technical points from the book.
 # Table of Contents
 
 [How can the scalability of MySQL be improved for BenchmarkSQL TPC-C testing?](scalability.md)
+
+[How to explain why Repeatable Read surprisingly outperforms Read Committed?](isolation.md)
+
+[The Significant Differences Between BenchmarkSQL and SysBench](sysbench_vs_benchmarksql.md)
diff --git a/images/image-20240829081950365.png b/images/image-20240829081950365.png
new file mode 100644
index 0000000..af72fe8
Binary files /dev/null and b/images/image-20240829081950365.png differ
diff --git a/images/image-20240829092940314.png b/images/image-20240829092940314.png
new file mode 100644
index 0000000..edd644e
Binary files /dev/null and b/images/image-20240829092940314.png differ
diff --git a/images/image-20240829092959775.png b/images/image-20240829092959775.png
new file mode 100644
index 0000000..6c623d9
Binary files /dev/null and b/images/image-20240829092959775.png differ
diff --git a/images/image-20240829151823981.png b/images/image-20240829151823981.png
new file mode 100644
index 0000000..65ecded
Binary files /dev/null and b/images/image-20240829151823981.png differ
diff --git a/isolation.md b/isolation.md
new file mode 100644
index 0000000..67a518d
--- /dev/null
+++ b/isolation.md
@@ -0,0 +1,32 @@
+## How to explain why Repeatable Read surprisingly outperforms Read Committed?
+
+Transaction isolation is fundamental to database processing, represented by the 'I' in the ACID acronym. The isolation level determines the balance between performance and the reliability, consistency, and predictability of results when multiple transactions concurrently make changes and queries. Commonly used isolation levels are Read Committed, Repeatable Read, and Serializable. By default, InnoDB uses Repeatable Read.
+
+InnoDB employs a distinct locking strategy for each isolation level, which affects how queries take locks under concurrency. Depending on the isolation level, a query may have to wait for locks held by other sessions before it can execute [1]. A common perception is that stricter isolation levels degrade performance. How does MySQL perform in practice?
+
+Tests were run at the Serializable, Repeatable Read (RR), and Read Committed (RC) isolation levels using two SysBench workloads: uniform and pareto. The uniform test simulates low-conflict scenarios, while the pareto test models high-conflict ones. The pareto test generated so many deadlock log entries that they interfered with performance analysis, so these logs were suppressed by modifying the source code to keep the test conditions fair. Moreover, for accuracy, a modified version of the MySQL testing program was used rather than the original version.
+
+The figure below presents results from the SysBench uniform test, where concurrency doubles at each step from 50 to 800. Because this test produces few conflicts, throughput varies little among the three isolation levels at low concurrency. Beyond a concurrency of 400, however, throughput under the Serializable isolation level declines notably.
+
+image-20240829151823981
+
+Figure 2-8. SysBench read-write performance comparison with low conflicts under different isolation levels.
+
+Below a concurrency of 400, the differences are minor because the uniform test produces few conflicts, which reduces the impact of the different locking strategies. Read Committed, however, is mainly constrained by its frequent acquisition of the MVCC ReadView, so it performs worse than Repeatable Read.
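The ReadView cost mentioned above can be illustrated with a toy model. This is a minimal sketch, not InnoDB's implementation: `TrxSys`, `open_read_view`, `run_transaction`, and `snapshot_cost` are hypothetical names, and the model assumes only what this document describes, namely that Read Committed copies the global active transaction list on every read while Repeatable Read copies it once per transaction.

```python
from dataclasses import dataclass, field


@dataclass
class TrxSys:
    """Toy stand-in for the global active transaction list (hypothetical)."""
    active_ids: list = field(default_factory=list)
    snapshots_taken: int = 0
    ids_copied: int = 0

    def open_read_view(self) -> tuple:
        # Building a ReadView copies every active transaction id.
        self.snapshots_taken += 1
        self.ids_copied += len(self.active_ids)
        return tuple(self.active_ids)


def run_transaction(trx_sys: TrxSys, isolation: str, n_reads: int) -> list:
    """Perform n_reads consistent reads and return the ReadView used by each."""
    views = []
    view = None
    for _ in range(n_reads):
        # RC: a fresh ReadView for every read. RR: reuse the first one.
        if isolation == "RC" or view is None:
            view = trx_sys.open_read_view()
        views.append(view)
    return views


def snapshot_cost(isolation: str, n_active: int = 800, n_reads: int = 10) -> tuple:
    """Return (ReadViews created, transaction ids copied) for one transaction."""
    trx_sys = TrxSys(active_ids=list(range(1, n_active + 1)))
    run_transaction(trx_sys, isolation, n_reads)
    return trx_sys.snapshots_taken, trx_sys.ids_copied


if __name__ == "__main__":
    for level in ("RC", "RR"):
        snapshots, copied = snapshot_cost(level)
        print(f"{level}: {snapshots} ReadViews, {copied} transaction ids copied")
```

In this model the copying work under Read Committed grows with both the number of reads per transaction and the number of active transactions, which is consistent with the gap appearing at high concurrency even when lock conflicts are rare.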
+
+The following figure shows the comparative results for the SysBench test under the pareto distribution.
+
+image-20240829081950365
+
+Figure 2-9. SysBench read-write performance comparison with high conflicts under different isolation levels.
+
+The figure shows that when conflicts are significant, the performance differences caused by the locking strategies of the different isolation levels are pronounced. As anticipated, higher isolation levels generally exhibit lower throughput, particularly under severe conflict conditions.
+
+In scenarios with few conflicts, performance is primarily constrained by the overhead of acquiring the ReadView in MVCC. Under Read Committed, MySQL must copy the entire global active transaction list each time it reads, whereas under Repeatable Read it copies the list only once, at the start of the transaction.
+
+In summary, in low-conflict tests like SysBench uniform, the overhead of the MVCC ReadView is the predominant bottleneck, outweighing lock overhead, so Repeatable Read performs better than Read Committed. Conversely, in high-conflict tests like SysBench pareto, lock overhead becomes the primary bottleneck, so Read Committed outperforms Repeatable Read.
+
+## References:
+
+1. https://dev.mysql.com/doc/refman/8.0/en/.
+2. Bin Wang (2024). The Art of Problem-Solving in Software Engineering: How to Make MySQL Better.
\ No newline at end of file
diff --git a/sysbench_vs_benchmarksql.md b/sysbench_vs_benchmarksql.md
new file mode 100644
index 0000000..58b3478
--- /dev/null
+++ b/sysbench_vs_benchmarksql.md
@@ -0,0 +1,36 @@
+# The Significant Differences Between BenchmarkSQL and SysBench
+
+Using the lock-sys optimization as a case study, this section examines the significant differences between SysBench and BenchmarkSQL in MySQL performance testing [1].
+
+First, SysBench's standard read-write tests are used to evaluate the lock-sys optimization.
+
+image-20240829092940314
+
+Figure 5-21. Comparison of SysBench read-write tests before and after lock-sys optimization.
+
+The figure shows that overall SysBench performance actually decreased after the optimization.
+
+Next, the same optimization is tested with BenchmarkSQL; the results are shown in the following figure.
+
+image-20240829092959775
+
+Figure 5-22. Comparison of BenchmarkSQL tests before and after lock-sys optimization.
+
+The figure shows that, according to BenchmarkSQL's TPC-C test, the lock-sys optimization is effective. Why does such a significant difference occur? Let's analyze how the two testing tools differ in their characteristics to understand why their results differ.
+
+SysBench RW testing is characterized by fast, simple SQL queries. At the same concurrency, SysBench typically handles fewer concurrent transactions than BenchmarkSQL. Therefore, for latch queue bottlenecks like lock-sys, high concurrency in SysBench may be equivalent to low concurrency in BenchmarkSQL. Consequently, the lock-sys optimization may show little benefit in SysBench, just as it would when BenchmarkSQL runs at low concurrency.
+
+BenchmarkSQL, a widely used TPC-C testing tool, distributes user threads more evenly across various modules, reducing susceptibility to aggregation effects. In high-concurrency situations, optimizing lock-sys can significantly reduce latch conflicts and minimize the impact on other queues, thereby improving throughput. BenchmarkSQL's TPC-C testing is better suited than SysBench for uncovering deeper concurrency problems in MySQL.
+
+This analysis uses deductive reasoning to explore the differences between SysBench and BenchmarkSQL. It demonstrates that poor performance in SysBench tests does not necessarily indicate poor performance in production environments, and vice versa. This discrepancy arises because SysBench test environments often differ significantly from real-world production environments. Consequently, SysBench test results should be used for scenario-specific performance comparisons rather than as comprehensive indicators of production capabilities.
+
+It is worth noting that performance testing and comparison in this book rely mainly on TPC-C, on the following basis [2]:
+
+*TPC benchmark C also known as TPC-C which is the leading online transaction processing (OLTP) benchmark has been used to perform the comparison.*
+
+
+
+## References:
+
+1. Bin Wang (2024). The Art of Problem-Solving in Software Engineering: How to Make MySQL Better.
+2. R. N. Avula and C. Zou. Performance evaluation of TPC-C benchmark on various cloud providers, Proc. 11th IEEE Annu. Ubiquitous Comput. Electron. Mobile Commun. Conf. (UEMCON), pp. 226-233, Oct. 2020.
\ No newline at end of file