Much of modern code suffers from common pitfalls: bugs, security vulnerabilities, and performance bottlenecks. University curricula often teach outdated concepts, while bootcamps oversimplify crucial software development principles.
This repository offers practical examples of writing efficient C and C++ code. It leverages C++20 features and is designed primarily for GCC and Clang compilers on Linux, though it may work on other platforms. The topics range from basic micro-kernels executing in a few nanoseconds to more complex constructs involving parallel algorithms, coroutines, and polymorphism. Some of the highlights include:
- 100x cheaper random inputs?! Discover how input generation sometimes costs more than the algorithm.
- 40x faster trigonometric calculations: Achieve significant speed-ups over standard library functions like `std::sin`.
- 4x faster logic with `std::ranges`: See how modern C++ abstractions can be surprisingly efficient when used correctly.
- Trade-offs between accuracy and efficiency: Explore how to balance precision and performance in numerical computations.
- Compiler optimizations beyond `-O3`: Learn about less obvious flags and techniques to deliver another 2x speedup.
- Optimizing matrix multiplications? Learn how a 3x3x3 GEMM can be 60% slower than 4x4x4, despite performing 60% fewer math operations.
- How many if conditions are too many? Test your CPU's branch predictor with just 10 lines of code.
- Iterative vs. recursive algorithms: Avoid pitfalls that could cause a `SEGFAULT` or slow your program.
- How not to build state machines: Compare `std::variant`, `virtual` functions, and C++20 coroutines.
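The trigonometry and accuracy bullets above revolve around replacing a general-purpose `std::sin` with a cheaper approximation. The sketch below is a minimal illustration under that assumption, not the repository's kernel: the hypothetical `sin_maclaurin` keeps only three terms of the Maclaurin series and skips range reduction, and the hypothetical `max_abs_error` measures what that shortcut costs in precision. No speed-up figure is implied.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: approximates sin(x) by the first three terms of its
// Maclaurin series, x - x^3/3! + x^5/5!. No range reduction is performed,
// so the result is only accurate for small |x|.
inline double sin_maclaurin(double x) noexcept {
    double const x2 = x * x;
    double const x3 = x2 * x;
    double const x5 = x3 * x2;
    return x - x3 / 6.0 + x5 / 120.0;
}

// Largest absolute difference from std::sin over [-range, range],
// sampled at (samples + 1) evenly spaced points, endpoints included.
inline double max_abs_error(double range, int samples) {
    assert(samples > 0 && range >= 0.0);
    double worst = 0.0;
    for (int i = 0; i <= samples; ++i) {
        double const x = -range + 2.0 * range * i / samples;
        worst = std::fmax(worst, std::fabs(sin_maclaurin(x) - std::sin(x)));
    }
    return worst;
}
```

On `[-0.5, 0.5]` the error stays below `1e-5`, but at `x = π` it is about 0.52, which is why practical approximations pair such polynomials with range reduction or fitted coefficients.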
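The `SEGFAULT` bullet above refers to a classic hazard: recursion depth grows with the input, and each level consumes stack space. A minimal sketch under illustrative assumptions (the `index_list` type and all functions below are hypothetical, not taken from `less_slow.cpp`) contrasts a recursive traversal with an iterative one that uses constant stack space.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical singly-linked list stored in two parallel vectors:
// next[i] is the index of the node after i, or -1 at the tail.
// Index links (rather than owning pointers) keep destruction non-recursive.
struct index_list {
    std::vector<std::int64_t> values;
    std::vector<std::int64_t> next;
    std::int64_t head = -1;
};

// Builds the chain 0 -> 1 -> ... -> n-1 holding the values 1..n.
index_list make_chain(std::size_t n) {
    index_list list;
    for (std::size_t i = 0; i < n; ++i) {
        list.values.push_back(static_cast<std::int64_t>(i + 1));
        list.next.push_back(i + 1 < n ? static_cast<std::int64_t>(i + 1) : -1);
    }
    list.head = n ? 0 : -1;
    return list;
}

// Recursive: one stack frame per node. A long enough list can exhaust the
// thread's stack and crash with SIGSEGV; an optimizing compiler may turn
// this into a loop, but the language gives no such guarantee.
std::int64_t sum_recursive(index_list const &list, std::int64_t i) {
    if (i < 0) return 0;
    auto const at = static_cast<std::size_t>(i);
    return list.values[at] + sum_recursive(list, list.next[at]);
}

// Iterative: same result with constant stack usage, regardless of length.
std::int64_t sum_iterative(index_list const &list, std::int64_t i) {
    std::int64_t total = 0;
    while (i >= 0) {
        auto const at = static_cast<std::size_t>(i);
        total += list.values[at];
        i = list.next[at];
    }
    return total;
}
```

Prefer `sum_iterative(list, list.head)` for inputs of unbounded length; the recursive form is only safe when the depth is known to be small.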
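The state-machine bullet above compares dispatch strategies. As a hedged sketch (the turnstile example and every name in it are hypothetical, and coroutines are omitted for brevity), the same two-state machine can be written with `std::variant` plus `std::visit`, where states live inline in a closed set, and with `virtual` functions, where each state is a separate class reached through a pointer.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <variant>

// Hypothetical turnstile: "locked" until a coin arrives, "unlocked" until pushed.
enum class event { coin, push };

// Variant-based states: a closed set of types, stored by value.
struct locked {};
struct unlocked {};
using turnstile_state = std::variant<locked, unlocked>;

turnstile_state step(turnstile_state const &state, event e) {
    return std::visit(
        [e](auto const &current) -> turnstile_state {
            using state_t = std::decay_t<decltype(current)>;
            if constexpr (std::is_same_v<state_t, locked>)
                return e == event::coin ? turnstile_state{unlocked{}} : turnstile_state{locked{}};
            else
                return e == event::push ? turnstile_state{locked{}} : turnstile_state{unlocked{}};
        },
        state);
}

// Virtual-based states: an open hierarchy, reached through a base pointer.
struct state_base {
    virtual ~state_base() = default;
    virtual std::unique_ptr<state_base> on(event e) const = 0;
    virtual bool is_locked() const noexcept = 0;
};

struct locked_state final : state_base {
    std::unique_ptr<state_base> on(event e) const override;
    bool is_locked() const noexcept override { return true; }
};

struct unlocked_state final : state_base {
    std::unique_ptr<state_base> on(event e) const override;
    bool is_locked() const noexcept override { return false; }
};

std::unique_ptr<state_base> locked_state::on(event e) const {
    if (e == event::coin) return std::make_unique<unlocked_state>();
    return std::make_unique<locked_state>();
}

std::unique_ptr<state_base> unlocked_state::on(event e) const {
    if (e == event::push) return std::make_unique<locked_state>();
    return std::make_unique<unlocked_state>();
}
```

In this sketch the `virtual` version allocates a new state object on every transition. That is a choice of the sketch (stateless singletons would avoid it), but it shows the kind of indirection the `std::variant` version, which keeps the state by value, does not need.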
To read, jump to the `less_slow.cpp` source file and read the code snippets and comments.
If you are familiar with C++ and want to review code and measurements as you read, you can clone the repository and execute the following commands.
```sh
git clone https://github.com/ashvardanian/less_slow.cpp.git # Clone the repository
cd less_slow.cpp # Change the directory
cmake -B build_release -D CMAKE_BUILD_TYPE=Release # Generate the build files
cmake --build build_release --config Release # Build the project
build_release/less_slow # Run the benchmarks
```
For brevity, the tutorial is intended for GCC and Clang compilers on Linux. To control the output or run specific benchmarks, use the following flags:
```sh
build_release/less_slow --benchmark_format=json # Output in JSON format
build_release/less_slow --benchmark_out=results.json # Save the results to a file, instead of `stdout`
build_release/less_slow --benchmark_filter=std_sort # Run only benchmarks containing `std_sort` in their name
```
The build will pull Google Benchmark, as well as Intel's oneTBB for the Parallel STL implementation.
To enhance stability and reproducibility, use the `--benchmark_enable_random_interleaving=true` flag, which shuffles and interleaves benchmarks as described in the Google Benchmark documentation.

```sh
build_release/less_slow --benchmark_enable_random_interleaving=true
```
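Conceptually, random interleaving changes the order in which repetitions run. The sketch below is a simplified model, not Google Benchmark's implementation: the hypothetical `run_interleaved` runs every benchmark once per round in a freshly shuffled order, so slow drifts in machine state (for example, thermal throttling or frequency scaling) are spread across all benchmarks instead of penalizing whichever one happens to run during the drift.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <random>
#include <string>
#include <vector>

// Hypothetical benchmark record: a name and a callable to time.
struct benchmark_case {
    std::string name;
    std::function<void()> body;
};

// Each round runs every case exactly once, in a new random order.
// Returns the execution order so it can be inspected.
std::vector<std::size_t> run_interleaved(std::vector<benchmark_case> const &cases,
                                         std::size_t rounds, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<std::size_t> order(cases.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::vector<std::size_t> executed;
    for (std::size_t round = 0; round < rounds; ++round) {
        std::shuffle(order.begin(), order.end(), rng);
        for (std::size_t index : order) {
            cases[index].body(); // a real harness would time this call
            executed.push_back(index);
        }
    }
    return executed;
}
```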
Google Benchmark supports User-Requested Performance Counters through `libpfm`. Note that collecting these may require `sudo` privileges.

```sh
sudo build_release/less_slow --benchmark_enable_random_interleaving=true --benchmark_format=json --benchmark_perf_counters="CYCLES,INSTRUCTIONS"
```
Alternatively, use the Linux `perf` tool for performance counter collection:

```sh
sudo perf stat taskset 0xEFFFEFFFEFFFEFFFEFFFEFFFEFFFEFFF build_release/less_slow --benchmark_enable_random_interleaving=true --benchmark_filter=super_sort
```
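The `taskset` argument above is a hexadecimal CPU affinity mask in which bit i (counting from the least-significant bit) selects logical CPU i. The hypothetical `excluded_cpus` helper below is only a decoding aid, assuming a plain hex string with an optional `0x` prefix; it lists which of the first `cpu_count` CPUs the mask leaves out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical decoding aid for taskset-style masks: returns the CPUs in
// [0, cpu_count) whose bit is NOT set. The last hex digit covers CPUs 0-3,
// the one before it CPUs 4-7, and so on; digits beyond the string are zero.
std::vector<std::size_t> excluded_cpus(std::string mask, std::size_t cpu_count) {
    if (mask.size() >= 2 && mask[0] == '0' && (mask[1] == 'x' || mask[1] == 'X'))
        mask.erase(0, 2);
    std::vector<std::size_t> excluded;
    for (std::size_t cpu = 0; cpu < cpu_count; ++cpu) {
        std::size_t const digit_from_right = cpu / 4;
        bool selected = false;
        if (digit_from_right < mask.size()) {
            auto const digit = static_cast<unsigned char>(mask[mask.size() - 1 - digit_from_right]);
            assert(std::isxdigit(digit) && "mask must be hexadecimal");
            int const value = std::isdigit(digit) ? digit - '0' : std::tolower(digit) - 'a' + 10;
            selected = ((value >> (cpu % 4)) & 1) != 0;
        }
        if (!selected) excluded.push_back(cpu);
    }
    return excluded;
}
```

For the mask in the command, `excluded_cpus("0xEFFFEFFFEFFFEFFFEFFFEFFFEFFFEFFF", 128)` returns CPUs 12, 28, 44, 60, 76, 92, 108, and 124, that is, one CPU out of every 16.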
Many of the examples here are condensed versions of articles on my "Less Slow" blog and of related repositories on my GitHub profile. If you are also practicing Rust, you may find the "Less Slow Rust" repository interesting.