Learning how to write "Less Slow" code in C++20, from numerical micro-kernels and SIMD to coroutines, ranges, and polymorphic state machines

ashvardanian/less_slow.cpp

Less Slow C++

Much of modern code suffers from common pitfalls: bugs, security vulnerabilities, and performance bottlenecks. University curricula often teach outdated concepts, while bootcamps oversimplify crucial software development principles.

This repository offers practical examples of writing efficient C and C++ code. It leverages C++20 features and is designed primarily for GCC and Clang compilers on Linux, though it may work on other platforms. The topics range from basic micro-kernels executing in a few nanoseconds to more complex constructs involving parallel algorithms, coroutines, and polymorphism. Some of the highlights include:

  • 100x cheaper random inputs?! Discover how input generation sometimes costs more than the algorithm.
  • 40x faster trigonometric calculations: Achieve significant speed-ups over standard library functions like std::sin.
  • 4x faster logic with std::ranges: See how modern C++ abstractions can be surprisingly efficient when used correctly.
  • Trade-offs between accuracy and efficiency: Explore how to balance precision and performance in numerical computations.
  • Compiler optimizations beyond -O3: Learn about less obvious flags and techniques to deliver another 2x speedup.
  • Optimizing matrix multiplications? Learn how a 3x3x3 GEMM can be 60% slower than 4x4x4, despite performing 60% fewer math operations.
  • How many if conditions are too many? Test your CPU's branch predictor with just 10 lines of code.
  • Iterative vs. recursive algorithms: Avoid pitfalls that could cause a SEGFAULT or slow your program.
  • How not to build state machines: Compare std::variant, virtual functions, and C++20 coroutines.
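One way to picture the accuracy-versus-efficiency trade-off behind the trigonometry bullets is a truncated Maclaurin series. The sketch below is an illustration under that assumption, not necessarily the repository's implementation; the name `sin_maclaurin` is hypothetical, and no speed-up is claimed for it here.

```cpp
#include <cmath>

// Hypothetical helper: approximates sin(x) with the first three terms of its
// Maclaurin series, x - x^3/3! + x^5/5!, evaluated in Horner form.
// It is only accurate near zero; the error grows quickly with |x|.
double sin_maclaurin(double x) {
    double const x2 = x * x;
    return x * (1.0 + x2 * (-1.0 / 6.0 + x2 / 120.0));
}

// Absolute error of the approximation against the standard library at `x`.
double sin_maclaurin_error(double x) {
    return std::fabs(sin_maclaurin(x) - std::sin(x));
}
```

Calling `sin_maclaurin_error` at a small argument such as 0.5 and at a larger one such as 3.0 shows how fast the cheap polynomial drifts away from `std::sin`, which is why such shortcuts usually need range reduction or a restricted input domain.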

To follow along, open the less_slow.cpp source file and read the code snippets and the comments around them.

Reproducing the Benchmarks

If you are familiar with C++ and want to review code and measurements as you read, you can clone the repository and execute the following commands.

git clone https://github.com/ashvardanian/less_slow.cpp.git # Clone the repository
cd less_slow.cpp                                            # Change the directory
cmake -B build_release -D CMAKE_BUILD_TYPE=Release          # Generate the build files
cmake --build build_release --config Release                # Build the project
build_release/less_slow                                     # Run the benchmarks

The tutorial targets GCC and Clang compilers on Linux. To control the output or run specific benchmarks, use the following flags:

build_release/less_slow --benchmark_format=json             # Output in JSON format
build_release/less_slow --benchmark_out=results.json        # Save the results to a file, instead of `stdout`
build_release/less_slow --benchmark_filter=std_sort         # Run only benchmarks containing `std_sort` in their name

The build depends on Google Benchmark and on Intel's oneTBB, which provides the Parallel STL implementation.

To improve stability and reproducibility, pass the --benchmark_enable_random_interleaving=true flag, which shuffles and interleaves benchmarks as described in the Google Benchmark documentation.

build_release/less_slow --benchmark_enable_random_interleaving=true

Google Benchmark supports User-Requested Performance Counters through libpfm. Collecting them may require sudo privileges.

sudo build_release/less_slow --benchmark_enable_random_interleaving=true --benchmark_format=json --benchmark_perf_counters="CYCLES,INSTRUCTIONS"

Alternatively, use the Linux perf tool for performance counter collection:

sudo perf stat taskset 0xEFFFEFFFEFFFEFFFEFFFEFFFEFFFEFFF build_release/less_slow --benchmark_enable_random_interleaving=true --benchmark_filter=super_sort
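The hexadecimal argument to taskset is a CPU affinity mask: bit i being set means the process may run on logical CPU i, with the least significant bit standing for CPU 0. The sketch below is not part of the repository; it is a small helper, with the hypothetical name `excluded_cpus`, that lists which CPUs a given mask leaves out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns the indices of logical CPUs whose bit is
// cleared in a hexadecimal affinity mask. The rightmost hex digit holds
// CPUs 0-3, the next one CPUs 4-7, and so on.
std::vector<std::size_t> excluded_cpus(std::string hex_mask) {
    // Drop an optional "0x" / "0X" prefix.
    if (hex_mask.size() >= 2 && hex_mask[0] == '0' &&
        (hex_mask[1] == 'x' || hex_mask[1] == 'X'))
        hex_mask.erase(0, 2);

    std::vector<std::size_t> excluded;
    std::size_t const digits = hex_mask.size();
    for (std::size_t i = 0; i < digits; ++i) {
        // Walk from the least significant digit to the most significant one.
        char const c = hex_mask[digits - 1 - i];
        unsigned nibble = 0;
        if (c >= '0' && c <= '9') nibble = static_cast<unsigned>(c - '0');
        else if (c >= 'a' && c <= 'f') nibble = static_cast<unsigned>(c - 'a') + 10u;
        else if (c >= 'A' && c <= 'F') nibble = static_cast<unsigned>(c - 'A') + 10u;
        else throw std::invalid_argument("not a hexadecimal digit");

        // Record every cleared bit of this digit as an excluded CPU.
        for (std::size_t bit = 0; bit < 4; ++bit)
            if (((nibble >> bit) & 1u) == 0) excluded.push_back(i * 4 + bit);
    }
    return excluded;
}
```

For a mask built from repeated `EFFF` groups, as in the command above, this reports CPU 12 within every group of 16, so the benchmark is kept off those cores.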

Further Reading

Many of the examples here are condensed versions of the articles on my "Less Slow" blog and many related repositories on my GitHub profile. If you are also practicing Rust, you may find the "Less Slow Rust" repository interesting.