This repository contains prototypes of page allocators. They are designed for multicore, hybrid systems with volatile and non-volatile memory.
The two main design goals are multicore scalability and crash consistency.
Related Projects
- C Implementation: https://github.com/luhsra/llfree-c
- Benchmarks: https://github.com/luhsra/llfree-bench
- Modified Linux: https://github.com/luhsra/llfree-linux
- Benchmark Module: https://github.com/luhsra/linux-alloc-bench
LLFree: Scalable and Optionally-Persistent Page-Frame Allocation
Lars Wrenger, Florian Rommel, Alexander Halbuer, Christian Dietrich, Daniel Lohmann
In: 2023 USENIX Annual Technical Conference (USENIX '23); USENIX Association
To compile and test the allocator, you have to install Rust.
The nightly version 1.76.0 (or newer) is required for inline assembly and custom compiler toolchains.
# Init dependencies
git submodule update --init
# Release build
cargo build -r
# Running unit-tests
cargo test -- --test-threads 1
Note: This project uses certain UNIX features directly (for memory mapping and thread pinning) and doesn't work on Windows without modifications.
The core directory contains the main llfree crate and all the allocators.
In general, LLFree is separated into a lower allocator, responsible for allocating pages of 4K up to 4M, and an upper allocator, designed to prevent memory sharing and fragmentation.
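The size range above can be expressed as a small calculation. This is only a sketch: it assumes allocation sizes are powers of two identified by an "order" (4K shifted left by the order), and the names PAGE_SIZE, MAX_ORDER, and order_size are hypothetical, not taken from the crate.

```rust
/// Size of a base page in bytes (4K).
const PAGE_SIZE: usize = 4096;
/// Largest order under the power-of-two assumption: 4K << 10 = 4M.
const MAX_ORDER: usize = 10;

/// Returns the size in bytes of an allocation of the given order,
/// or `None` if the order is outside the 4K..=4M range.
/// Hypothetical helper, not part of the llfree API.
fn order_size(order: usize) -> Option<usize> {
    (order <= MAX_ORDER).then(|| PAGE_SIZE << order)
}
```

Under this assumption, order 0 is a 4K page, order 9 a 2M huge page, and order 10 the 4M maximum.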
The persistent lower allocator can be found in lower. At the lowest level, this allocator has 512-bit bit fields that store which 4K pages are allocated. The second level consists of entries, one for each bit field, each covering 2M of memory (512 x 4K). These entries contain a counter of free pages in the related bit field and a flag indicating whether the whole subtree is allocated as a 2M huge page. The 2M entries are further grouped into trees of 32 entries (this can be defined at compile time).
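The two lower levels might be sketched as follows. This is an illustration under assumptions, not the crate's actual code: the type names Bitfield and Child, the method names, and the bit layout (free counter in the low bits of a u16, huge flag in bit 15) are all hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// 512 bits, one per 4K page, together covering 2M (512 * 4K).
struct Bitfield([AtomicU64; 8]);

impl Bitfield {
    fn new() -> Self {
        Bitfield(std::array::from_fn(|_| AtomicU64::new(0)))
    }

    /// Atomically sets the first zero bit and returns its index (0..512),
    /// or `None` if all 512 pages are already allocated.
    fn set_first_zero(&self) -> Option<usize> {
        for (i, word) in self.0.iter().enumerate() {
            let mut cur = word.load(Ordering::Acquire);
            while cur != u64::MAX {
                let bit = cur.trailing_ones() as usize;
                let new = cur | (1u64 << bit);
                match word.compare_exchange_weak(cur, new, Ordering::AcqRel, Ordering::Acquire) {
                    Ok(_) => return Some(i * 64 + bit),
                    Err(actual) => cur = actual, // lost a race, retry on this word
                }
            }
        }
        None
    }

    /// Clears the bit at `idx`; returns false if it was not set (double free).
    fn clear(&self, idx: usize) -> bool {
        let mask = 1u64 << (idx % 64);
        (self.0[idx / 64].fetch_and(!mask, Ordering::AcqRel) & mask) != 0
    }
}

/// Second-level entry: free-page counter plus huge-page flag.
/// The packing into a u16 is an assumption made for this sketch.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Child(u16);

impl Child {
    const HUGE: u16 = 1 << 15;

    fn new(free: u16, huge: bool) -> Self {
        let flag = if huge { Self::HUGE } else { 0 };
        Child(free | flag)
    }
    fn free(self) -> u16 {
        self.0 & !Self::HUGE
    }
    fn huge(self) -> bool {
        (self.0 & Self::HUGE) != 0
    }
    /// Takes one 4K page from the counter; fails if the entry is
    /// allocated as a huge page or has no free pages left.
    fn dec(self) -> Option<Self> {
        (!self.huge() && self.free() > 0).then(|| Child(self.0 - 1))
    }
}
```

In such a design, a 4K allocation would first decrement the counter in the entry and then set a bit in the matching bit field, while a 2M allocation would only need to switch an entirely free entry to the huge state.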
The llfree module contains the upper allocator. Its interface is defined in lib.rs, along with various unit and stress tests. The upper allocator depends on the lower allocator for the actual allocations and only manages the higher-level trees. Its purpose is to improve performance by preventing memory sharing and fragmentation. It is completely volatile and has to be rebuilt on boot.
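Because the upper allocator is volatile, its tree state has to be recomputed from the lower allocator after a reboot. The following single-threaded sketch illustrates one way this could look. It assumes that each tree keeps a free-page counter and a reservation flag; the names Tree, rebuild_trees, and reserve are hypothetical, and a real implementation would need atomic updates.

```rust
/// Number of 2M entries per tree (the default mentioned above).
const TREE_CHILDREN: usize = 32;

/// Volatile per-tree state; the fields are assumptions for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Tree {
    /// Sum of the free counters of the tree's 2M entries.
    free: usize,
    /// Whether a core currently uses this tree exclusively.
    reserved: bool,
}

/// Rebuilds the tree array from the free counters of the
/// lower allocator's 2M entries (one u16 counter per entry).
fn rebuild_trees(child_free: &[u16]) -> Vec<Tree> {
    child_free
        .chunks(TREE_CHILDREN)
        .map(|children| Tree {
            free: children.iter().map(|&f| f as usize).sum::<usize>(),
            reserved: false,
        })
        .collect()
}

/// Reserves the first unreserved tree that still has free pages,
/// so that different cores do not allocate from the same tree.
fn reserve(trees: &mut [Tree]) -> Option<usize> {
    let idx = trees.iter().position(|t| !t.reserved && t.free > 0)?;
    trees[idx].reserved = true;
    Some(idx)
}
```

Keeping each core on its own tree is one plausible way to avoid sharing memory regions between cores; the actual selection policy in llfree may differ.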
The allocator's data structures are defined in core/src/entry.rs and core/src/trees.rs.
This project also includes the C reimplementation (llfree-c) in llc.
The unit tests can be executed for the C implementation by passing the --features llc argument:
# fetch the C implementation
git submodule update --init
# build & test the C implementation
cargo test --features llc -- --test-threads 1
The benchmarks can be found in bench/src/bin and the benchmark evaluation and visualization in the llfree-bench repository.
These benchmarks can be executed with:
cargo perf bench -- bulk -x1 -x2 -x4 -t4 -m8 -o results/bench.csv LLFree
This runs the bulk benchmark for 1, 2, and 4 threads (-t4 sets the maximum to 4 threads) on 24G DRAM and stores the result in results/bench.csv.
To execute the benchmark on NVM, use the --dax flag to specify a DAX file to be mapped.
For more info on the CLI arguments, run cargo perf bench -- -h.
The debug output can be suppressed by setting RUST_LOG=error.
General statistics:
perf stat -e <events> target/release/bench <args>
Additional details can be shown with -d -d -d.
Also, hotspot is a great analysis tool for these statistics.
Recording events:
perf record -g -F 999 target/release/bench <args>
After conversion with perf script -F +pid > test.perf, this can be opened in the Firefox Profiler: https://profiler.firefox.com/