Skip to content

denwong47/async-1brc

Repository files navigation

Personal exercise on 1BRC using Rust

Rust CI

This is a personal exercise to solve the 1BRC challenge using Rust. It may not conform to all the challenge requirements due to differences in equipments availability and personal preferences, not to mention that the original challenge is in Java to begin with.

One of the main premises of this exercise is also to use Async Rust. While there could be pointer magics in unsafe blocking Rust that solves the challenge in an incredibly efficient way, such approaches do not have real world applications, limiting the meaningfulness of any learnings. Since Async Rust had come a long way in 2023, the objective here is to use Async Rust to solve this challenge even though it may incur runtime overheads.

Personal Sub-rules

  • The program should be easily adaptable, if not able, to read from a bytes stream instead of a file.
  • The number of stations, and their identities, should not be known in advance.
  • The program does not need to be error tolerant, i.e. it is allowed to crash on invalid input, but it should be capable of reading any number of valid inputs, not hardcoded to exactly 1 billion rows.
  • RAM is not considered free. The program is not allowed to use unreasonable amounts of memory in name of performance. e.g. reading the file from RAM disk to eliminate I/O performance differences is allowed, but reading the entire file into memory is not, as for in real world applications the file could be of any arbitrary size, exceeding the available memory.
  • The program should be benchmarked on exactly 8 cores, but not hardcoded to such.

Setup requirements

  • GNU Make, Rust, JDK 21, and Maven should be installed.
  • Compile the Java reference implementation by running ./mvnw clean verify in the ../1brc directory.
  • The 1brc repository should be cloned to a parallel directory to this repository, i.e. should be accessible at ../1brc.
    • ./create_measurements.sh 1000000000 should have been run within the 1brc repository, so that ../1brc/measurements.txt should be present.
    • In order to use the --features=assert flag, the ../1brc/out_expected.txt should also be present. This can be generated by the baseline implementation at ./calculate_average_baseline.sh.

Running the program

make run

Current timings

The timings are taken on a M1 Pro 10-core machine, using only 8 threads.

File is read from disk directly; no RAM disk is used.

  • Tokio Buffered Read: ~9.6s
  • Mmap + Rayon: ~7.3s

Feature Flags

  • bench: Print out the amount of time taken to produce the output.
  • debug: Print out debug information; significantly slows down the program.
  • assert: Enables the assertion of the output against the expected output. This is only useful for debugging purposes, and should not be used in production.
  • timed: Print out selected time measurements for debugging purposes.
  • timed-extreme: Print out all time measurements for debugging purposes, including ones that significantly slow down the program by 4 to 5 times.

Note

  • For the purpose of gxhash, -C target-cpu=native will be set by default to enable the use of AES instructions. This however may prevent compiled binaries from being used on other machines.

About

Would tokio do good or do bad against 1BRC?

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published