Add the option to Multithread AtriumDB #119

Open. 1 commit to be merged into base branch: main

@WilliamDixon WilliamDixon commented Jan 2, 2025

One of AtriumDB's strengths is its ability to use multiple threads at once to encode and decode blocks of data in parallel.

To optimize CPU time (number of CPUs * time spent processing), it was most efficient to run AtriumDB in single-threaded mode, which avoids spending extra CPU time sharing information between cores.

However, with the addition of the "Wall Time" metric, I thought it worthwhile to add a new AtriumDB subclass that uses this parallelization feature, optimizing Wall Time at the expense of CPU time.

To illustrate this tradeoff, below are the times AtriumDB took to write an entire MIMIC record to disk, with and without multithreading, on a 40-core Linux server:

Single-threaded AtriumDB:
CPU Time: 212.1006 s
Wall Time: 215.1170 s

Multithreaded AtriumDB (40 threads):
CPU Time: 435.6397 s
Wall Time: 27.0486 s

As you can see, CPU Time roughly doubles (435.6397 s / 212.1006 s ≈ 2.05x), while Wall Time improves by about a factor of 8 (215.1170 s / 27.0486 s ≈ 7.95x).
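For anyone who wants to check the ratios, here is a small sketch that recomputes them from the timings reported above. The "parallel efficiency" figure (wall-time speedup divided by thread count) is an extra derived quantity, not something measured in this PR.

```python
# Timings copied from the measurements reported above (seconds).
single = {"cpu": 212.1006, "wall": 215.1170}
multi = {"cpu": 435.6397, "wall": 27.0486}
threads = 40

# How much more total CPU time the multithreaded run consumed.
cpu_overhead = multi["cpu"] / single["cpu"]

# How much sooner the multithreaded run finished.
wall_speedup = single["wall"] / multi["wall"]

# Derived quantity (not measured in the PR): speedup per thread used.
parallel_efficiency = wall_speedup / threads

print(f"CPU time overhead:   {cpu_overhead:.2f}x")
print(f"Wall time speedup:   {wall_speedup:.2f}x")
print(f"Parallel efficiency: {parallel_efficiency:.1%}")
```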

These benefits are most apparent for large reads/writes, and disappear completely when the task is smaller than one AtriumDB Block (the block size can also now be adjusted more easily in the source code of this PR).
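To show why the benefit vanishes below one block, here is a minimal sketch of block-parallel encoding. It is not AtriumDB's implementation: `BLOCK_SIZE`, `split_blocks`, `encode_blocks`, and `decode_blocks` are hypothetical names, and `zlib` stands in for the real block codec. The point is only that parallelism is applied per block, so a task that fits in a single block has nothing to distribute across threads.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical block size in bytes; not AtriumDB's actual setting.
BLOCK_SIZE = 1 << 16


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut the payload into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def encode_blocks(data: bytes, num_threads: int = 1,
                  block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Compress each block independently, optionally across several threads."""
    blocks = split_blocks(data, block_size)
    # With one block (or one thread) there is nothing to run in parallel,
    # so skip the thread pool and its coordination overhead.
    if num_threads <= 1 or len(blocks) <= 1:
        return [zlib.compress(block) for block in blocks]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map preserves input order, so blocks stay in sequence.
        return list(pool.map(zlib.compress, blocks))


def decode_blocks(encoded: List[bytes]) -> bytes:
    """Decompress the blocks and stitch the payload back together."""
    return b"".join(zlib.decompress(block) for block in encoded)


payload = bytes(range(256)) * 1000  # 256,000 bytes, spans several blocks
encoded = encode_blocks(payload, num_threads=4)
restored = decode_blocks(encoded)
```

The threaded and single-threaded paths produce the same encoded blocks; only the wall time and the total CPU time spent differ.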
