Add the option to Multithread AtriumDB #119
Open
One of AtriumDB's strengths is its ability to use multiple threads at once to encode and decode blocks of data in parallel.
To optimize CPU time (number of CPUs * time spent processing), it was most efficient to run AtriumDB in single-threaded mode, which avoids the extra CPU time spent sharing information between cores.
However, with the addition of the "Wall Time" metric, I thought it worthwhile to add a new AtriumDB subclass that uses this parallelization feature, optimizing Wall Time at the expense of CPU time.
To illustrate this tradeoff, below are the times it took AtriumDB to write an entire MIMIC record to disk with and without multithreading on a 40-core Linux server:
As you can see, CPU time gets worse by a factor of 2, while Wall Time improves by a factor of 8.
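One way to read these two factors together is through the ratio of CPU time to Wall Time, which roughly tracks how many cores are busy on average. The snippet below is only a back-of-the-envelope sketch; `parallelism_change` is a hypothetical helper name, not part of AtriumDB.

```python
def parallelism_change(cpu_time_factor, wall_time_speedup):
    """Factor by which the CPU-time / wall-time ratio grows.

    new_ratio = (cpu_time_factor * cpu) / (wall / wall_time_speedup)
              = cpu_time_factor * wall_time_speedup * (cpu / wall)
    """
    return cpu_time_factor * wall_time_speedup


# CPU time doubles (factor 2) while wall time drops 8-fold (speedup 8).
print(parallelism_change(2, 8))  # 16
```

If the single-threaded run kept roughly one core busy (an assumption, since disk I/O may lower that), the multithreaded run would average on the order of 16 busy cores out of the 40 available.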
These benefits are most apparent for large reads/writes, and they disappear completely when the task is smaller than one AtriumDB Block (the Block size can now also be adjusted more easily in the source code of this PR).
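For reviewers who want to reproduce the shape of this tradeoff without AtriumDB, here is a minimal, generic sketch of measuring both metrics for block-parallel encoding. It is not the code in this PR: `zlib.compress` stands in for the real block encoder, and `BLOCK_SIZE`, `split_into_blocks`, `encode_blocks`, and `timed` are hypothetical names chosen for illustration.

```python
import time
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 1 << 16  # hypothetical block size in bytes, not AtriumDB's default


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def encode_blocks(blocks, num_threads=1):
    """Compress each block, optionally spreading the blocks over a thread pool.

    With a single block there is nothing to parallelize, so the threaded
    path is skipped entirely.
    """
    if num_threads <= 1 or len(blocks) <= 1:
        return [zlib.compress(block) for block in blocks]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map preserves input order, so output matches the serial path.
        return list(pool.map(zlib.compress, blocks))


def timed(fn, *args, **kwargs):
    """Return (result, wall_seconds, cpu_seconds) for one call.

    time.process_time() sums CPU time over all threads of the process,
    so it plays the role of the "CPU time" metric here.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds


if __name__ == "__main__":
    payload = bytes(range(256)) * (1 << 16)  # 16 MiB of synthetic data
    blocks = split_into_blocks(payload)
    for threads in (1, 8):
        _, wall, cpu = timed(encode_blocks, blocks, num_threads=threads)
        print(f"threads={threads} wall={wall:.3f}s cpu={cpu:.3f}s")
```

The actual numbers depend on the machine, the data, and the encoder, so none are shown here. When the payload fits in a single block, `encode_blocks` falls back to the serial path, which mirrors the point above that the benefit vanishes below one block.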