-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ntuple] Add MT index building #16679
base: master
Are you sure you want to change the base?
Conversation
Building a separate index for each RNTuple cluster (or any other partitioning scheme) opens up the possibility for multithreaded building and probing. Note that the choice to partition on cluster boundaries might not be the optimal choice performance-wise and might be changed in the future following further evaluation.
Test Results 17 files 17 suites 4d 2h 3m 25s ⏱️ Results for commit 8cce410. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments inline. I think the current implementation with partitions means the built index is less performant (on probe) than before, which is not ideal. Did you consider merging the partial indices at the end to preserve constant complexity?
const std::vector<NTupleSize_t> *GetAllEntryNumbers(const std::vector<void *> &valuePtrs) const; | ||
const std::vector<NTupleSize_t> GetAllEntryNumbers(const std::vector<void *> &valuePtrs) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this interface forces the allocation of a std::vector
including its heap-backed data for every probe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, returning a const
value is generally not what you want (as it is my understanding it may prevent move semantics/rvalue reference semantics, and it doesn't really give any benefit to the caller).
To avoid forcing the allocation on the caller while giving them ownership of the data it's probably better to pass the out vector as a reference parameter (so they may reuse it across calls etc).
RNTupleIndexPartition(const RClusterDescriptor &descriptor, | ||
const std::vector<std::unique_ptr<RFieldBase>> &indexFields, const RPageSource &pageSource) | ||
: fPageSource(pageSource.Clone()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this constructor need inlining? If not I think it should be defined in the source file
auto clonedField = field->Clone(field->GetFieldName()); | ||
CallConnectPageSourceOnField(*clonedField, *fPageSource); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we need to clone and connect page sources and fields for every index partition / cluster, which means we probably cannot use cluster prefetching...
ROOT::Experimental::Internal::RNTupleIndex::GetAllEntryNumbers(const std::vector<void *> &valuePtrs) const | ||
{ | ||
if (valuePtrs.size() != fIndexFields.size()) | ||
throw RException(R__FAIL("Number of value pointers must match number of indexed fields.")); | ||
|
||
EnsureBuilt(); | ||
|
||
std::vector<NTupleIndexValue_t> indexValues; | ||
indexValues.reserve(fIndexFields.size()); | ||
std::vector<std::vector<NTupleIndexValue_t>> entryNumbersPerCluster; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation uses two vectors (entryNumbersPerCluster
and entryNumbers
) which require two heap allocations. Would it be possible to fill entryNumbers
directly?
for (const auto &indexPartition : fIndexPartitions) { | ||
auto clusterEntryNumbers = indexPartition.fIndex.find(indexValue); | ||
|
||
if (clusterEntryNumbers == indexPartition.fIndex.end()) | ||
continue; | ||
|
||
entryNumbersPerCluster.push_back(clusterEntryNumbers->second); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this introduces linear complexity in the number of clusters: while fIndex.find()
has constant complexity, now there are separate hash maps per cluster. In the end, this means linear complexity in the number of entries (with a small coefficient). Maybe this is fine, but I think this means worse probe performance because of multi-threaded building...
return &(entryNumber->second); | ||
return entryNumbers; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(side note: it might be possible to get around returning vectors by value. For example, after building the partitions the RNTupleIndex
could "link" collisions across partitions. Then we could return a "linked list of vectors", which would behave like a container with a custom iterator.)
#ifdef R__USE_IMT | ||
std::unique_ptr<ROOT::TThreadExecutor> fPool; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this member required? AFAICT it's only used once during Build()
, so it might be constructed and destructed locally
This PR introduces the first steps towards MT support for the
RNTupleIndex
, by enabling mulithreaded building of the index. To enable this, the index itself now manages multiple index partitions, which are essentially sub-indices for a particular entry range. These entry ranges are currently set according to the cluster boundaries, but further benchmarking and evaluation will be required to determine the optimal partitioning scheme.