[ntuple] Add MT index building #16679

Draft · wants to merge 2 commits into master

Conversation

@enirolf (Contributor) commented Oct 14, 2024

This PR introduces the first steps towards MT support for RNTupleIndex by enabling multithreaded building of the index. To that end, the index now manages multiple index partitions, which are essentially sub-indices for a particular entry range. These entry ranges currently follow the cluster boundaries, but further benchmarking and evaluation will be required to determine the optimal partitioning scheme.
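
(For illustration only: a rough sketch of the partition layout described above; the names and types below are placeholders, not the code of this PR.)

// One sub-index per entry range; with the current scheme, one range per cluster.
struct RIndexPartitionSketch {
   NTupleSize_t fFirstEntry; // first entry covered by this partition
   NTupleSize_t fNEntries;   // number of entries in the partition
   std::unordered_map<NTupleIndexValue_t, std::vector<NTupleSize_t>> fIndex; // hashed value -> entry numbers
};
// Because the partitions cover disjoint entry ranges, each one can be built
// independently, which is what enables the multithreaded build.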

Building a separate index for each RNTuple cluster (or any other
partitioning scheme) opens up the possibility for multithreaded
building and probing. Note that the choice to partition on cluster
boundaries might not be the optimal choice performance-wise and might be
changed in the future following further evaluation.

Test Results

17 files · 17 suites · 4d 2h 3m 25s ⏱️
2 713 tests: 2 713 ✅ · 0 💤 · 0 ❌
43 511 runs: 43 511 ✅ · 0 💤 · 0 ❌

Results for commit 8cce410.

@hahnjo (Member) left a comment:

Some comments inline. I think the current implementation with partitions means the built index is less performant (on probe) than before, which is not ideal. Did you consider merging the partial indices at the end to preserve constant complexity?
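
For illustration, a minimal sketch of such a merge step, assuming each partition's fIndex maps the hashed index value to the entry numbers found in that partition (as the probe loop further down suggests); the types and the merge itself are not part of this PR:

// Hedged sketch: after the partitions have been built in parallel, fold the
// per-partition maps into one map so that probing stays O(1) with respect to
// the number of clusters.
std::unordered_map<NTupleIndexValue_t, std::vector<NTupleSize_t>> merged;
for (const auto &partition : fIndexPartitions) {
   for (const auto &[value, entries] : partition.fIndex) {
      auto &dst = merged[value];
      dst.insert(dst.end(), entries.begin(), entries.end());
   }
}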

Comment on lines -179 to +231
-const std::vector<NTupleSize_t> *GetAllEntryNumbers(const std::vector<void *> &valuePtrs) const;
+const std::vector<NTupleSize_t> GetAllEntryNumbers(const std::vector<void *> &valuePtrs) const;
Member:

Note that this interface forces the allocation of a std::vector, including its heap-backed data, for every probe.

Contributor:

Additionally, returning a const value is generally not what you want: it can prevent move semantics (rvalue references) and gives no benefit to the caller.
To avoid forcing the allocation on the caller while still giving them ownership of the data, it is probably better to pass the output vector as a reference parameter, so it can be reused across calls.
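
A possible shape for that alternative (hypothetical signature, only to illustrate the suggestion):

// The caller owns the buffer and can reuse it across probes, so the heap
// allocation happens at most once instead of once per call.
void GetAllEntryNumbers(const std::vector<void *> &valuePtrs, std::vector<NTupleSize_t> &entryNumbers) const;

// Caller side:
std::vector<NTupleSize_t> entryNumbers;
for (const auto &probe : probes) { // 'probes' is a hypothetical stand-in for the probe loop
   entryNumbers.clear();
   index.GetAllEntryNumbers(probe.valuePtrs, entryNumbers);
   // ... consume entryNumbers ...
}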

Comment on lines +90 to +92
RNTupleIndexPartition(const RClusterDescriptor &descriptor,
const std::vector<std::unique_ptr<RFieldBase>> &indexFields, const RPageSource &pageSource)
: fPageSource(pageSource.Clone())
Member:

Does this constructor need to be inlined? If not, I think it should be defined in the source file.

Comment on lines +101 to +102
auto clonedField = field->Clone(field->GetFieldName());
CallConnectPageSourceOnField(*clonedField, *fPageSource);
Member:

Now we need to clone and connect page sources and fields for every index partition / cluster, which means we probably cannot use cluster prefetching...

ROOT::Experimental::Internal::RNTupleIndex::GetAllEntryNumbers(const std::vector<void *> &valuePtrs) const
{
if (valuePtrs.size() != fIndexFields.size())
throw RException(R__FAIL("Number of value pointers must match number of indexed fields."));

EnsureBuilt();

std::vector<NTupleIndexValue_t> indexValues;
indexValues.reserve(fIndexFields.size());
std::vector<std::vector<NTupleIndexValue_t>> entryNumbersPerCluster;
Member:

This implementation uses two vectors (entryNumbersPerCluster and entryNumbers) which require two heap allocations. Would it be possible to fill entryNumbers directly?
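
A sketch of the single-vector variant, reusing the member names from the diff (everything else is an assumption):

std::vector<NTupleSize_t> entryNumbers;
for (const auto &indexPartition : fIndexPartitions) {
   auto clusterEntryNumbers = indexPartition.fIndex.find(indexValue);
   if (clusterEntryNumbers == indexPartition.fIndex.end())
      continue;
   // Append this partition's matches directly instead of first collecting
   // per-cluster vectors and concatenating them afterwards.
   entryNumbers.insert(entryNumbers.end(), clusterEntryNumbers->second.begin(),
                       clusterEntryNumbers->second.end());
}
return entryNumbers;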

Comment on lines +157 to +164
for (const auto &indexPartition : fIndexPartitions) {
auto clusterEntryNumbers = indexPartition.fIndex.find(indexValue);

if (clusterEntryNumbers == indexPartition.fIndex.end())
continue;

entryNumbersPerCluster.push_back(clusterEntryNumbers->second);
}
Member:

Note that this introduces linear complexity in the number of clusters: while fIndex.find() has constant complexity, now there are separate hash maps per cluster. In the end, this means linear complexity in the number of entries (with a small coefficient). Maybe this is fine, but I think this means worse probe performance because of multi-threaded building...

Comment on lines -144 to +172
-return &(entryNumber->second);
+return entryNumbers;
Member:

(side note: it might be possible to get around returning vectors by value. For example, after building the partitions the RNTupleIndex could "link" collisions across partitions. Then we could return a "linked list of vectors", which would behave like a container with a custom iterator.)
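
A rough sketch of what such a view could look like (names are made up; NTupleSize_t is ROOT's entry-number type, everything else is hypothetical). The obvious caveat is that the view is only valid as long as the index and its internal vectors stay alive:

// Needs <vector> and <cstddef>; no entry numbers are copied on probe, the view
// only chains pointers to the per-partition match vectors.
class REntryNumberView {
   std::vector<const std::vector<NTupleSize_t> *> fChunks;

public:
   class RIterator {
      const std::vector<const std::vector<NTupleSize_t> *> *fChunks = nullptr;
      std::size_t fChunk = 0; ///< index of the current per-partition vector
      std::size_t fPos = 0;   ///< position inside that vector

      void Skip()
      {
         // Advance past exhausted (or empty) chunks so that operator* is always valid.
         while (fChunk < fChunks->size() && fPos >= (*fChunks)[fChunk]->size()) {
            ++fChunk;
            fPos = 0;
         }
      }

   public:
      RIterator(const std::vector<const std::vector<NTupleSize_t> *> &chunks, std::size_t chunk)
         : fChunks(&chunks), fChunk(chunk)
      {
         Skip();
      }
      NTupleSize_t operator*() const { return (*(*fChunks)[fChunk])[fPos]; }
      RIterator &operator++()
      {
         ++fPos;
         Skip();
         return *this;
      }
      bool operator!=(const RIterator &other) const { return fChunk != other.fChunk || fPos != other.fPos; }
   };

   void Add(const std::vector<NTupleSize_t> &entryNumbers) { fChunks.push_back(&entryNumbers); }
   RIterator begin() const { return RIterator(fChunks, 0); }
   RIterator end() const { return RIterator(fChunks, fChunks.size()); }
};

Probing would then Add() each partition's match vector to such a view and return it by value, which only moves a small vector of pointers instead of copying entry numbers.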

Comment on lines +132 to +134
#ifdef R__USE_IMT
std::unique_ptr<ROOT::TThreadExecutor> fPool;
#endif
Member:

Is this member required? AFAICT it's only used once during Build(), so it might be constructed and destructed locally
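
For illustration, a sketch of the local-variable variant, using ROOT::TThreadExecutor (ROOT/TThreadExecutor.hxx) and ROOT::IsImplicitMTEnabled(); only fIndexPartitions, RNTupleIndexPartition, and R__USE_IMT are taken from the diff, the per-partition Build() is a hypothetical helper:

void RNTupleIndex::Build()
{
#ifdef R__USE_IMT
   if (ROOT::IsImplicitMTEnabled()) {
      // The pool only lives for the duration of the build.
      ROOT::TThreadExecutor pool;
      pool.Foreach([](RNTupleIndexPartition &partition) { partition.Build(); }, fIndexPartitions);
      return;
   }
#endif
   for (auto &partition : fIndexPartitions)
      partition.Build(); // hypothetical per-partition build
}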
