Multi Copy for Node Tables #3298
Conversation
Force-pushed from beaf397 to 1d7616d
LGTM!
#include "common/vector/value_vector.h"
#include "storage/store/chunked_node_group.h"
Double check whether the include of chunked_node_group.h is needed. I think it should already be indirectly included by chunked_node_group_collection.h.
It is, but I don't like relying on indirect includes; they make changes more confusing when things are removed and you have to add seemingly unrelated headers (though admittedly, in this case the header is closely related to the existing include).
void NodeBatchInsert::clearToIndex(std::unique_ptr<ChunkedNodeGroup>& nodeGroup,
    common::offset_t startIndexInGroup) {
    // Create a new chunked node group and move the unwritten values to it
    // TODO(bmwinger): Can probably re-use the chunk and shift the values
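The idea behind `clearToIndex` can be sketched with a toy stand-in. `ChunkedGroup` and the free function below are illustrative only (the real method operates on `ChunkedNodeGroup` column chunks, not a plain vector): a fresh group is created holding only the unwritten tail starting at `startIndexInGroup`, and it replaces the old one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a chunked node group: just a flat vector of values.
using ChunkedGroup = std::vector<uint64_t>;

// Sketch of the clearToIndex idea: keep only the values from
// startIndexInGroup onward (the unwritten tail), discarding the prefix
// that was already flushed, by moving the tail into a fresh group.
void clearToIndex(ChunkedGroup& group, std::size_t startIndexInGroup) {
    ChunkedGroup fresh(group.begin() + startIndexInGroup, group.end());
    group = std::move(fresh);
}
```

As the TODO notes, the same effect could be had by shifting values within the existing chunk instead of allocating a new one.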
Or we can let writeToExistingNodeGroup handle writing from the middle of the node group, so we don't need clearToIndex.
Maybe. Do you mean write the last n values to the existing chunk, truncate the group, continue adding to the existing one, and use that when creating a new node group?

In addition to meaning that, even in single-threaded mode, the values will not be in their original order (though I'm not sure we have any guarantees of that anyway), truncation could lead to things like the re-used ListColumnChunk storing extra values from the original data, since ListColumnChunk::finalize doesn't always remove unused values.

But either way it would add a function that needs to be implemented by all the column chunk types, and it could have subtle side effects if omitted (but not failures, which makes it easier to miss). That's why I decided against including it in this PR.
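The stale-values concern can be illustrated with a toy list chunk. `ListChunk`, `append`, and `truncate` are hypothetical names, not Kuzu's actual API: if truncation only shrinks the offsets, the flat data buffer still holds elements of the dropped lists, which mirrors the worry that `finalize` doesn't always remove unused values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative toy: a "list chunk" with a flat element buffer plus per-list
// end offsets, loosely modeled on how variable-length lists are stored.
struct ListChunk {
    std::vector<uint64_t> data;      // flattened list elements
    std::vector<std::size_t> endOffsets; // end offset of each list in `data`

    void append(const std::vector<uint64_t>& list) {
        data.insert(data.end(), list.begin(), list.end());
        endOffsets.push_back(data.size());
    }

    // Truncate to the first numLists lists. Only the offsets shrink; the
    // flat buffer still holds the dropped lists' elements, i.e. "extra
    // values from the original data" that a later finalize may not remove.
    void truncate(std::size_t numLists) { endOffsets.resize(numLists); }
};
```

After appending two lists and truncating to one, the chunk reports a single list while its buffer still contains all five original elements.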
auto nodeSharedState =
    ku_dynamic_cast<BatchInsertSharedState*, NodeBatchInsertSharedState*>(sharedState.get());
if (nodeSharedState->pkType.getLogicalTypeID() != LogicalTypeID::SERIAL) {
    nodeSharedState->initPKIndex(context);
}
// Set initial node group index, which should be the last one available on disk which is not
// full, or the next index.
auto nodeTable = ku_dynamic_cast<Table*, NodeTable*>(nodeSharedState->table);
Hmm, this is a bit confusing. What is currentNodeGroupIdx used for?
It's used to track the next node group index to write to. This sets it either to the last non-empty node group on disk, or to the first non-existent one, depending on whether a non-empty group exists.
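A minimal sketch of that initialization logic, assuming an illustrative interface (`groupSizes` listing how many rows each on-disk group holds, and a fixed per-group `capacity`; neither is Kuzu's actual API): resume at the last group if it still has room, otherwise start at the first index past the existing groups.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hedged sketch of choosing the initial currentNodeGroupIdx: the last node
// group available on disk that is not full, or else the next index.
std::size_t initialNodeGroupIdx(const std::vector<std::size_t>& groupSizes,
    std::size_t capacity) {
    if (!groupSizes.empty() && groupSizes.back() < capacity) {
        return groupSizes.size() - 1; // resume filling the last, non-full group
    }
    return groupSizes.size(); // first non-existent group index
}
```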
Force-pushed from c3b6055 to 7d7c9d5
Force-pushed from 7d7c9d5 to 8250751
(cherry picked from commit b41b649)
Performance is currently badly bottlenecked by HashIndex::splitSlot, which hasn't been optimized using the new caching disk array WriteIterator (used when doing bulk inserts while the hash index is non-empty, i.e., for each additional copy). As a result, second-copy performance is currently roughly 10x worse. I've already written an optimized version that improves it to 2.8x worse, but there's a bug I need to fix, and I thought I should get this merged now that it's passing the tests.

I also discovered a bug in DBFileUtils::updatePage which is mostly unrelated to multi copy: if a WAL page already exists, it won't be marked as dirty. However, it's possible for a WAL page to not be dirty if it was previously evicted and then read (but not modified) before being updated with DBFileUtils::updatePage (probably reproduced here because the last of the multi copy tests is quite large and pushes the 64MB buffer pool size used by the tests).
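The dirty-flag issue can be modeled with a toy page, assuming a simplified stand-in (`Page` and this `updatePage` are not the real DBFileUtils interface): if the code only marks pages dirty when they are newly added to the WAL, a page that was evicted and then reloaded (clean in memory) can be modified without ever being flagged. Unconditionally marking the page dirty after applying the update avoids that.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Toy model of a buffer-pool page; `dirty` says whether it needs flushing.
struct Page {
    std::vector<uint8_t> bytes = std::vector<uint8_t>(16, 0);
    bool dirty = false;
};

// Apply an in-place update to an existing WAL page. Marking it dirty
// unconditionally covers the evicted-then-reloaded case described above,
// where the page is clean in memory despite already existing in the WAL.
void updatePage(Page& walPage, const std::function<void(uint8_t*)>& updateOp) {
    updateOp(walPage.bytes.data());
    walPage.dirty = true; // mark dirty even if the WAL page already existed
}
```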