Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dense reader fails early when tile offsets are too large. #5310

Merged
merged 3 commits into from
Sep 16, 2024

Conversation

KiterLuc
Copy link
Contributor

@KiterLuc KiterLuc commented Sep 16, 2024

This change allows to accurately compute the memory usage for tile offsets before they are loaded. We can then throw an error before we do any work if the tile offsets are too large for the memory budget. This will enable us to fail faster on REST when performing large queries and do fast retries.

Note that the memory usage for tile offsets is done per fragment, which will ease the implementation of partial tile offsets loading for the dense reader.

[sc-55116]


TYPE: IMPROVEMENT
DESC: Dense reader fails early when tile offsets are too large.

This change allows to accurately compute the memory usage for tile offsets before they are loaded. We can then throw an error before we do any work if the tile offsets are too large for the memory budget. This will enable us to fail faster on REST when performing large queries and do fast retries.

Note that the memory usage for tile offsets is done per fragment, which will ease the implementation of partial tile offsets loading for the dense reader.

---
TYPE: IMPROVEMENT
DESC: Dense reader fails early when tile offsets are too large.
@@ -145,6 +147,134 @@ bool ReaderBase::skip_field(
/* PROTECTED METHODS */
/* ****************************** */

std::vector<uint64_t> ReaderBase::tile_offset_sizes() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: This was move almost as is from SparseIndexReaderBase. I have only added two "if (!dense) {".

per_frag_tile_offsets_usage_.end(),
static_cast<uint64_t>(0));
if (total_tile_offsets_sizes >
memory_budget_ - array_memory_tracker_->get_memory_usage()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other readers we do > array_memory_tracker_->get_memory_available() , is this case somehow different and we need to do the calculation here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the get_memory_usage() function does not subtract the tile offsets from the new "counter" memory tracking, and the old tracking system doesn't account for them either. However, array_memory_tracker_->get_memory_available() does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is desired. There are some memory structures allocated for tile offsets at this time and we don't want them to be included.

Copy link
Contributor

@DimitrisStaratzis DimitrisStaratzis Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the only structure allocated for tile offsets memory is the counters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other readers have tile offsets loaded at the time they are ready to process tiles.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"I thought the only structure allocated for tile offsets memory is the counters"

Yes they are, but get_memory_available() will subtract that counter from the available memory. Since our calculation include everything, we should not remove the tile offsets counter from the available memory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we don't want to subtract it from available memory? Isn't that memory taken by some of our structures, so it cannot be used for loading offsets? We are we tracking those structures then?
If that's just some confusing design we wouldn't like to address as part of this ticket, let's make sure we address it in some follow up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't subtract it from the available memory because it's included in total_tile_offsets_sizes... We don't want to include it twice. There's already stories to reconcile everything and get rid of the legacy memory tracker which confuses everything. But before we can do that, we need to complete the partial tile offsets loading for all readers as well as partial rtree loading.


// Try to read.
int data_r[NUM_CELLS] = {0};
uint64_t data_r_size = sizeof(data_r);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some more comments to make the test clearer? I don't understand the calculations here.

Also can we add a less trivial test too where we write more than 1 fragment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What computation? This just computes the size of the user buffer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the calculations in the test. It's wasn't evident to untrained eyes why a budget of 600 previously was not enough. Thank you for adding those comments.

One last question, does tile_upper_memory_limit_ play a role here? Could it be completely omitted so that we test what's actually the case, what happens when we only set total_budget and leave that one as default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be removed from there.

Copy link
Contributor

@DimitrisStaratzis DimitrisStaratzis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also run a nightly build with the new tests. I have seen them fail when dealing with memory budgets.

@KiterLuc
Copy link
Contributor Author

@DimitrisStaratzis
https://github.com/TileDB-Inc/TileDB/actions/runs/10882409493

@KiterLuc
Copy link
Contributor Author

/backport to release-2.26

Copy link
Contributor

Started backporting to release-2.26: https://github.com/TileDB-Inc/TileDB/actions/runs/10884706119

@KiterLuc KiterLuc merged commit 7253795 into dev Sep 16, 2024
62 checks passed
@KiterLuc KiterLuc deleted the lr/tile-offset-sizes-dense/sc55116 branch September 16, 2024 14:23
KiterLuc added a commit that referenced this pull request Sep 16, 2024
…e too large. (#5310) (#5311)

Backport of #5310 to release-2.26.

---
TYPE: IMPROVEMENT
DESC: Dense reader fails early when tile offsets are too large.

---------

Co-authored-by: Luc Rancourt <lucrancourt@gmail.com>
Co-authored-by: KiterLuc <67824247+KiterLuc@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants