refactor block size calculation #637

Conversation
Force-pushed from 9022985 to 26a8eca
It now includes the size of the meta file itself as well, for a more correct block size. It fixes a bug where the size didn't change when calling block.Delete. Adds a dedicated test to ensure correct block sizes. To allow opening the db in read-only mode, it ensures the meta file is not rewritten when the block size hasn't changed. Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
Force-pushed from 26a8eca to 64101d9
Just some nits, looks good otherwise.
Force-pushed from d78542a to 321002e
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
Taking a look now (:
How does it look from a compatibility standpoint? We are changing meta.json, so the old meta.json NumBytes will be ignored. Is that ok?
I don't see any problem: when opening the db it will rewrite the meta to the new version because of this check.
Just tested it and it works as expected; all meta files are rewritten to include the correct block size information for each piece.
Ok, so I checked and spoke offline with @krasi-georgiev, and it seems like NumBytes was added for size retention; that was the main use case. I like that we count it more precisely now.
However, I find it quite odd that it's not populated on compaction, only in OpenBlock, and then rewritten back to meta.json. Especially for read-only mode, we now rely on the fact that the block has to be opened by a write-mode TSDB first before we can access the meta.json. That's quite weird. Plus we have to make sure compatibility for meta.json is kept (what if some other system already relies on NumBytes)?
So my question is: do we really need to bake the number of bytes into meta.json even though we calculate the precise number in OpenBlock anyway (even if just to check whether we have to rewrite)?
Also, if we stopped writing this to meta.json, we could remove this interesting calculation of how many bytes meta.json will have IF we write the number of those bytes; it's quite complex (:
What do you think, guys?
By chance, today's blog post uses numBytes. Populating it correctly on compaction seems like it would make more sense.
This makes me think that we don't need any size info in the meta at all for size-based retention.
Indeed. However, if we already have docs for it, I wonder what's the use case we described for
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
I added a test to ensure that blocks with missing size info get recalculated and the meta file is rewritten at open.
I already considered this option (writing the meta size info at compaction when creating a block), but there would be blocks that would never be touched by compaction (e.g. blocks that are already too big).
I considered this option as well, but using the meta simplifies the implementation quite a bit. Btw, I think everyone should focus on block.Size() rather than the actual meta format. We can probably add some metric so that Brian's blog post uses that instead of relying on the meta format.
Yes, but the meta.json format takes up the most lines of this PR and raises the most questions. How does keeping this size in the meta help the implementation? We do .Size() of everything on every open block anyway in this PR (:
Looking at the article https://www.robustperception.io/how-much-disk-space-do-prometheus-blocks-use it seems like the use case is to tell block sizes. I would say a log line with the calculated block size on compaction would be enough for this use case. Or a metric, as Krasi suggests.
I would vote for removing any size from stats and just calculating it on open block.
That would probably work as well, but I thought that having it in the actual meta would simplify debugging or other use cases where people want to know the block size just by looking at the block meta.
I thought of this when I reviewed, but then a simple
Yep, ok, I will refactor to NOT use the meta file and let's see how it looks then.
Agree with @codesome; it's really easy to see this using linux tooling / object storage meta or whatever else. If anything, we can discuss it in other PRs, right? (:
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
New version ready.
Some minor suggestions, but LGTM otherwise. Thanks!
CHANGELOG.md
Outdated

```
@@ -1,5 +1,7 @@
## master / unreleased
- [FEATURE] Provide option to compress WAL records using Snappy. [#609](https://github.com/prometheus/tsdb/pull/609)
- [BUGFIX] Re-calculate block size when calling `block.Delete`.
- [CHANGE] `BlockStats` no longer holds size information and this is kept in memory.
```
Suggested change:

```diff
-- [CHANGE] `BlockStats` no longer holds size information and this is kept in memory.
+- [CHANGE] `BlockStats` no longer holds (not precise) size information. This is now dynamically calculated while opening a block in memory.
```
The size is recalculated NOT just when opening a block, but also when calling block.Delete, so I prefer to keep this a bit more generic.
Fair. Can we then first fix `(not precise)` and then have "This is now dynamically calculated and kept in memory."?
What do you mean by `(not precise)`? If that is the missing meta size then it can be something like:
- [CHANGE] `BlockStats` no longer holds size information. This is now dynamically calculated and kept in memory. It also includes the meta file size which was not included before.
perfect!
```
@@ -285,6 +284,11 @@ type Block struct {
	tombstones TombstoneReader

	logger log.Logger

	numBytesChunks int64
```
uint64? And maybe a separate struct for those? Not a blocker.
All Size/Write interfaces I have seen so far use int or int64, so I prefer to keep it as is. Also for db.opts.MaxBytes we use int64, where -1 means that size retention is disabled.
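That convention (a signed limit where -1 disables the feature) can be sketched as below; `exceedsSizeRetention` is a hypothetical helper mirroring the semantics of db.opts.MaxBytes, not the actual TSDB code:

```go
package main

import "fmt"

// exceedsSizeRetention reports whether the total block bytes exceed the
// configured limit. MaxBytes <= 0 (e.g. the conventional -1) disables
// size-based retention. Sketch only, not the real implementation.
func exceedsSizeRetention(maxBytes, totalBytes int64) bool {
	if maxBytes <= 0 {
		return false // retention disabled
	}
	return totalBytes > maxBytes
}

func main() {
	fmt.Println(exceedsSizeRetention(-1, 1<<40)) // prints: false (disabled)
	fmt.Println(exceedsSizeRetention(512, 1024)) // prints: true (over the limit)
}
```

Using int64 throughout keeps the field compatible with `io` style Size/Write interfaces, which also return int or int64.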
Ok, fine by me then
Thanks for explaining
```diff
@@ -437,7 +428,9 @@ func (pb *Block) GetSymbolTableSize() uint64 {

 func (pb *Block) setCompactionFailed() error {
 	pb.meta.Compaction.Failed = true
-	return writeMetaFile(pb.logger, pb.dir, &pb.meta)
+	n, err := writeMetaFile(pb.logger, pb.dir, &pb.meta)
+	pb.numBytesMeta = n
```
Slight inconsistency with the code below: we update the size only on no error, but here we do it no matter what. Can we do it only on no error here as well? (: This will avoid potential confusion.
updated
```
		level.Error(c.logger).Log(
			"msg", "Failed to write 'Deletable' to meta file after compaction",
			"ulid", b.meta.ULID,
		)
	}
	b.numBytesMeta = n
```
Why do we set numBytes on the new block, but ignore the other sizes later on?
Actually this is setting the numBytesMeta for the existing blocks. Existing blocks are already open, so we need to set the size information when it changes.
Later on it creates new blocks, and these will have their size calculated when they are opened at db.reload().
Good point, LGTM.
LGTM, super minor nit only (:
block.go
Outdated

```diff
-	return writeMetaFile(pb.logger, pb.dir, &pb.meta)
+	n, err := writeMetaFile(pb.logger, pb.dir, &pb.meta)
+	if err != nil {
+		pb.numBytesMeta = 0
```
Not really; I would just not set anything. (:
Force-pushed from 9e0d32c to c7e140f
@brian-brazil This one is needed to unblock the db read-only PR, so I will merge with the current implementation, but feel free to try to refactor at some point if you have a different idea.