
Make SafeDelete faster for large blocks; TSDB block repair case #1056

Merged
merged 29 commits into thanos-io:master from jjneely/safe-delete-pr
Jul 8, 2019

Conversation

jjneely
Contributor

@jjneely jjneely commented Apr 18, 2019

This makes SafeDelete faster and less error-prone when dealing with large blocks: it no longer re-downloads a block that was just downloaded.

This PR also addresses a TSDB block repair case when duplicate series are detected.

Changes

  • SafeDelete takes another argument to use a pre-existing temporary directory possibly containing a previously downloaded TSDB block.
  • Duplicate series in a TSDB block are now handled when rewriting a block.

Verification

I run these patches in production and have repaired several hundred GiB of blocks with them.

jjneely added 13 commits March 25, 2019 12:08
When we have label sets that are not in the correct order, fixing that
changes the order of the series in the index, so the index must be
rewritten in that new order.  This makes the repair tool use
considerably more memory, but it produces blocks that verify correctly.
The directory name must be the block ID name exactly to verify.  A temp
directory or random name will not work here.
A pointer/reference logic error was eliminating every chunk after the
first for a series in a given TSDB block.  Chunks are now referenced
correctly via pointers.
If we have just downloaded a TSDB block for repair, we create a brand
new block that is fixed.  The original block can be used as the data we
upload in SafeDelete() to save time/bandwidth rather than re-download it
from the storage bucket.
Some linters catch errors.Errorf() as it's not really part of the errors
package.
Here, duplicate series means two series with an identical label set.
This code simply drops any duplicate series encountered after the
first.  No merging or anything fancy.
We're comparing items by pointer; relying on Go's range variable is
misleading here, and we need not fall into that trap.
This prevents us from having to re-implement label sorting.
@jjneely jjneely force-pushed the jjneely/safe-delete-pr branch from 54b0722 to 8225c79 Compare April 18, 2019 19:30
@jjneely jjneely force-pushed the jjneely/safe-delete-pr branch from 8225c79 to 0a01898 Compare April 23, 2019 17:36
return resid, errors.Wrap(err, "rewrite block")
}
if err := metadata.Write(logger, resdir, &resmeta); err != nil {
return resid, err
}
// TSDB may rewrite metadata in bdir
Contributor Author

The TSDB code handily rewrites the block metadata even though we don't write to or otherwise change the block we read here. This nukes the extra metadata Thanos uses, so we write it back.

Member

So this is also a bug that you are trying to fix? Perhaps we should add it to the changelog and/or the PR message.

Contributor Author

Not a bug, this is part of the initial patches to be able to reuse downloaded blocks. The TSDB library isn't aware of the additional metadata Thanos provides and was removing it. Therefore, when a previously downloaded block is reused, it would have incomplete metadata without this change.

Member

So this is being changed now: prometheus-junkyard/tsdb#637 so this won't be needed TBH (once we use latest TSDB).

What @GiedriusS said is also true - splitting PRs into focused changes helps a lot. However, this change relies on this fix. I would be more explicit in the description. This will help us understand this early on (:

Member

Let's add a TODO to update TSDB and remove this, as it will not be needed in the newest TSDB. For now we need this, thanks for fixing it 👍

Contributor Author

Todo added, thanks!

// Build a new TSDB block.
for _, s := range series {
if labels.Compare(lastSet, s.lset) == 0 {
Contributor Author

The series are sorted by their label set. What happens if two label sets are identical? For one, they will be consecutive in the slice. So if the label set we have now is the same as the previous one, we have a duplicate series, which the TSDB code will raise an error on. So we just drop that series.
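The drop-on-collision heuristic described here can be sketched as follows. This is an illustrative stand-alone version: the series type and a plain string label set stand in for the PR's actual types (tsdb's labels.Labels and Compare), so only the shape of the loop matches the real code:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// series stands in for the repair code's per-series record; only the
// label set matters for duplicate detection (illustrative, not the
// actual Thanos type).
type series struct {
	lset string // canonical "k1=v1,k2=v2" form of the label set
}

// dropDuplicates assumes the input is sorted by label set, so identical
// label sets are adjacent. It keeps the first occurrence and drops any
// consecutive duplicates, mirroring the heuristic discussed above.
func dropDuplicates(in []series) []series {
	var out []series
	last := ""
	for i, s := range in {
		if i > 0 && s.lset == last {
			continue // same label set as the previous series: drop it
		}
		last = s.lset
		out = append(out, s)
	}
	return out
}

func main() {
	ss := []series{{"a=1"}, {"a=1"}, {"a=2"}, {"b=1"}, {"b=1"}}
	sort.Slice(ss, func(i, j int) bool { return ss[i].lset < ss[j].lset })
	var kept []string
	for _, s := range dropDuplicates(ss) {
		kept = append(kept, s.lset)
	}
	fmt.Println(strings.Join(kept, " ")) // prints: a=1 a=2 b=1
}
```

The sort step is what makes the single-pass comparison against the previous label set sufficient: duplicates can only appear as adjacent entries.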

Member

Let's add a comment on the implied behavior here. I believe you when you say you've seen this behavior, but it worries me, as something went terribly wrong wherever these blocks were produced. We should at least document that the heuristic here is that on collision we keep the first series found and drop consecutive ones.

Contributor Author

Done. Thank you!

Member

Sorry one more thing. As this shouldn't generally happen, let's add a metric so we can monitor when it does happen.

Contributor Author

The repair process is a batch job only, which makes metrics less useful. I had some similar comments in #964.

Perhaps I should leave a comment if that were to change?

Member

Yes let's in that case put a TODO in there

Contributor Author

TODO added.

tempdir, err := ioutil.TempDir("", "safe-delete")
if err != nil {
return err
}
if tempdir == "" {
Contributor Author

If we already have the block, don't download it again. Blocks can be large. 30GiB or close to it in my environment.

@jjneely
Contributor Author

jjneely commented Apr 26, 2019

I've tested this several times on my own TSDB blocks. I've been wondering how to get better tests integrated into the code base as unit tests, but I cannot make these TSDB blocks of mine public. Still, I've repaired several hundred GiB of data with this.

// source bucket. It returns error if block dir already exists in backup
// bucket (blocks should be immutable) or any of the operation fails. If
// the block has already been downloaded it must exist in tempdir, else it
// will be downloaded to be copied to the backup bucket. tempdir set to the
Member

I'd prefer not to hide this behavior in tempdir values. Maybe "safe delete modes" (as in "force download", "try local copy"), or at the very least a bool (as in just "forceDownload") that makes it very explicit what's going to happen.

Contributor Author

SafeDelete() doesn't know the temporary directory where a previous function may have downloaded the block, so we need to pass in a path somehow. Perhaps rename this argument to something like useExisting? That would mean we use the existing block if found under that path; otherwise a new temporary directory is created and the block is downloaded.

Member

Sorry, I meant: let's pass two parameters to explicitly describe the intention.

Contributor Author

WIP

Contributor Author

Ok, committed a patch that makes this explicit. I've tested the repair process with the collection of broken blocks I've started keeping and saw the expected behavior. (No repeated downloads.)

Contributor Author

Had thanos-compact cut a 450GiB TSDB block today, which I referenced in #1152; that is part of the motivation for this.


if _, err := os.Stat(dir); err != nil {
// TSDB block not already present, download it
err = os.Mkdir(dir, 0755)
Member

this should only be done if os.IsNotExist(err)

Contributor Author

Fixed, thanks!

return errors.Wrap(err, "download from source")
}
} else {
level.Info(logger).Log("msg", "using previously downloaded tsdb block",
Member

In what kind of situation would we end up in this path? And if the answer is a (half or maybe even fully) finished previous download, then how do we prevent that the directory is there but the content is not?

Contributor Author

My blocks (and I'm sure others') easily grow above 30GiB when compacted. So if I need to repair block A, Thanos will download block A, create a repaired block B, and upload block B. At this point we've successfully created a new block from A, and that block B now passes the VerifyIndex() test.

Then we call SafeDelete(), which downloads block A, uploads block A to the backup bucket, then deletes block A from the main bucket.

This process can be time consuming (and perhaps expensive). So since blocks are immutable, and the repair process has already verified that A is a legitimate (if malformed) block, we should be able to skip the redundant download of A and back up the existing downloaded copy.


if err := os.RemoveAll(tempdir); err != nil {
level.Warn(logger).Log("msg", "failed to delete dir", "dir", tempdir, "err", err)
}

if _, err := os.Stat(dir); os.IsNotExist(err) {
Member

we're swallowing any error that is not "IsNotExist" here

Member

I think we should just do os.Mkdir here and ignore the error if it says the directory already exists. Otherwise we will have racy behaviour here (of course, there's barely any chance that someone will run two of these at the same time, but still).

Member

Yeah, good suggestions!

Contributor Author

I added some additional logic here to handle any other possible error from os.Stat().

The pre-existence of the directory triggers the code path where we avoid re-downloading a possibly large TSDB block. Races with os.Mkdir() are not anticipated: if the temporary directory isn't supplied, ioutil.TempDir() is used to generate a value. When we do supply tempdir in pkg/verifier/index_issue.go, it is also generated with ioutil.TempDir():

tmpdir, err := ioutil.TempDir("", fmt.Sprintf("index-issue-block-%s-", id))

// previously downloaded block found in tempdir to avoid repeated downloads.
// If tempdir is an empty string a unique temporary directory is created for
// scratch space, otherwise the given directory is used.
func SafeDelete(ctx context.Context, logger log.Logger, bkt objstore.Bucket, backupBkt objstore.Bucket, id ulid.ULID, useExisting bool, tempdir string) error {
Member

sorry for nitpicking on this, but can we name this useExistingTempdir? I realize that's long, but we'll understand in 3 months what this means that way :)

Contributor Author

Not a problem. Fixed.

if err := block.Download(ctx, logger, bkt, id, dir); err != nil {
return errors.Wrap(err, "download from source")
}
if useExisting {
if _, err := os.Stat(filepath.Join(dir, "meta.json")); err != nil {
Member

Is this check really sufficient? Shouldn't we check that all the downloaded content is equal to what is in object storage, ideally by their hash?

Contributor Author

There don't appear to be tools for this in the existing code base, so this seems like a bit of scope creep. Glad to add additional checks here, but these features sound like another GitHub issue/PR.

In fact, why isn't there a checksum in meta.json? It doesn't have to be cryptographically secure, but it would serve to verify that the block is intact. As we move from "metric data is ephemeral" to "metric data is important long term", that would be quite handy.

tempdir, err := ioutil.TempDir("", "safe-delete")
if err != nil {
return err
}
if tempdir == "" {
Member

Maybe I wasn't clear: I would expect that when useExisting is set, tempDir must be set, so I would expect an error from this function if that is not the case, and this if statement to be combined with the parts below. I think that'll also make the code a lot easier to understand.

Contributor Author

Ok, refactored this a bit. It will return an error if you ask it to use an existing tempdir and do not provide one, and the logic is a little simpler.

Be extra explicit about using existing tempdirs.  Added note about
future improvements for verifying if we have an intact tsdb block.
@jjneely
Contributor Author

jjneely commented May 17, 2019

cluster_test.go:129 fails on me again...

@brancz
Member

brancz commented May 29, 2019

gossip has been removed, if you rebase I think those flakes should go away :)

@brancz
Member

brancz commented May 29, 2019

Sorry for the silence, I'm working off the backlog that accumulated over kubecon, once rebased and CI is green I'm happy to give this another pass

@jjneely
Contributor Author

jjneely commented May 29, 2019

No problem. I know there are a lot of PRs as well. Does rebasing help you move this forward?

@brancz
Member

brancz commented May 29, 2019

The test that kept flaking is gone due to the rebase, that's all :)

@jjneely
Contributor Author

jjneely commented Jun 7, 2019

Updates on getting this merged?

Member

@brancz brancz left a comment

lgtm, but as it's easy to miss something here, it'd be good if we can get a second approval from @bwplotka or @GiedriusS

if labels.Compare(lastSet, s.lset) == 0 {
level.Warn(logger).Log("msg",
"dropping duplicate series in tsdb block found",
"labelset", fmt.Sprintf("%s", s.lset),
Member

You should use s.lset.String() here. The new meta linter will complain about this 😄

Contributor Author

Fixed.


// previously downloaded block found in tempdir to avoid repeated downloads.
// If tempdir is an empty string a unique temporary directory is created for
// scratch space, otherwise the given directory is used.
func SafeDelete(ctx context.Context, logger log.Logger, bkt objstore.Bucket, backupBkt objstore.Bucket, id ulid.ULID, useExistingTempdir bool, tempdir string) error {
Member

Couldn't we just get rid of useExistingTempdir here and use tempdir to control how this function works? It seems like we are making this unnecessarily complex. Perhaps I am missing something.

Member

I felt like giving a particular value of tempdir special meaning was not obvious enough, so I prefer this over tempdir being empty. I do agree that below, the if doesn't need the tempdir == "" case.

Member

If you feel strongly about this @GiedriusS, I'm ok with the other behavior; I just feel I will have forgotten the implicit behavior of the empty string in 3 months, hence I wanted it to be more explicit.

Member

Hm, I feel strongly about this - a boolean in a random place of the function IMO smells bad. Same with 2 flags controlling the same thing in a very strict way. Essentially it does not make sense to set useExistingTempdir = true and tempdir = "", so we need to validate that, etc. The interface is broader than needed.

Also, if useExistingTempdir = true and tempdir is specified, you literally don't use bkt, so it's even worse in terms of readability and surprises ):

I would propose:

  • BackupAndDelete(ctx context.Context, logger log.Logger, bkt objstore.Bucket, backupBkt objstore.Bucket, id ulid.ULID) as it was.
  • BackupAndDeleteFromLocal(ctx context.Context, logger log.Logger, bdir string, backupBkt objstore.Bucket)

This will ensure separation of concerns AND in fact BackupAndDelete can use BackupAndDeleteFromLocal internally. Right now what I can see is 2 functions baked into one (:

Member

Yeah I like @bwplotka's suggestion best as well.

err = os.Mkdir(dir, 0755)
if err != nil {
return err
}
if useExistingTempdir && tempdir == "" {
Member

It seems like they both can be set or unset which proves my point.

Contributor Author

I touched up the comments for this function again so that they better describe the behavior of the current code. No code changes though.

Member

I proposed different approach - splitting this into 2 function which will solve this issue (:

Contributor Author

This is now committed. The function signatures didn't quite work out as we would have liked, but I totally agree, this feels much, much nicer.

@jjneely
Contributor Author

jjneely commented Jun 24, 2019

Updates?

Member

@bwplotka bwplotka left a comment

@jjneely Thanks for this!

I think it makes a lot of sense, but I propose a split in SafeDelete to ensure readability and address @GiedriusS's concerns (: Let me know if that makes sense.

Otherwise LGTM!


return resid, errors.Wrap(err, "rewrite block")
}
if err := metadata.Write(logger, resdir, &resmeta); err != nil {
return resid, err
}
// TSDB may rewrite metadata in bdir
Member

Also missing trailing period in comment.

Contributor Author

Done.

// Build a new TSDB block.
for _, s := range series {
// The TSDB library will throw an error if we add a series with
Member

Can we remove double spaces in this comment?

Contributor Author

Old habits die hard...fixed.

Member

Can I ask where is this habit from? Sorry for my ignorance - super curious (:

Contributor Author

Lol... Search google for "double space after period" you'll learn more than you bargained for. Like, my age. :-)

Member

TIL never heard of this either. But actually the rule could apply here since code formatting is using monospaced fonts mostly :)




@bwplotka
Member

And sorry for massive delay in review!

jjneely and others added 4 commits June 25, 2019 15:54
Co-Authored-By: Bartek Płotka <bwplotka@gmail.com>
Co-Authored-By: Bartek Płotka <bwplotka@gmail.com>
Co-Authored-By: Bartek Płotka <bwplotka@gmail.com>
Member

@bwplotka bwplotka left a comment

return resid, errors.Wrap(err, "rewrite block")
}
if err := metadata.Write(logger, resdir, &resmeta); err != nil {
return resid, err
}
// TSDB may rewrite metadata in bdir.
// TODO: This is not needed in newer TSDB code.
Member

The PR number would be nice for easier tracking, but not a blocker (:

Contributor Author

Added


Member

@bwplotka bwplotka left a comment

One more nit and LGTM, thanks for this!

return errors.Errorf("%s dir seems to exists in backup bucket. Remove this block manually if you are sure it is safe to do", id)
}

// Now we call backupAndDeleteDownloaded() to handle the work.
Member

I think this comment is not needed.

// backupAndDeleteDownloaded is a helper function that uploads a TSDB block
// found on disk to the backup bucket and, on success, removes the TSDB block
// from the source bucket.
func backupAndDeleteDownloaded(ctx context.Context, logger log.Logger, bdir string, bkt, backupBkt objstore.Bucket, id ulid.ULID) error {
Member

Sorry for being picky, but there is some improvement we can make here.

I like BackupAndDelete vs BackupAndDeleteDownloaded, but it's not clear what the difference between backupAndDeleteDownloaded and BackupAndDeleteDownloaded is. Sure, they are heavily commented, but comments should have lower priority than naming. I think proper naming and the smallest possible function signature would do the work here.

So what about:

Suggested change
func backupAndDeleteDownloaded(ctx context.Context, logger log.Logger, bdir string, bkt, backupBkt objstore.Bucket, id ulid.ULID) error {
func backupDownloaded(ctx context.Context, logger log.Logger, bdir string, bktToRemove, backupBkt objstore.Bucket, id ulid.ULID) error {

and then, if there is no error, have the caller in both BackupAndDeleteDownloaded and BackupAndDelete just do:

// Block uploaded, so we are ok to remove from src bucket.
if err := block.Delete(ctx, bkt, id); err != nil {
	return errors.Wrap(err, "delete from source")
}

This will make the signatures of BackupAndDeleteDownloaded and backupDownloaded smaller (thus reducing complexity), as you can drop the bkt argument.

What do you think? (:

Contributor Author

Adjusted as requested, and went through another round of testing.

Member

@bwplotka bwplotka left a comment

Awesome! Thanks 👍

@bwplotka
Member

bwplotka commented Jul 8, 2019

I think the netlify issue is because of a not-rebased PR - nothing changed for netlify here, so merging.

@bwplotka bwplotka merged commit 63257cf into thanos-io:master Jul 8, 2019
@jjneely
Contributor Author

jjneely commented Jul 8, 2019

Awesome, thanks!
