Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: improved shard deletion #24602

Merged
merged 3 commits into from
Mar 26, 2024
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
45 changes: 34 additions & 11 deletions tsdb/store.go
Original file line number Diff line number Diff line change
Expand Up @@ -811,7 +811,7 @@ func (s *Store) DeleteShard(shardID uint64) error {
return ErrShardNotFound
}

// Remove the shard from Store so it's not returned to callers requesting
// Remove the shard from Store, so it's not returned to callers requesting
// shards. Also mark that this shard is currently being deleted in a separate
// map so that we do not have to retain the global store lock while deleting
// files.
Expand Down Expand Up @@ -851,16 +851,29 @@ func (s *Store) DeleteShard(shardID uint64) error {

ss := index.SeriesIDSet()

s.walkShards(shards, func(sh *Shard) error {
err = s.walkShards(shards, func(sh *Shard) error {
index, err := sh.Index()
if err != nil {
return err
// Do not stop checking series because one shard failed
s.Logger.Error("cannot find shard index", zap.Uint64("shard_id", sh.ID()), zap.Error(err))
return nil
}

ss.Diff(index.SeriesIDSet())
return nil
})

// This should never happen, because walkShards only returns errors
// from the function passed in, and the function above cannot
// return an error. But, for safety against new implementations, we
// check.
if err != nil {
s.Logger.Error("error walking shards", zap.Error(err))
// TODO(DSB): Should we give up on deleting the shard here, to avoid
// removing series from the series file that may exist in the shard
// whose index we could not load?
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the answer is "yes". If we continue deleting, we could lose series. Currently this can only happen during a restore, which probably fixes itself on restore completion, but maybe that changes in the future. Also, if the retention service is running then the shard will eventually be deleted by the retention service, or you will keep getting errors in your log. Since currently the only way to get an error is by running a restore, this will work at some point in the near future.

An alternative is to not remove series if we can't walk all the shards, but then we have phantom series that could impact performance. I prefer we leave the shard on disk until we can properly delete it without losing series or creating a performance issue.


// Remove any remaining series in the set from the series file, as they don't
// exist in any of the database's remaining shards.
if ss.Cardinality() > 0 {
Expand All @@ -872,7 +885,7 @@ func (s *Store) DeleteShard(shardID uint64) error {
var keyBuf []byte // Series key buffer.
var name []byte
var tagsBuf models.Tags // Buffer for tags container.
var err error
var errs []error

ss.ForEach(func(id uint64) {
skey := sfile.SeriesKey(id) // Series File series key
Expand All @@ -881,22 +894,32 @@ func (s *Store) DeleteShard(shardID uint64) error {
}

name, tagsBuf = ParseSeriesKeyInto(skey, tagsBuf)
keyBuf = keyBuf[:0]
keyBuf = models.AppendMakeKey(keyBuf, name, tagsBuf)
if err = index.DropSeriesGlobal(keyBuf); err != nil {
return
if tmpErr := index.DropSeriesGlobal(keyBuf); tmpErr != nil {
sfile.Logger.Error(
"cannot drop series",
zap.Uint64("series_id", id),
zap.String("key", string(keyBuf)),
zap.Error(tmpErr))
errs = append(errs, tmpErr)
}
})

if err != nil {
return err
if len(errs) != 0 {
return errors.Join(errs...)
}
}

ss.ForEach(func(id uint64) {
sfile.DeleteSeriesID(id)
if err := sfile.DeleteSeriesID(id); err != nil {
sfile.Logger.Error(
"cannot delete series in shard",
zap.Uint64("series_id", id),
zap.Uint64("shard_id", shardID),
zap.Error(err))
}
})
}

}

// enter the epoch tracker
Expand Down
Loading