
Speed up delete/drop statements #7015

Merged: 8 commits, Jul 15, 2016
Conversation

@jwilder (Contributor) commented Jul 14, 2016

Required for all non-trivial PRs
  • Rebased/mergable
  • Tests pass
  • CHANGELOG.md updated

This PR speeds up drop and delete statements by converting the long write locks that were held into shorter read locks, and by processing shards in parallel.

The write locks were problematic on the tsdb.Store because they essentially lock the database against all queries and writes. If the deletes take a long time, many things back up (writes, compactions, etc.), which can cause memory problems, lockups, and unresponsiveness.

Processing shards in parallel reduces the execution time of the operation when there are many shards on disk.

Fixes #6819 #6796
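
To illustrate the locking change, here is a minimal, self-contained sketch (hypothetical simplified Store and Shard types, not the PR's actual code): the shard list is snapshotted under a short read lock, and the per-shard work then runs concurrently with no store lock held.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical, simplified stand-ins for the tsdb types (illustration only).
type Shard struct{ ID uint64 }

type Store struct {
	mu     sync.RWMutex
	shards map[uint64]*Shard
}

// deleteAcrossShards holds the read lock just long enough to snapshot the
// shard list, then runs the per-shard delete function concurrently with no
// store lock held, so writes and queries are not blocked while deletes run.
func (s *Store) deleteAcrossShards(fn func(sh *Shard) error) error {
	s.mu.RLock()
	shards := make([]*Shard, 0, len(s.shards))
	for _, sh := range s.shards {
		shards = append(shards, sh)
	}
	s.mu.RUnlock() // lock released before any slow work starts

	errC := make(chan error, len(shards))
	for _, sh := range shards {
		go func(sh *Shard) { errC <- fn(sh) }(sh)
	}

	var firstErr error
	for range shards {
		if err := <-errC; err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}

func main() {
	s := &Store{shards: map[uint64]*Shard{1: {ID: 1}, 2: {ID: 2}}}
	err := s.deleteAcrossShards(func(sh *Shard) error {
		fmt.Println("deleting from shard", sh.ID)
		return nil
	})
	fmt.Println("done, err =", err)
}
```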

@jwilder jwilder added this to the 1.0.0 milestone Jul 14, 2016
@mention-bot commented:

@jwilder, thanks for your PR! By analyzing the annotation information on this pull request, we identified @e-dard and @joelegasse as potential reviewers.

jwilder added 6 commits on July 14, 2016 at 17:31:
  • Reduces the lock contention on the tsdb.Store by taking a short read lock instead of a long write lock. Also processes shards in parallel instead of serially.
  • Reduce the lock contention on tsdb.Store by taking a short-lived read lock instead of a long write lock. Also close shards in parallel and drop the whole RP dir in bulk instead of each shard dir.
  • Only used by one caller now
  • Reduce lock contention and process shards concurrently.
@@ -458,16 +443,24 @@ func (s *Store) DeleteRetentionPolicy(database, name string) error {
}

// Remove the retention policy folder from the WAL.
return os.RemoveAll(filepath.Join(s.EngineOptions.Config.WALDir, database, name))
if err := os.RemoveAll(filepath.Join(s.EngineOptions.Config.WALDir, database, name)); err != nil {
Member:

does the RP get removed from the metadata before this is called? Would this maybe cause a problem if it isn't?

Contributor (author):

No. The statement executor drops the data from the tsdb.Store and only removes it from the meta store after that succeeds. We previously deleted it from the meta store and then dropped the shard data, but that caused problems if the data deletion failed (orphaned data, data re-appearing after startup, etc.).
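
For illustration, a minimal sketch of that ordering, using hypothetical interfaces in place of the tsdb store and meta client (not the actual statement executor code):

```go
package sketch

// Hypothetical minimal interfaces standing in for the tsdb store and the
// meta client; illustration only.
type dataStore interface {
	DeleteRetentionPolicy(database, name string) error
}

type metaStore interface {
	DropRetentionPolicy(database, name string) error
}

// dropRetentionPolicy drops the on-disk shard data first and removes the RP
// from the meta store only if that succeeds, so a failed delete cannot leave
// orphaned data or let the RP re-appear after a restart.
func dropRetentionPolicy(data dataStore, meta metaStore, db, rp string) error {
	if err := data.DeleteRetentionPolicy(db, rp); err != nil {
		return err // meta store untouched; data is still visible and on disk
	}
	return meta.DropRetentionPolicy(db, rp)
}
```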

return
}

resC <- &res{s: sh}
Contributor:

The shard field is never used, so this send doesn't really do anything other than signal the work is done. What do you think about using a WaitGroup for the synchronization aspect, rather than counting? resC can just be errC := make(chan error), and go func() { wg.Wait(); close(errC) } will handle closing the channel so that for err := range errC can be used to read the errors.
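
For reference, a runnable sketch of the suggested WaitGroup plus error-channel pattern, using simplified stand-in types (the walkShards name mirrors the diff below, but this signature is an assumption, not the PR's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical, simplified shard type (illustration only).
type Shard struct{ ID uint64 }

// walkShards runs fn for each shard concurrently. A WaitGroup tracks worker
// completion, and a separate goroutine closes errC once all workers are done
// so that the range over errC terminates.
func walkShards(shards []*Shard, fn func(sh *Shard) error) error {
	var wg sync.WaitGroup
	errC := make(chan error)

	for _, sh := range shards {
		wg.Add(1)
		go func(sh *Shard) {
			defer wg.Done()
			if err := fn(sh); err != nil {
				errC <- err
			}
		}(sh)
	}

	// Close errC once all workers finish so the range below terminates.
	go func() {
		wg.Wait()
		close(errC)
	}()

	var firstErr error
	for err := range errC {
		if firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}

func main() {
	shards := []*Shard{{ID: 1}, {ID: 2}, {ID: 3}}
	err := walkShards(shards, func(sh *Shard) error {
		fmt.Println("processing shard", sh.ID)
		return nil
	})
	fmt.Println("err =", err)
}
```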

Contributor (author):

Ah right, the shard is not used here. It was adapted from the Open code which does use it. I'll remove it.

I tend to prefer using channels over WaitGroup.

@joelegasse (Contributor):

LGTM 👍

}
if err := s.walkShards(shards, func(sh *Shard) error {
if sh.database != database || sh.retentionPolicy != name {
return nil
Contributor:

This should be impossible, right? Is this just being extra defensive?

Contributor (author):

Yes

@corylanou mentioned this pull request Jul 21, 2016
jwilder added a commit that referenced this pull request Jul 25, 2017
There was a change to speed up deleting and dropping measurements
that executed the deletes in parallel for all shards at once. #7015

When TSI was merged in #7618, the series keys argument passed into Shard.DeleteMeasurement was removed, and the keys are instead expanded lower down. This causes memory to blow up when a delete runs across many shards, as we now expand the set of series keys N times instead of just once as before.

While running the deletes in parallel would be ideal, there have been a number
of optimizations in the delete path that make running deletes serially pretty
good.  This change just limits the concurrency of the deletes which keeps memory
more stable.
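
As an illustration of limiting delete concurrency, here is a minimal sketch that bounds in-flight deletes with a buffered channel used as a semaphore (the limit value, helper name, and types are assumptions, not taken from the referenced commit):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Hypothetical, simplified shard type (illustration only).
type Shard struct{ ID uint64 }

// deleteWithLimit runs fn for each shard, but never allows more than `limit`
// deletes in flight at once, which keeps peak memory use bounded.
func deleteWithLimit(shards []*Shard, fn func(sh *Shard) error) error {
	limit := runtime.GOMAXPROCS(0) // assumed cap; the commit may use a different value
	sem := make(chan struct{}, limit)

	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for _, sh := range shards {
		wg.Add(1)
		sem <- struct{}{} // blocks once `limit` deletes are in flight
		go func(sh *Shard) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := fn(sh); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(sh)
	}
	wg.Wait()
	return firstErr
}

func main() {
	shards := []*Shard{{ID: 1}, {ID: 2}, {ID: 3}, {ID: 4}}
	err := deleteWithLimit(shards, func(sh *Shard) error {
		fmt.Println("deleting series from shard", sh.ID)
		return nil
	})
	fmt.Println("err =", err)
}
```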
jwilder added two further commits referencing this pull request on Jul 27, 2017 with the same message.