Speed up delete/drop statements #7015
Conversation
@jwilder, thanks for your PR! By analyzing the annotation information on this pull request, we identified @e-dard and @joelegasse to be potential reviewers.
Reduces the lock contention on the tsdb.Store by taking a short read lock instead of a long write lock. Also processes shards in parallel instead of serially.
Reduce the lock contention on tsdb.Store by taking a short lived read-lock instead of a long write lock. Also close shards in parallel and drop the whole RP dir in bulk instead of each shard dir.
Only used by one caller now
Reduce lock contention and process shards concurrently.
@@ -458,16 +443,24 @@ func (s *Store) DeleteRetentionPolicy(database, name string) error {
	}

	// Remove the retention policy folder from the the WAL.
	return os.RemoveAll(filepath.Join(s.EngineOptions.Config.WALDir, database, name))
	if err := os.RemoveAll(filepath.Join(s.EngineOptions.Config.WALDir, database, name)); err != nil {
does the RP get removed from the metadata before this is called? Would this maybe cause a problem if it isn't?
No. The statement executor drops the data from the tsdb.Store and only removes it from the meta store after that succeeds. We previously deleted it from the meta store and then dropped the shard data, but that caused problems if the data deletion failed (orphaned data, data re-appearing after startup, etc.).
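For readers following along, here is a minimal sketch of that ordering. The interface and function names are illustrative placeholders for this sketch only, not the actual influxdb coordinator code:

```go
package sketch

// Illustrative interfaces; these names are placeholders, not the real
// influxdb types.
type shardStore interface {
	DeleteRetentionPolicy(database, name string) error
}

type metaStore interface {
	DropRetentionPolicy(database, name string) error
}

// dropRetentionPolicy shows the ordering described above: shard data is
// removed from the TSDB store first, and the retention policy is removed from
// the meta store only after that succeeds, so a failed data delete never
// leaves orphaned shards or data that re-appears after a restart.
func dropRetentionPolicy(data shardStore, meta metaStore, database, rp string) error {
	if err := data.DeleteRetentionPolicy(database, rp); err != nil {
		return err // meta store untouched; the statement can simply be retried
	}
	return meta.DropRetentionPolicy(database, rp)
}
```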
return
}

resC <- &res{s: sh}
The shard field is never used, so this send doesn't really do anything other than signal the work is done. What do you think about using a WaitGroup for the synchronization aspect, rather than counting? `resC` can just be `errC := make(chan error)`, and `go func() { wg.Wait(); close(errC) }` will handle closing the channel so that `for err := range errC` can be used to read the errors.
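A minimal, self-contained sketch of the suggested pattern (the `processAll` helper and its arguments are illustrative stand-ins, not the PR's actual shard-walking code):

```go
package main

import (
	"fmt"
	"sync"
)

// processAll runs fn against every shard ID concurrently and collects errors.
// Instead of counting results, a WaitGroup tracks completion and a separate
// goroutine closes errC once all workers are done, so the caller can simply
// range over the channel.
func processAll(shardIDs []int, fn func(id int) error) []error {
	var wg sync.WaitGroup
	errC := make(chan error)

	for _, id := range shardIDs {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			if err := fn(id); err != nil {
				errC <- err
			}
		}(id)
	}

	// Close errC after all workers finish so the range loop below terminates.
	go func() {
		wg.Wait()
		close(errC)
	}()

	var errs []error
	for err := range errC {
		errs = append(errs, err)
	}
	return errs
}

func main() {
	errs := processAll([]int{1, 2, 3}, func(id int) error {
		if id == 2 {
			return fmt.Errorf("shard %d failed", id)
		}
		return nil
	})
	fmt.Println(errs)
}
```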
Ah right, the shard is not used here. It was adapted from the `Open` code which does use it. I'll remove it. I tend to prefer using channels over `WaitGroup`.
LGTM 👍
}
if err := s.walkShards(shards, func(sh *Shard) error {
	if sh.database != database || sh.retentionPolicy != name {
		return nil
This should be impossible, right? Is this just being extra defensive?
Yes
There was a change to speed up deleting and dropping measurements that executed the deletes in parallel for all shards at once (#7015). When TSI was merged in #7618, the series keys passed into Shard.DeleteMeasurement were removed and were instead expanded lower down. This causes memory to blow up when a delete across many shards occurs, as we now expand the set of series keys N times instead of just once as before. While running the deletes in parallel would be ideal, there have been a number of optimizations in the delete path that make running deletes serially pretty good. This change just limits the concurrency of the deletes, which keeps memory more stable.
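A minimal sketch of this kind of concurrency limiting, using a counting semaphore built from a buffered channel; the helper name, signature, and limit value are illustrative, not the actual influxdb change:

```go
package main

import (
	"fmt"
	"sync"
)

// deleteAcrossShards runs deleteFn for each shard, but allows at most `limit`
// deletes to run at once. A buffered channel acts as a counting semaphore, so
// memory used by expanding series keys is bounded by the limit rather than by
// the total number of shards.
func deleteAcrossShards(shardIDs []uint64, limit int, deleteFn func(id uint64) error) error {
	sem := make(chan struct{}, limit)
	var once sync.Once
	var firstErr error
	var wg sync.WaitGroup

	for _, id := range shardIDs {
		wg.Add(1)
		sem <- struct{}{} // blocks when `limit` deletes are already in flight
		go func(id uint64) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := deleteFn(id); err != nil {
				once.Do(func() { firstErr = err })
			}
		}(id)
	}
	wg.Wait()
	return firstErr
}

func main() {
	err := deleteAcrossShards([]uint64{1, 2, 3, 4, 5}, 2, func(id uint64) error {
		fmt.Println("deleting measurement from shard", id)
		return nil
	})
	fmt.Println("done, err =", err)
}
```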
This PR speeds up drop and delete statements by converting the locks held from write locks to shorter read locks as well as processing shards in parallel.
The write locks were problematic on the tsdb.Store, as they essentially lock up the database for all queries and writes. If the deletes take a long time, many things back up (writes, compactions, etc.), which can cause memory problems, lockups, and unresponsiveness. Processing shards in parallel reduces the execution time of the operation when there are many shards on disk. A sketch of the overall approach follows the issue references below.
Fixes #6819 #6796
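Here is the sketch referenced above: take a short read lock only to snapshot the shard list, release it, then do the slow per-shard work concurrently without holding the store lock. The Store type and fields below are simplified stand-ins, not the real tsdb.Store:

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a simplified stand-in for a shard store guarded by a RWMutex.
type Store struct {
	mu     sync.RWMutex
	shards map[uint64]string // id -> path (placeholder for real shard objects)
}

// DeleteMeasurement holds the read lock only long enough to copy the shard
// IDs, so writes and queries are not blocked while the slow deletes run.
func (s *Store) DeleteMeasurement(name string) error {
	// Short read lock: snapshot the shard IDs, then release immediately.
	s.mu.RLock()
	ids := make([]uint64, 0, len(s.shards))
	for id := range s.shards {
		ids = append(ids, id)
	}
	s.mu.RUnlock()

	// Process shards in parallel without holding the store lock.
	errC := make(chan error, len(ids))
	var wg sync.WaitGroup
	for _, id := range ids {
		wg.Add(1)
		go func(id uint64) {
			defer wg.Done()
			// Placeholder for the real per-shard delete work.
			fmt.Printf("deleting measurement %q from shard %d\n", name, id)
			errC <- nil
		}(id)
	}
	wg.Wait()
	close(errC)

	for err := range errC {
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	s := &Store{shards: map[uint64]string{1: "/d/1", 2: "/d/2"}}
	_ = s.DeleteMeasurement("cpu")
}
```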