
[cherry-pick] restore: split & scatter regions concurrently(tidb#27034) #1429

Merged
merged 4 commits into pingcap:release-4.0
Sep 8, 2021

Conversation

YuJuncen
Collaborator

@YuJuncen YuJuncen commented Sep 1, 2021

cherry-picking tidb#27034

===

Port of #1363

What problem does this PR solve?

Before, when restoring many small tables, the batcher would often send small batches due to the so-called AutoCommit feature of the batcher. This keeps the split & scatter & restore workers more active.

But frequently sending small batches isn't free. The split step is costly and I/O-bound even for small batches. For example, splitting 60 ranges costs about 3s, while restoring those ranges typically costs only 1s. The restore worker is therefore idle most of the time, and the restore slows down.

[image: BR pipeline-2]

What is changed and how it works?

Instead of using a single split worker, this PR allows multiple restore batches to be split concurrently.
We added two hidden flags, --batch-flush-interval and --pd-concurrency: the former for finer tuning of the batcher's behavior, the latter for tweaking the concurrent split.
Also, more logs were added so that the table creation speed and the download/ingest time costs can be observed from the logs.
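
Below is a minimal, self-contained sketch of the idea (not the PR's actual code): several workers drain a channel of restore batches and run the split & scatter step concurrently. The batch type, the splitAndScatter function, and the pdConcurrency constant are all made up for illustration; pdConcurrency only mirrors the intent of the --pd-concurrency flag.

package main

import (
    "fmt"
    "sync"
    "time"
)

// batch stands in for one restore batch (a set of ranges to split);
// the real BR types are different.
type batch struct {
    id     int
    ranges int
}

// splitAndScatter simulates the expensive, I/O-bound split & scatter call
// against PD by sleeping for a fixed time.
func splitAndScatter(b batch) {
    time.Sleep(50 * time.Millisecond)
    fmt.Printf("batch %d: split %d ranges\n", b.id, b.ranges)
}

func main() {
    const pdConcurrency = 4 // mirrors the intent of --pd-concurrency

    batches := make(chan batch)
    var wg sync.WaitGroup

    // Instead of a single split worker, start several so that small batches
    // can be split in parallel and the downstream restore workers stay busy.
    for i := 0; i < pdConcurrency; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for b := range batches {
                splitAndScatter(b)
            }
        }()
    }

    for i := 0; i < 10; i++ {
        batches <- batch{id: i, ranges: 60}
    }
    close(batches)
    wg.Wait()
}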

Check List

Tests

  • Integration test
  • Manual test (add detailed scripts or steps below)
    An internal test shows that, on a 190GB workload with 6000 tables, this PR speeds up the restoration: the original version takes over 2 hours to restore, while this version takes about 30 minutes, which is nearly equal to the time spent creating the tables (see figure below).

[image]

Release note

Restoring many small tables is now faster.

@ti-chi-bot
Member

ti-chi-bot commented Sep 1, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • kennytm
  • zwj-coder

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added the status/LGT1 LGTM1 label Sep 8, 2021
@YuJuncen
Collaborator Author

YuJuncen commented Sep 8, 2021

pkg/restore/pipeline_items.go:319:27: b.tableWaiters.LoadAndDelete undefined (type *sync.Map has no field or method LoadAndDelete)
OOPS.

@kennytm
Collaborator

kennytm commented Sep 8, 2021

LoadAndDelete was added in Go 1.15 😿

@YuJuncen
Collaborator Author

YuJuncen commented Sep 8, 2021

@@ -316,12 +316,13 @@ func (b *tikvSender) registerTableIsRestoring(ts []CreatedTable) func() {
 // till all tables provided are no more ‘current restoring’.
 func (b *tikvSender) waitTablesDone(ts []CreatedTable) {
        for _, t := range ts {
-               wg, ok := b.tableWaiters.LoadAndDelete(t.Table.ID)
+               wg, ok := b.tableWaiters.Load(t.Table.ID)
                if !ok {
                        log.Panic("bug! table done before register!",
                                zap.Any("wait-table-map", b.tableWaiters),
                                zap.Stringer("table", t.Table.Name))
                }
+               b.tableWaiters.Delete(t.Table.ID)
                wg.(*sync.WaitGroup).Wait()
        }
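
For reference, a minimal standalone sketch of the same pattern, with made-up keys and values: on Go 1.14, a Load followed by a Delete replaces LoadAndDelete. The pair is not atomic, which only matters if several goroutines could race on the same table ID.

package main

import (
    "fmt"
    "sync"
)

func main() {
    var tableWaiters sync.Map // map[int64]*sync.WaitGroup

    wg := &sync.WaitGroup{}
    wg.Add(1)
    tableWaiters.Store(int64(42), wg)

    go wg.Done() // the restore work for table 42 finishes

    // Go 1.15+: tableWaiters.LoadAndDelete(int64(42)).
    // Go 1.14: Load followed by Delete, as in the diff above.
    v, ok := tableWaiters.Load(int64(42))
    if !ok {
        panic("bug! table done before register!")
    }
    tableWaiters.Delete(int64(42))
    v.(*sync.WaitGroup).Wait()
    fmt.Println("table 42 done")
}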

Well, anyway, @kennytm PTAL

@ti-chi-bot ti-chi-bot added status/LGT2 LGTM2 and removed status/LGT1 LGTM1 labels Sep 8, 2021
@YuJuncen
Collaborator Author

YuJuncen commented Sep 8, 2021

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 8612207

@ti-chi-bot ti-chi-bot merged commit 2067914 into pingcap:release-4.0 Sep 8, 2021
@zhouqiang-cl zhouqiang-cl added this to the v4.0.15 milestone Sep 14, 2021