[cherry-pick] restore: split & scatter regions concurrently(tidb#27034) #1429
Conversation
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review. The full list of commands accepted by this bot can be found here. Reviewers can indicate their review by submitting an approval review.
pkg/restore/pipeline_items.go:319:27: b.tableWaiters.LoadAndDelete undefined (type *sync.Map has no field or method LoadAndDelete)
LoadAndDelete was added in Go 1.15 😿
@@ -316,12 +316,13 @@ func (b *tikvSender) registerTableIsRestoring(ts []CreatedTable) func() {
// till all tables provided are no more ‘current restoring’.
func (b *tikvSender) waitTablesDone(ts []CreatedTable) {
for _, t := range ts {
- wg, ok := b.tableWaiters.LoadAndDelete(t.Table.ID)
+ wg, ok := b.tableWaiters.Load(t.Table.ID)
if !ok {
log.Panic("bug! table done before register!",
zap.Any("wait-table-map", b.tableWaiters),
zap.Stringer("table", t.Table.Name))
}
+ b.tableWaiters.Delete(t.Table.ID)
wg.(*sync.WaitGroup).Wait()
}

Well, anyway, @kennytm PTAL
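For context, here is a minimal, self-contained sketch (not the PR's code) of the Go-1.14-compatible pattern shown in the diff above: sync.Map.LoadAndDelete only exists since Go 1.15, so the same effect is obtained with separate Load and Delete calls. The two-call form is not atomic, which is acceptable here only because each table ID is waited on by a single goroutine.

package main

import (
	"fmt"
	"sync"
)

func main() {
	var tableWaiters sync.Map

	wg := new(sync.WaitGroup)
	wg.Add(1)
	tableWaiters.Store(int64(42), wg) // register the table before restoring it

	go wg.Done() // simulate the table finishing its restore

	// On Go 1.15+ this could be: v, ok := tableWaiters.LoadAndDelete(int64(42))
	v, ok := tableWaiters.Load(int64(42))
	if !ok {
		panic("bug! table done before register!")
	}
	tableWaiters.Delete(int64(42))
	v.(*sync.WaitGroup).Wait()
	fmt.Println("table done")
}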
/merge
This pull request has been accepted and is ready to merge. Commit hash: 8612207
cherry-picking tidb#27034
===
Port of #1363
What problem does this PR solve?
Before, when restoring many small tables, the batcher would often send small batches due to its so-called AutoCommit feature, which keeps the split, scatter, and restore workers more active. But frequently sending small batches isn't free: the split step is costly and I/O-bound even for small batches. For example, splitting 60 ranges costs about 3s, while restoring those ranges typically costs only 1s, so the restore worker is idle most of the time and the restore slows down.
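Put concretely with the numbers above: under a single split worker, each ~3s split is followed by only ~1s of restore work, so the restore workers sit idle roughly three quarters of the wall-clock time. Overlapping several splits is what removes that dead time.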
What is changed and how it works?
Instead of using a single split worker, this PR allows multiple restore batches to be split concurrently.
We added two hidden flags, --batch-flush-interval and --pd-concurrency: the former for tuning the behavior of the batcher, the latter for tweaking the split concurrency. Also, more logs were added so that table creation speed and download/ingest time cost can be observed via the logs.
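As a rough illustration only (hypothetical names, not BR's actual implementation), the idea is to bound how many batches are being split at once with a semaphore sized like the --pd-concurrency setting, so several splits overlap and the restore workers are not starved:

package main

import (
	"fmt"
	"sync"
	"time"
)

type batch struct{ id, ranges int }

// splitAndScatter stands in for the real, I/O-bound split/scatter calls to PD and TiKV.
func splitAndScatter(b batch) {
	time.Sleep(50 * time.Millisecond)
	fmt.Printf("batch %d: split %d ranges\n", b.id, b.ranges)
}

func main() {
	const pdConcurrency = 4 // analogous to the hidden --pd-concurrency flag

	batches := []batch{{1, 60}, {2, 12}, {3, 33}, {4, 8}, {5, 47}}

	sem := make(chan struct{}, pdConcurrency)
	var wg sync.WaitGroup
	for _, b := range batches {
		wg.Add(1)
		sem <- struct{}{} // acquire a split slot
		go func(b batch) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			splitAndScatter(b)
			// In the real pipeline the batch would now be handed to the
			// restore (download/ingest) workers.
		}(b)
	}
	wg.Wait()
}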
Check List
Tests
An internal test shows that for a 190 GB workload with 6000 tables, this PR speeds up restoration: the original version takes over 2 hours to restore, while this version takes about 30 minutes, which is nearly equal to the time cost of creating the tables (see figure below).
Release note