[cherry-pick] restore: split & scatter regions concurrently(tidb#27034) #1429
Conversation
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review. The full list of commands accepted by this bot can be found here. Reviewers can indicate their review by submitting an approval review.
pkg/restore/pipeline_items.go:319:27: b.tableWaiters.LoadAndDelete undefined (type *sync.Map has no field or method LoadAndDelete)
LoadAndDelete was added in Go 1.15 😿
@@ -316,12 +316,13 @@ func (b *tikvSender) registerTableIsRestoring(ts []CreatedTable) func() {
// till all tables provided are no more ‘current restoring’.
func (b *tikvSender) waitTablesDone(ts []CreatedTable) {
for _, t := range ts {
- wg, ok := b.tableWaiters.LoadAndDelete(t.Table.ID)
+ wg, ok := b.tableWaiters.Load(t.Table.ID)
if !ok {
log.Panic("bug! table done before register!",
zap.Any("wait-table-map", b.tableWaiters),
zap.Stringer("table", t.Table.Name))
}
+ b.tableWaiters.Delete(t.Table.ID)
wg.(*sync.WaitGroup).Wait()
}

Well, anyway, @kennytm PTAL
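For context, here is a minimal, self-contained sketch (not the PR's code) of the Go-1.14-compatible pattern shown in the diff above: sync.Map.LoadAndDelete only exists since Go 1.15, so the same effect is obtained with separate Load and Delete calls. The two-call form is not atomic, which is acceptable here only because each table ID is waited on by a single goroutine.

package main

import (
	"fmt"
	"sync"
)

func main() {
	var tableWaiters sync.Map

	wg := new(sync.WaitGroup)
	wg.Add(1)
	tableWaiters.Store(int64(42), wg) // register the table before restoring it

	go wg.Done() // simulate the table finishing its restore

	// On Go 1.15+ this could be: v, ok := tableWaiters.LoadAndDelete(int64(42))
	v, ok := tableWaiters.Load(int64(42))
	if !ok {
		panic("bug! table done before register!")
	}
	tableWaiters.Delete(int64(42))
	v.(*sync.WaitGroup).Wait()
	fmt.Println("table done")
}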
/merge
This pull request has been accepted and is ready to merge. Commit hash: 8612207
cherry-picking tidb#27034
===
Port of #1363
What problem does this PR solve?
Before, when restoring many small tables, the batcher would often send small batches due to its so-called AutoCommit feature, which keeps the split, scatter, and restore workers more active. But frequently sending small batches isn't free: the split step is costly and I/O-bound even for small batches. For example, splitting 60 ranges costs about 3s, while restoring those ranges typically costs only 1s, so the restore worker is idle most of the time and the restore slows down.
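Put concretely with the numbers above: under a single split worker, each ~3s split is followed by only ~1s of restore work, so the restore workers sit idle roughly three quarters of the wall-clock time. Overlapping several splits is what removes that dead time.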
What is changed and how it works?
Instead of using a single split worker, this PR allows multiple restore batches to be split concurrently.
We added two hidden flags, --batch-flush-interval and --pd-concurrency: the former for tuning the behavior of the batcher, the latter for tweaking the split concurrency. Also, more logs were added so that table creation speed and download/ingest time cost can be observed via the logs.
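As a rough illustration only (hypothetical names, not BR's actual implementation), the idea is to bound how many batches are being split at once with a semaphore sized like the --pd-concurrency setting, so several splits overlap and the restore workers are not starved:

package main

import (
	"fmt"
	"sync"
	"time"
)

type batch struct{ id, ranges int }

// splitAndScatter stands in for the real, I/O-bound split/scatter calls to PD and TiKV.
func splitAndScatter(b batch) {
	time.Sleep(50 * time.Millisecond)
	fmt.Printf("batch %d: split %d ranges\n", b.id, b.ranges)
}

func main() {
	const pdConcurrency = 4 // analogous to the hidden --pd-concurrency flag

	batches := []batch{{1, 60}, {2, 12}, {3, 33}, {4, 8}, {5, 47}}

	sem := make(chan struct{}, pdConcurrency)
	var wg sync.WaitGroup
	for _, b := range batches {
		wg.Add(1)
		sem <- struct{}{} // acquire a split slot
		go func(b batch) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			splitAndScatter(b)
			// In the real pipeline the batch would now be handed to the
			// restore (download/ingest) workers.
		}(b)
	}
	wg.Wait()
}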
Check List
Tests
An internal test shows that for a 190 GB workload with 6000 tables, this PR speeds up restoration: the original version takes over 2 hours to restore, while this version takes about 30 minutes, which is nearly equal to the time cost of creating the tables (see figure below).
Release note