Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimization: Unified Sorter #972

Merged
merged 91 commits into from
Nov 25, 2020
Merged

Conversation

liuzix
Copy link
Contributor

@liuzix liuzix commented Sep 27, 2020

What problem does this PR solve?

#945

  • Improves serialization performance.
  • Supports concurrent sorting.
  • Enables automatic fallback to disk when memory is not enough for sorting.

What is changed and how it works?

  • Implemented UnifiedSorter.
  • Made UnifiedSorter the default sorter.
  • Moved mounting after sorting, to ease serialization to temp files.

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • Has exported function/method change

Side effects

  • Possible performance regression
  • Increased code complexity

Benchmarks

Method

We generate mock KV-entries with 1KB data in each entry. We use 256 goroutines to generate data in parallel. There is no resolved event until all data has been fed to the sorter.

Results

Data Time
50GB 7 min
100GB 16 min
200GB 32 min

Machine specifications

CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Memory: 16GB DDR4 2133MHz * 8
Disk: Intel Corporation NVMe Datacenter SSD

Sorter configuration

{
	NumConcurrentWorker:  16,
	ChunkSizeLimit:       1 * 1024 * 1024 * 1024,   // 1GB
	MaxMemoryPressure:    60,
	MaxMemoryConsumption: 16 * 1024 * 1024 * 1024,  // 16GB
}

Release note

  • No release note

@liuzix liuzix added status/WIP subject/performance Denotes an issue or pull request is related to replication performance. component/puller Puller component. labels Sep 27, 2020
}
// Cache is full. Let GC do its job
}
panic("Unexpected type")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer log.Fatal.

for i := range heapSorters {
finalI := i
heapSorters[finalI] = newHeapSorter(finalI, s.pool, sorterOutCh)
go func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer errgroup.Go to manage related goroutines.

fileBufferSize = 16 * 1024 * 1024
heapSizeLimit = 4 * 1024 * 1024 // 4MB
numConcurrentHeaps = 16
memoryLimit = 1024 * 1024 * 1024 // 1GB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you set the value base on system memory capacity, eg. min(upperBound, x% * systemMemory).

memoryUseEstimate: 0,
fileNameCounter: 0,
mu: sync.Mutex{},
cache: make([]unsafe.Pointer, 256),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you want a fixed size array, eg, [256]unsafe.Pointer.

}

func newFileSorterBackEnd(fileName string, serde serializerDeserializer) (*fileSorterBackEnd, error) {
f, err := os.Create(fileName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you remove the file?

Comment on lines 402 to 403
heapSizeBytesEstimate += event.RawKV.ApproximateSize()
if heapSizeBytesEstimate >= heapSizeLimit || isResolvedEvent {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need to calculate the estimated size before adding an entry to the heap to make sure it does not exceed the limit.

@liuzix liuzix added status/ptal Could you please take a look? and removed status/WIP labels Oct 10, 2020
@liuzix
Copy link
Contributor Author

liuzix commented Oct 13, 2020

/run-all-tests

@liuzix
Copy link
Contributor Author

liuzix commented Oct 13, 2020

/run-all-tests

@liuzix
Copy link
Contributor Author

liuzix commented Oct 13, 2020

/run-all-tests

@liuzix
Copy link
Contributor Author

liuzix commented Nov 24, 2020

/run-all-tests

@amyangfei
Copy link
Contributor

/run-integration-tests

@liuzix
Copy link
Contributor Author

liuzix commented Nov 25, 2020

/run-all-tests

@@ -0,0 +1,150 @@
// Copyright 2020 PingCAP, Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please exclude testing_utils dir from the coverage counter

@amyangfei amyangfei added this to the v4.0.9 milestone Nov 25, 2020
@liuzix
Copy link
Contributor Author

liuzix commented Nov 25, 2020

/run-all-tests

Copy link
Contributor

@amyangfei amyangfei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest lgtm

Makefile Outdated Show resolved Hide resolved
Co-authored-by: amyangfei <amyangfei@gmail.com>
Copy link
Contributor

@amyangfei amyangfei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Nov 25, 2020
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Nov 25, 2020
@amyangfei
Copy link
Contributor

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Nov 25, 2020
@ti-srebot
Copy link
Contributor

Your auto merge job has been accepted, waiting for:

  • 1100
  • 1116

@ti-srebot
Copy link
Contributor

/run-all-tests

@zier-one zier-one added the needs-cherry-pick-release-4.0 Should cherry pick this PR to release-4.0 branch. label Nov 25, 2020
@ti-srebot ti-srebot merged commit ad1354b into pingcap:master Nov 25, 2020
ti-srebot pushed a commit to ti-srebot/ticdc that referenced this pull request Nov 25, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Copy link
Contributor

cherry pick to release-4.0 in PR #1122

amyangfei pushed a commit that referenced this pull request Nov 26, 2020
* cherry pick #972 to release-4.0

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
Co-authored-by: Zixiong Liu <liuzixiong@pingcap.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/puller Puller component. needs-cherry-pick-release-4.0 Should cherry pick this PR to release-4.0 branch. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. status/ptal Could you please take a look? subject/performance Denotes an issue or pull request is related to replication performance.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants