Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lightning: add store write limiter #35193

Merged
merged 10 commits into from
Jun 21, 2022
Merged

Conversation

sleepymole
Copy link
Contributor

@sleepymole sleepymole commented Jun 6, 2022

What problem does this PR solve?

Issue Number: close #35192

Problem Summary:

We want to use lightning to import data to an online cluster. To reduce the impact on online services, lightning needs to limit the write rate to the cluster.

What is changed and how it works?

Add a new config tikv-importer.store-write-bwlimit and limit the write bytes to TiKV nodes.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

lightning: add store write limiter

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Jun 6, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • D3Hunter
  • lichunzhu

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 6, 2022
@sleepymole sleepymole added the component/lightning This issue is related to Lightning of TiDB. label Jun 6, 2022
@sre-bot
Copy link
Contributor

sre-bot commented Jun 6, 2022

br/pkg/lightning/backend/local/local.go Outdated Show resolved Hide resolved
br/tidb-lightning.toml Outdated Show resolved Hide resolved
flushKVs := func() error {
for i := range clients {
if local.writeLimiter != nil {
if err := local.writeLimiter.WaitN(ctx, storeIDs[i], int(size)); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the rate limiter controls the writing limit for each TiKV storage nodes. Since the burst is just 20% more of the limit, if the sending size is larger than the and the limit / burst size, I'm afraid the actual write rate is smaller than the setting value ?

For example, suppose rate limiter is 1000, burst is 1200. If the sending size is, say 3000, and there are 3 nodes. For the first storage's rate limiter, it takes around 3 seconds to prepare the 3000 B storage data, and the rate is 1000B/s , which is OK. However, when iterating on the second storage node, it takes another 3 seconds to prepare the second storage data, and at that time the data has not sent actually, and so does the third storage node. At that time where preparation finishes, 3+3+3 = 9 seconds has elapsed. But for each node, only 3000 bytes are sent. So the actual writing rate for each node is the sending size / total prepare time, which is 3000 / 9.

Maybe those several rate limiter could be waiting in parallel ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your analysis is correct. There is a flushLimit, which is not greater than the limit. So the WaitN size is usually less than the burst.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my test, the size is about 3M. In practice, there is no need to set such a small limit. Because to achieve this speed, we can use tidb-backend instead.

Copy link
Contributor Author

@sleepymole sleepymole Jun 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At that time where preparation finishes, 3+3+3 = 9 seconds has elapsed. But for each node, only 3000 bytes are sent. So the actual writing rate for each node is the sending size / total prepare time, which is 3000 / 9.

I thought of it again. I think this is as expected. Because kvs are sent to tikv node one by one for each region. The total send size is 9000, we need 9 seconds to send these data. It's unnecessary to wait for the rate limiter in parallel unless we actually send data to tikv in parallel.

@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 13, 2022
Copy link
Contributor

@D3Hunter D3Hunter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest lgtm

burst = limit + limit/5
} else {
// If overflowed, set burst to math.MaxInt.
burst = math.MaxInt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe adjust them during config.adjust, and add a log

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an internal implementation. burst is not configurable.

br/pkg/lightning/backend/local/local.go Outdated Show resolved Hide resolved
br/pkg/lightning/backend/local/local.go Outdated Show resolved Hide resolved
// the speed of write more smooth.
flushLimit := int64(math.MaxInt64)
if local.writeLimiter != nil {
flushLimit = int64(local.writeLimiter.limit / 10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why use 1/10?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply set a value that is smaller than the limit value. The call interval of the limiter will not exceed 100ms.

@@ -532,6 +532,7 @@ type TikvImporter struct {

EngineMemCacheSize ByteSize `toml:"engine-mem-cache-size" json:"engine-mem-cache-size"`
LocalWriterMemCacheSize ByteSize `toml:"local-writer-mem-cache-size" json:"local-writer-mem-cache-size"`
StoreWriteBWLimit ByteSize `toml:"store-write-bwlimit" json:"store-write-bwlimit"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's bw? batch write?

maybe max-store-write-rate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sunzhaoyang ptal too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bw means bandwitch. It is similar to rysnc --bwlimit.

br/pkg/lightning/backend/local/local.go Outdated Show resolved Hide resolved
@sleepymole
Copy link
Contributor Author

/run-integration-br-test

@ti-chi-bot ti-chi-bot added status/LGT1 Indicates that a PR has LGTM 1. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 16, 2022
@sleepymole sleepymole requested a review from dsdashun June 20, 2022 01:52
Copy link
Contributor

@dsdashun dsdashun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest LGTM

br/tidb-lightning.toml Outdated Show resolved Hide resolved
Copy link
Contributor

@dsdashun dsdashun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot
Copy link
Member

@dsdashun: Thanks for your review. The bot only counts approvals from reviewers and higher roles in list, but you're still welcome to leave your comments.

In response to this:

LGTM

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@sleepymole
Copy link
Contributor Author

/run-integration-br-test

Copy link
Contributor

@lichunzhu lichunzhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jun 21, 2022
@sleepymole
Copy link
Contributor Author

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: 8ec642b

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Jun 21, 2022
@sleepymole
Copy link
Contributor Author

/run-unit-test

@hawkingrei
Copy link
Member

/run-mysql-test

@ti-chi-bot ti-chi-bot merged commit 2648989 into pingcap:master Jun 21, 2022
@sleepymole sleepymole deleted the online-import-2 branch June 21, 2022 05:19
@sre-bot
Copy link
Contributor

sre-bot commented Jun 21, 2022

TiDB MergeCI notify

✅ Well Done! New fixed [1] after this pr merged.

CI Name Result Duration Compare with Parent commit
idc-jenkins-ci-tidb/integration-common-test 🔴 failed 1, success 10, total 11 22 min Existing failure
idc-jenkins-ci-tidb/common-test ✅ all 12 tests passed 8 min 46 sec Fixed
idc-jenkins-ci/integration-cdc-test 🟢 all 34 tests passed 26 min Existing passed
idc-jenkins-ci-tidb/integration-ddl-test 🟢 all 6 tests passed 6 min 16 sec Existing passed
idc-jenkins-ci-tidb/sqllogic-test-1 🟢 all 26 tests passed 6 min 3 sec Existing passed
idc-jenkins-ci-tidb/sqllogic-test-2 🟢 all 28 tests passed 5 min 36 sec Existing passed
idc-jenkins-ci-tidb/tics-test 🟢 all 1 tests passed 5 min 7 sec Existing passed
idc-jenkins-ci-tidb/mybatis-test 🟢 all 1 tests passed 3 min 10 sec Existing passed
idc-jenkins-ci-tidb/integration-compatibility-test 🟢 all 1 tests passed 2 min 42 sec Existing passed
idc-jenkins-ci-tidb/plugin-test 🟢 build success, plugin test success 4min Existing passed

morgo added a commit to morgo/tidb that referenced this pull request Jun 28, 2022
* upstream/master:
  sessionctx: support encoding and decoding session variables (pingcap#35531)
  planner: add batch_point_get access object (pingcap#35230)
  sessionctx: set skipInit false for TiDBOptProjectionPushDown and TiDBOptAggPushDown (pingcap#35491)
  *: add support for disabling noop variables (pingcap#35496)
  lightning: add store write limiter (pingcap#35193)
  expression: avoid padding 0 when implicitly cast to binary (pingcap#35053)
  types: fix creating partition tables fail in ANSI_QUOTES mode (pingcap#35379)
  executor: add the missed runtime stats when the index merge partial task returns 0 row (pingcap#35297)
  statistics: batch insert topn and bucket when saving table stats (pingcap#35326)
  parser: Add support for INTERVAL expr unit + expr (pingcap#30253) (pingcap#35390)
  config: add missing nodelay example (pingcap#35255)
  *: Introduce `OptimisticTxnContextProvider` for optimistic txn (pingcap#35131)
  statistics: fix panic when using wrong globalStats.Indices key (pingcap#35424)
  *: fix store token is up to the limit in test (pingcap#35374)
  *: enable more flaky and update bazel config (pingcap#35500)
  ddl: expose getPrimaryKey() as a public method of model.TableInfo (pingcap#35512)
  expression, util: add `KeyWithoutTrimRightSpace` for collator (pingcap#35475)
  infoschema: try on each PD member until one succeeds or all fail (pingcap#35285)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/lightning This issue is related to Lightning of TiDB. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lightning: add store write limiter
7 participants