ddl: sync schema version using watch, notify sessions on owner node by job id #53217

Merged
merged 14 commits into pingcap:master from the opt-sync-ver branch
May 17, 2024

Conversation

D3Hunter
Contributor

@D3Hunter D3Hunter commented May 13, 2024

What problem does this PR solve?

Issue Number: ref #53246

Problem Summary:

What changed and how does it work?

  • Sync the schema version using an etcd watch.
  • Notify sessions on the owner node by job ID, to avoid waking the wrong session when DDLs run over multiple connections.
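
To illustrate the second point, here is a minimal sketch of per-job notification, assuming a registry that maps each job ID to the channel its submitting session waits on. The type and method names (`jobNotifier`, `register`, `notify`, `unregister`) are hypothetical and are not taken from this PR; the actual change exposes something along the lines of `d.getJobDoneCh(job.ID)`.

```go
package main

import (
	"fmt"
	"sync"
)

// jobNotifier is a hypothetical registry that maps a DDL job ID to the
// channel its submitting session waits on. It is not TiDB's actual type.
type jobNotifier struct {
	mu    sync.Mutex
	chans map[int64]chan struct{}
}

func newJobNotifier() *jobNotifier {
	return &jobNotifier{chans: make(map[int64]chan struct{})}
}

// register creates the channel for jobID. The buffer of 1 lets notify
// succeed without blocking even if the session is not waiting yet.
func (n *jobNotifier) register(jobID int64) <-chan struct{} {
	n.mu.Lock()
	defer n.mu.Unlock()
	ch := make(chan struct{}, 1)
	n.chans[jobID] = ch
	return ch
}

// notify wakes only the session registered for jobID and reports whether
// such a session exists.
func (n *jobNotifier) notify(jobID int64) bool {
	n.mu.Lock()
	ch, ok := n.chans[jobID]
	n.mu.Unlock()
	if !ok {
		return false
	}
	select {
	case ch <- struct{}{}:
	default: // a notification is already pending
	}
	return true
}

// unregister drops the channel once the session has finished waiting.
func (n *jobNotifier) unregister(jobID int64) {
	n.mu.Lock()
	defer n.mu.Unlock()
	delete(n.chans, jobID)
}

func main() {
	n := newJobNotifier()
	a, b := n.register(101), n.register(102)
	n.notify(102) // only the session that submitted job 102 is woken
	select {
	case <-a:
		fmt.Println("job 101 woken (unexpected)")
	case <-b:
		fmt.Println("job 102 woken")
	}
}
```

With a single shared notification channel, any waiting session could receive the wake-up; keying by job ID removes that ambiguity.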

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)

Cluster: 1 PD, 3 TiKV, 1 TiDB. Run on the owner node. About 20-30% faster, except with 16 threads (about 4%).

create table

threads=1, 500 tables
1m3.675s (127ms per table) -> 47.732s (95ms per table)

threads=4, 2000 tables
3m8.39s (94ms per table) -> 2m20.273s (70ms per table)

threads=16, 2000 tables
2m49.304s (85ms per table) -> 2m42.364s (81ms per table)

create database

threads=1, 500 databases
1m0.177s (120ms per op) -> 47.345s (95ms per op)

threads=4, 2000 databases
3m35.87s (108ms per op) -> 2m29.758s (75ms per op)

Cluster: 1 PD, 3 TiKV, 8 TiDB. Run on the owner node, create table. 42% faster.

threads=1, 500 tables
1m6s (132ms per table) -> 37.999s (76ms per table)
  • No need to test
    • I checked and no code files have been changed.
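
For readers who want to check the manual test figures above, the per-table latency is the elapsed time divided by the number of operations, and the speedup is the relative drop in elapsed time. A small sketch of that arithmetic follows; the helper names `perOp` and `reduction` are made up for this example.

```go
package main

import (
	"fmt"
	"time"
)

// perOp returns the average latency of one operation.
func perOp(total time.Duration, ops int) time.Duration {
	return total / time.Duration(ops)
}

// reduction returns the percentage drop in elapsed time from before to after.
func reduction(before, after time.Duration) float64 {
	return 100 * float64(before-after) / float64(before)
}

func main() {
	// Timings copied from the 1-TiDB, single-thread, 500-table run reported above.
	before, _ := time.ParseDuration("1m3.675s")
	after, _ := time.ParseDuration("47.732s")
	fmt.Printf("%v -> %v per table, %.1f%% less time\n",
		perOp(before, 500).Round(time.Millisecond),
		perOp(after, 500).Round(time.Millisecond),
		reduction(before, after))
}
```

The same two helpers can be applied to the other rows; the 16-thread run shows a much smaller drop than the rest.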

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

ti-chi-bot bot commented May 13, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue, do-not-merge/needs-tests-checked, release-note-none, do-not-merge/work-in-progress, size/XL labels May 13, 2024

tiprow bot commented May 13, 2024

Hi @D3Hunter. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot added size/L, size/XL and removed size/XL, size/L labels May 13, 2024
@ti-chi-bot ti-chi-bot bot added size/XXL and removed size/XL, do-not-merge/needs-linked-issue, do-not-merge/needs-tests-checked labels May 14, 2024
@D3Hunter
Contributor Author

/test all

tiprow bot commented May 14, 2024

@D3Hunter: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/test all

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@D3Hunter D3Hunter changed the title from "ddl: optimize schema version sync" to "ddl: sync schema version using watch, notify sessions on owner node by job id" May 14, 2024
@D3Hunter D3Hunter mentioned this pull request May 14, 2024
18 tasks

@D3Hunter
Contributor Author

/retest

tiprow bot commented May 14, 2024

@D3Hunter: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

codecov bot commented May 14, 2024

Codecov Report

Attention: Patch coverage is 87.33032% with 28 lines in your changes are missing coverage. Please review.

Project coverage is 73.9039%. Comparing base (43fd1b2) to head (738f11c).
Report is 60 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #53217        +/-   ##
================================================
+ Coverage   72.4365%   73.9039%   +1.4674%     
================================================
  Files          1492       1524        +32     
  Lines        428995     445635     +16640     
================================================
+ Hits         310749     329342     +18593     
+ Misses        98991      96177      -2814     
- Partials      19255      20116       +861     
Flag Coverage Δ
integration 49.1316% <68.3257%> (?)
unit 71.3396% <80.5429%> (+0.0069%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 53.9957% <ø> (ø)
parser ∅ <ø> (∅)
br 42.0474% <ø> (+0.6239%) ⬆️

@D3Hunter D3Hunter marked this pull request as ready for review May 14, 2024 07:28
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress label May 14, 2024
@ti-chi-bot ti-chi-bot bot added approved, needs-1-more-lgtm labels May 15, 2024
@D3Hunter D3Hunter requested a review from lance6716 May 16, 2024 07:26
Contributor

@lance6716 lance6716 left a comment

rest lgtm except for MDL part, I need more time to learn MDL behaviour 😂

@@ -126,16 +128,98 @@ type SchemaSyncer interface {
    // the latest schema version. (exclude the isolated TiDB)
    // It returns until all servers' versions are equal to the latest version.
    OwnerCheckAllVersions(ctx context.Context, jobID int64, latestVer int64) error
    // SyncJobSchemaVerLoop syncs the schema versions on all TiDB nodes for DDL jobs.
    SyncJobSchemaVerLoop(ctx context.Context)
Contributor

can we move it inside NewSchemaSyncer?

Contributor Author

It's part of DDL; starting background routines in ddl.Start should be clearer.

    v.Lock()
    defer v.Unlock()
    if ok := fn(v.nodeVersions); !ok {
        // onceMatchFn must be nil before.
Contributor

Maybe move it to the function comment, so that I can see it when I hover over the function in the IDE.
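
A rough sketch of what this suggestion could look like, with the precondition stated in the doc comment rather than inside the body. Only `onceMatchFn` and the locking pattern come from the hunk above; the struct layout, the method names `matchOrSet` and `update`, the helper `allAt`, and the update logic are assumptions made for illustration, not the PR's code.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeVersions is a hypothetical stand-in: it tracks the schema version each
// TiDB node has reported for one job, plus an optional one-shot matcher.
type nodeVersions struct {
	sync.Mutex
	versions    map[string]int64
	onceMatchFn func(map[string]int64) bool
}

// matchOrSet runs fn against the current versions. If fn does not match yet,
// it is stored and re-evaluated on later updates until it matches once.
// onceMatchFn must be nil when this is called.
func (v *nodeVersions) matchOrSet(fn func(map[string]int64) bool) {
	v.Lock()
	defer v.Unlock()
	if ok := fn(v.versions); !ok {
		v.onceMatchFn = fn
	}
}

// update records a node's version and clears the matcher once it succeeds.
func (v *nodeVersions) update(node string, ver int64) {
	v.Lock()
	defer v.Unlock()
	v.versions[node] = ver
	if v.onceMatchFn != nil && v.onceMatchFn(v.versions) {
		v.onceMatchFn = nil
	}
}

// allAt returns a matcher that closes done once every node is at target or newer.
func allAt(target int64, done chan struct{}) func(map[string]int64) bool {
	return func(m map[string]int64) bool {
		for _, ver := range m {
			if ver < target {
				return false
			}
		}
		close(done)
		return true
	}
}

func main() {
	v := &nodeVersions{versions: map[string]int64{"tidb-0": 5, "tidb-1": 4}}
	done := make(chan struct{})
	v.matchOrSet(allAt(5, done)) // tidb-1 lags, so the matcher is stored
	v.update("tidb-1", 5)        // the stored matcher now succeeds
	<-done
	fmt.Println("all nodes reached version 5")
}
```

Placing the precondition in the doc comment makes it visible in IDE hover text and in generated documentation.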

        s.handleJobSchemaVerKV(oneKV, mvccpb.PUT)
    }
    s.mu.Lock()
    // we might miss some DELETE events during retry, some items might be emptyAndNotUsed, remove them.
Contributor

Every resp at line 498 is a snapshot, so we can always remove all the old data and apply the resp; there is no need for two separate removal steps.
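
The idea in this comment can be sketched as follows: treat each full read as authoritative and rebuild the local state from it, then apply incremental watch events on top. The types below (`versionCache`, `kv`, `eventType`) are simplified stand-ins invented for this example, not etcd's `mvccpb` types or the PR's data structures. Whether a wholesale reset fits the real structure, where items may still be referenced by waiters, is not shown in this thread.

```go
package main

import "fmt"

// eventType mirrors the PUT/DELETE distinction of a watch stream. These are
// simplified stand-ins, not etcd's mvccpb definitions.
type eventType int

const (
	put eventType = iota
	del
)

type kv struct {
	key string
	ver int64
}

// versionCache holds the latest known version per key.
type versionCache struct {
	items map[string]int64
}

// resetFromSnapshot discards all previous state and applies the snapshot, so
// entries whose DELETE event was missed during a retry disappear as well.
func (c *versionCache) resetFromSnapshot(snapshot []kv) {
	c.items = make(map[string]int64, len(snapshot))
	for _, e := range snapshot {
		c.items[e.key] = e.ver
	}
}

// apply handles one incremental watch event.
func (c *versionCache) apply(t eventType, e kv) {
	switch t {
	case put:
		c.items[e.key] = e.ver
	case del:
		delete(c.items, e.key)
	}
}

func main() {
	c := &versionCache{}
	c.resetFromSnapshot([]kv{{"job/1/tidb-0", 10}, {"job/2/tidb-0", 11}})
	// Suppose the watch broke here and job 1's DELETE was never delivered.
	// The next snapshot no longer contains it, so the reset drops it.
	c.resetFromSnapshot([]kv{{"job/2/tidb-0", 11}})
	c.apply(put, kv{"job/3/tidb-0", 12})
	fmt.Println(len(c.items), "entries cached")
}
```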

Contributor Author

pkg/ddl/ddl.go Outdated
@@ -1205,13 +1233,18 @@ func (d *ddl) DoDDLJob(ctx sessionctx.Context, job *model.Job) error {
        recordLastDDLInfo(ctx, historyJob)
    }()
    i := 0
    notifyCh, ok := d.getJobDoneCh(job.ID)
    if !ok {
        // shouldn't happen, just give it a dummy one
Contributor

ti-chi-bot bot commented May 17, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lance6716, wjhuang2016

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm labels May 17, 2024

ti-chi-bot bot commented May 17, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-05-15 06:03:11 UTC: ☑️ agreed by wjhuang2016.
  • 2024-05-17 06:11:24 UTC: ☑️ agreed by lance6716.

@ti-chi-bot ti-chi-bot bot merged commit 5d990c6 into pingcap:master May 17, 2024
21 of 23 checks passed
@D3Hunter D3Hunter deleted the opt-sync-ver branch May 17, 2024 06:38
terry1purcell pushed a commit to terry1purcell/tidb that referenced this pull request May 17, 2024
RidRisR pushed a commit to RidRisR/tidb that referenced this pull request May 23, 2024
Labels
approved, lgtm, release-note-none, size/XXL