Fix potential data inconsistency under heavy ddl operation #5044

Merged
merged 3 commits into from
Jun 2, 2022

Conversation

lidezhu
Contributor

@lidezhu lidezhu commented Jun 2, 2022

What problem does this PR solve?

Issue Number: close #5032

Problem Summary: Currently, we cache the decoding schema used to decode raft data as long as the table schema doesn't change, and we judge whether a table schema has changed based on the table schema version.

But the schema version is not strictly consistent with the actual table schema, as can be seen at https://github.com/pingcap/tiflash/blob/master/dbms/src/TiDB/Schema/SchemaBuilder.cpp#L362. That is, when a diff contains several schema changes, the table schema version is set to the latest schema version as soon as the first schema change is applied.

More concretely, a lossy DDL change triggers a drop column and an add column schema change and also rewrites some data at the same time.
The schema changes are applied one by one. Because TiFlash updates the table schema version ahead of time when applying a schema diff, the table schema version is already set to the latest schema version once the drop column schema change has been applied.

If the decode thread obtains the current schema for decoding data before the add column is applied, it caches that schema together with the latest schema version. When the subsequent add column operation is applied, the table schema version is not updated again, so the cache of the decode thread is not invalidated.

Therefore, the decode thread decodes the new data with an older schema, treats the newly added column as a dropped column, and discards its data.

In addition, the same problem can be triggered, with a lower probability, by frequent add column operations interleaved with data inserts.
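
The interleaving described in this summary can be replayed with a small, self-contained model. This is only a sketch under stated assumptions, not TiFlash code: `Table`, `DecodeCache`, `decoderSeesNewColumn`, and the integer column ids are hypothetical names introduced for illustration, and the two DDL steps are simulated by plain assignments.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical, simplified stand-in for a table's schema state.
struct Table
{
    std::set<int> column_ids;        // ids of the columns that currently exist
    std::int64_t schema_version = 0; // version the decode cache compares against
};

// Hypothetical decode-side cache that is refreshed only when the
// table schema version differs from the cached one.
struct DecodeCache
{
    std::set<int> column_ids;
    std::int64_t cached_version = -1;

    const std::set<int> & get(const Table & table)
    {
        if (cached_version != table.schema_version)
        {
            column_ids = table.column_ids;
            cached_version = table.schema_version;
        }
        return column_ids;
    }
};

// Replays the interleaving; returns whether the decoder knows the re-added column.
bool decoderSeesNewColumn()
{
    Table table;
    table.column_ids = {1, 2};
    table.schema_version = 1;

    DecodeCache cache;
    assert(cache.get(table).count(2) == 1); // cache warmed with the old schema

    const std::int64_t latest_version = 2; // version carried by the schema diff

    // Step 1 of the lossy DDL: drop column 2; the version jumps ahead of time.
    table.column_ids.erase(2);
    table.schema_version = latest_version;

    // The decode thread refreshes between the two steps and caches version 2.
    cache.get(table);

    // Step 2: add the rewritten column as id 3; the version does not move.
    table.column_ids.insert(3);
    table.schema_version = latest_version;

    // Versions match, so the stale snapshot without column 3 is reused.
    return cache.get(table).count(3) == 1;
}
```

In this model `decoderSeesNewColumn()` returns `false`: the cached snapshot never learns about column 3, which corresponds to the newly added column being discarded during decoding.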

What is changed and how it works?

Add an internal schema version for DecodingStorageSchemaSnapshot.
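
As a rough illustration of the idea, and not the actual implementation in this PR, the sketch below keys the decode cache on an internal counter that is bumped on every applied schema change, independently of the table schema version. All names here (`VersionedTable`, `VersionedDecodeCache`, `decoding_schema_version`, `dropColumn`, `addColumn`) are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical table model with an extra internal version counter.
struct VersionedTable
{
    std::set<int> column_ids;
    std::int64_t schema_version = 0;          // may jump ahead within one diff
    std::int64_t decoding_schema_version = 0; // bumped on every applied change

    void dropColumn(int id, std::int64_t latest_version)
    {
        column_ids.erase(id);
        schema_version = latest_version;
        ++decoding_schema_version;
    }

    void addColumn(int id, std::int64_t latest_version)
    {
        column_ids.insert(id);
        schema_version = latest_version;
        ++decoding_schema_version;
    }
};

// Hypothetical decode-side cache keyed on the internal counter.
struct VersionedDecodeCache
{
    std::set<int> column_ids;
    std::int64_t cached_version = -1;

    const std::set<int> & get(const VersionedTable & table)
    {
        if (cached_version != table.decoding_schema_version)
        {
            column_ids = table.column_ids;
            cached_version = table.decoding_schema_version;
        }
        return column_ids;
    }
};

// Same interleaving as in the problem summary, now with the internal counter.
bool decoderSeesNewColumnWithInternalVersion()
{
    VersionedTable table;
    table.column_ids = {1, 2};
    table.schema_version = 1;

    VersionedDecodeCache cache;
    assert(cache.get(table).count(2) == 1); // cache warmed with the old schema

    const std::int64_t latest_version = 2;
    table.dropColumn(2, latest_version); // schema_version is already "latest"
    cache.get(table);                    // decode thread refreshes in between
    table.addColumn(3, latest_version);  // schema_version unchanged, counter bumped

    // The counter changed, so the cache is rebuilt and includes column 3.
    return cache.get(table).count(3) == 1;
}
```

Here `decoderSeesNewColumnWithInternalVersion()` returns `true`, because the second schema change invalidates the cache even though the table schema version stays the same.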

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Fix potential data inconsistency under heavy ddl operation

@ti-chi-bot
Member

ti-chi-bot commented Jun 2, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • JaySon-Huang
  • flowbehappy

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/needs-triage-completed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jun 2, 2022
@lidezhu lidezhu changed the title fix decoding error under heavy ddl operation Fix decoding error under heavy ddl operation Jun 2, 2022
@lidezhu lidezhu changed the title Fix decoding error under heavy ddl operation Fix potential decoding error under heavy ddl operation Jun 2, 2022
@lidezhu
Contributor Author

lidezhu commented Jun 2, 2022

/run-all-tests

@lidezhu lidezhu force-pushed the fix-decode-under-heavy-ddl branch from cce601f to e67c854 Compare June 2, 2022 02:12
@lidezhu
Contributor Author

lidezhu commented Jun 2, 2022

/run-all-tests

@lidezhu lidezhu force-pushed the fix-decode-under-heavy-ddl branch from e67c854 to 07356bc Compare June 2, 2022 02:35
@ti-chi-bot ti-chi-bot added needs-cherry-pick-release-5.4 Should cherry pick this PR to release-5.4 branch. needs-cherry-pick-release-6.0 Type: Need cherry pick to release-6.0 needs-cherry-pick-release-6.1 Should cherry pick this PR to release-6.1 branch. and removed do-not-merge/needs-triage-completed labels Jun 2, 2022
@lidezhu lidezhu changed the title Fix potential decoding error under heavy ddl operation Fix potential data inconsistency under heavy ddl operation Jun 2, 2022
@lidezhu
Contributor Author

lidezhu commented Jun 2, 2022

/run-all-tests

@lidezhu lidezhu force-pushed the fix-decode-under-heavy-ddl branch from 1cd0213 to 1bfc7b7 Compare June 2, 2022 03:29
@sre-bot
Collaborator

sre-bot commented Jun 2, 2022

Coverage for changed files

Filename                                                 Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Common/FailPoint.cpp                                         464                54    88.36%           6                 0   100.00%          56                 4    92.86%         154                54    64.94%
Debug/dbgFuncSchema.cpp                                       50                50     0.00%           5                 5     0.00%          77                77     0.00%          30                30     0.00%
Storages/IManageableStorage.h                                 20                18    10.00%          20                18    10.00%          38                36     5.26%           0                 0         -
Storages/StorageDeltaMerge.cpp                               679               328    51.69%          58                26    55.17%        1307               725    44.53%         378               243    35.71%
Storages/StorageDeltaMerge.h                                  11                 6    45.45%          11                 6    45.45%          17                 8    52.94%           0                 0         -
Storages/Transaction/DecodingStorageSchemaSnapshot.h          35                 1    97.14%           1                 0   100.00%          61                 1    98.36%          26                 2    92.31%
Storages/Transaction/PartitionStreams.cpp                    252               185    26.59%          17                 8    52.94%         512               310    39.45%         134               108    19.40%
Storages/Transaction/tests/RowCodecTestUtils.h                80                 4    95.00%          14                 0   100.00%         168                 1    99.40%          30                 2    93.33%
TiDB/Schema/SchemaBuilder.cpp                                846               805     4.85%          47                43     8.51%        1065               993     6.76%         492               472     4.07%
TiDB/Schema/TiDBSchemaSyncer.h                               140               132     5.71%          13                 9    30.77%         125               100    20.00%          52                51     1.92%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                       2577              1583    38.57%         192               115    40.10%        3426              2255    34.18%        1296               962    25.77%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18282      9734             46.76%    204983  97619        52.38%

full coverage report (for internal network access only)

@lidezhu lidezhu force-pushed the fix-decode-under-heavy-ddl branch from 1bfc7b7 to b11d58b Compare June 2, 2022 03:42
Contributor

@JaySon-Huang JaySon-Huang left a comment

LGTM

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jun 2, 2022
@lidezhu
Contributor Author

lidezhu commented Jun 2, 2022

/run-all-tests

@sre-bot
Collaborator

sre-bot commented Jun 2, 2022

Coverage for changed files

Filename                                                 Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Common/FailPoint.cpp                                         464                54    88.36%           6                 0   100.00%          56                 4    92.86%         154                54    64.94%
Debug/dbgFuncSchema.cpp                                       50                50     0.00%           5                 5     0.00%          77                77     0.00%          30                30     0.00%
Storages/IManageableStorage.h                                 20                18    10.00%          20                18    10.00%          38                36     5.26%           0                 0         -
Storages/StorageDeltaMerge.cpp                               679               328    51.69%          58                26    55.17%        1307               725    44.53%         378               243    35.71%
Storages/StorageDeltaMerge.h                                  11                 6    45.45%          11                 6    45.45%          17                 8    52.94%           0                 0         -
Storages/Transaction/DecodingStorageSchemaSnapshot.h          35                 1    97.14%           1                 0   100.00%          61                 1    98.36%          26                 2    92.31%
Storages/Transaction/PartitionStreams.cpp                    252               185    26.59%          17                 8    52.94%         512               310    39.45%         134               108    19.40%
Storages/Transaction/tests/RowCodecTestUtils.h                80                 4    95.00%          14                 0   100.00%         168                 1    99.40%          30                 2    93.33%
TiDB/Schema/SchemaBuilder.cpp                                846               805     4.85%          47                43     8.51%        1065               993     6.76%         492               472     4.07%
TiDB/Schema/TiDBSchemaSyncer.h                               140               132     5.71%          13                 9    30.77%         125               100    20.00%          52                51     1.92%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                       2577              1583    38.57%         192               115    40.10%        3426              2255    34.18%        1296               962    25.77%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18282      9734             46.76%    204983  97604        52.38%

full coverage report (for internal network access only)

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jun 2, 2022
@lidezhu
Contributor Author

lidezhu commented Jun 2, 2022

/merge

@ti-chi-bot
Member

@lidezhu: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: ee5ccce

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Jun 2, 2022
@lidezhu
Contributor Author

lidezhu commented Jun 2, 2022

/run-all-tests

@sre-bot
Collaborator

sre-bot commented Jun 2, 2022

Coverage for changed files

Filename                                                 Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Common/FailPoint.cpp                                         464                54    88.36%           6                 0   100.00%          56                 4    92.86%         154                54    64.94%
Debug/dbgFuncSchema.cpp                                       50                50     0.00%           5                 5     0.00%          77                77     0.00%          30                30     0.00%
Storages/IManageableStorage.h                                 20                18    10.00%          20                18    10.00%          38                36     5.26%           0                 0         -
Storages/StorageDeltaMerge.cpp                               679               328    51.69%          58                26    55.17%        1307               725    44.53%         378               243    35.71%
Storages/StorageDeltaMerge.h                                  11                 6    45.45%          11                 6    45.45%          17                 8    52.94%           0                 0         -
Storages/Transaction/DecodingStorageSchemaSnapshot.h          35                 1    97.14%           1                 0   100.00%          61                 1    98.36%          26                 2    92.31%
Storages/Transaction/PartitionStreams.cpp                    252               185    26.59%          17                 8    52.94%         512               310    39.45%         134               108    19.40%
Storages/Transaction/tests/RowCodecTestUtils.h                80                 4    95.00%          14                 0   100.00%         168                 1    99.40%          30                 2    93.33%
TiDB/Schema/SchemaBuilder.cpp                                846               805     4.85%          47                43     8.51%        1065               993     6.76%         492               472     4.07%
TiDB/Schema/TiDBSchemaSyncer.h                               140               132     5.71%          13                 9    30.77%         125               100    20.00%          52                51     1.92%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                       2577              1583    38.57%         192               115    40.10%        3426              2255    34.18%        1296               962    25.77%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18282      9734             46.76%    204983  97622        52.38%

full coverage report (for internal network access only)

@lidezhu lidezhu merged commit 2ce9529 into pingcap:master Jun 2, 2022
@lidezhu lidezhu deleted the fix-decode-under-heavy-ddl branch June 2, 2022 05:13
ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Jun 2, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #5050.

ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Jun 2, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #5051.

@ti-chi-bot
Member

In response to a cherrypick label: failed to apply #5044 on top of branch "release-6.1":

failed to git commit: exit status 1

@lidezhu
Contributor Author

lidezhu commented Jun 2, 2022

In response to a cherrypick label: failed to apply #5044 on top of branch "release-6.1":

failed to git commit: exit status 1

The change has been manually cherry-picked to release-6.1 in this PR: #5046

Labels
needs-cherry-pick-release-5.4 Should cherry pick this PR to release-5.4 branch. needs-cherry-pick-release-6.0 Type: Need cherry pick to release-6.0 needs-cherry-pick-release-6.1 Should cherry pick this PR to release-6.1 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Development

Successfully merging this pull request may close these issues.

Data is inconsistent between TiFlash and TiKV after changing column types
5 participants