
Wrong barrier ts for partition table under frequent ddl scenario #10668

Closed
lidezhu opened this issue Feb 28, 2024 · 12 comments · Fixed by #10669
Labels
affects-6.5, affects-7.1, affects-7.5, area/ticdc (Issues or PRs related to TiCDC), severity/major, type/bug (The issue is confirmed as a bug)

Comments

lidezhu (Collaborator) commented Feb 28, 2024

What did you do?

Run CASE=partition_table make integration_test_storage.

What did you expect to see?

The test always succeeds.

What did you see instead?

The test sometimes failed.

Versions of the cluster

Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

(paste TiDB cluster version here)

Upstream TiKV version (execute tikv-server --version):

(paste TiKV version here)

TiCDC version (execute cdc version):

(paste TiCDC version here)
lidezhu added the area/ticdc (Issues or PRs related to TiCDC) and type/bug (The issue is confirmed as a bug) labels on Feb 28, 2024
lidezhu (Collaborator, Author) commented Feb 28, 2024

After some investigation, we found that the format of the storage files written by cdc is not valid:
[root@ldz-test partition_table]# tree storage_test/ticdc-partition-table-test-8611/partition_table/t1
storage_test/ticdc-partition-table-test-8611/partition_table/t1

......
├── 447992388515004420
│   ├── 143
│   │   └── 2024-02-26
│   │       ├── CDC00000000000000000001.json 1708955342790(es) -3 -4 
│   │       ├── CDC00000000000000000002.json 1708955339740 -1 -2
│   │       └── meta
│   │           └── CDC.index
│   ├── 144
│   │   └── 2024-02-26
│   │       ├── CDC00000000000000000001.json 1708955339740 6
│   │       ├── CDC00000000000000000002.json 1708955342790 5
│   │       └── meta
│   │           └── CDC.index
│   └── 145
│       └── 2024-02-26
│           ├── CDC00000000000000000001.json 1708955339740 13 20-DELETE
│           └── meta
│               └── CDC.index
├── 447992389170626566
│   ├── 147
......

As the tree above shows, 447992388515004420/143/2024-02-26/CDC00000000000000000001.json contains two rows of data, -3 and -4, but these two rows should actually reside under the directory 447992389170626566, and they have a larger commit ts than the data in 447992388515004420/143/2024-02-26/CDC00000000000000000002.json.
When storage-consumer reads data from this path, after reading -3 and -4 it ignores -1 and -2, which have a smaller commit ts, so the test fails.
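To make the failure mode concrete, here is a minimal, hypothetical Go sketch of a consumer that assumes files under one table-version directory are written in commit-ts order. The type and function names are made up and this is not the real storage-consumer code; the ts values 200 and 100 merely stand in for the real commit ts of the two files.

package main

import "fmt"

type row struct {
	commitTs uint64
	value    int
}

// replaySorted mimics a consumer that expects files to be flushed in
// commit-ts order: once the watermark has advanced, any row with a
// smaller commit ts is silently dropped.
func replaySorted(files [][]row) []row {
	var applied []row
	var watermark uint64
	for _, f := range files {
		for _, r := range f {
			if r.commitTs < watermark {
				continue // this is where -1 and -2 get lost
			}
			watermark = r.commitTs
			applied = append(applied, r)
		}
	}
	return applied
}

func main() {
	file1 := []row{{200, -3}, {200, -4}} // CDC...0001.json, later commit ts
	file2 := []row{{100, -1}, {100, -2}} // CDC...0002.json, earlier commit ts
	fmt.Println(replaySorted([][]row{file1, file2})) // only -3 and -4 survive
}

Because file 0001 is read first and already advances the watermark, the rows -1 and -2 from file 0002 are dropped, which matches the observed test failure.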

lidezhu (Collaborator, Author) commented Feb 28, 2024

The following are the test SQL statements related to the test failure.

-- ......

-- ddl1: reorganize partition
ALTER TABLE t1 REORGANIZE PARTITION p0,p2 INTO (PARTITION p0 VALUES LESS THAN (5), PARTITION p1 VALUES LESS THAN (10), PARTITION p2 VALUES LESS THAN (21));
insert into t1 values (-1),(6),(13);
update t1 set a=a-22 where a=20;
delete from t1 where a = 5;

-- logical table id: 207
-- partitions: 226-p0 227-p1 228-p2 223-p3 213-p4

-- ddl2: reorganize partition
ALTER TABLE t1 REORGANIZE PARTITION p2,p3,p4 INTO (PARTITION p2 VALUES LESS THAN (20), PARTITION p3 VALUES LESS THAN (26), PARTITION p4 VALUES LESS THAN (35), PARTITION pMax VALUES LESS THAN (MAXVALUE));
insert into t1 values (-3),(5),(14),(22),(30),(100);
update t1 set a=a-16 where a=12;
delete from t1 where a = 29;

-- logical table id: 207
-- partitions: 226-p0 227-p1 230-p2 231-p3 232-p4 233-pMax


-- ddl3: alter partition from by range to by hash
alter table t1 partition by key(a) partitions 7;
insert into t1 values (-2001),(2001),(2002),(-2002),(-2003),(2003),(-2004),(2004),(-2005),(2005),(2006),(-2006),(2007),(-2007);

-- logical table id: 242
-- partitions: 235-p0 236-p1 237-p2 238-p3 239-p4 240-p5 241-p6

The problem happens in the following steps (a sketch of the barrier-ts calculation follows the list):

  1. after ddl1 was executed, partitions p0 and p1 were created;
  2. ddl2 and ddl3 were treated as DDLs on different tables because they have different logical table ids, so both of them were considered when generating the barrier ts for each table; (cdc/owner/ddl_manager.go#L431)
  3. partitions p0 and p1 appear in the PreTableInfo of both ddl2 and ddl3, so they end up using the FinishTS of ddl3 as their barrier ts; (cdc/owner/ddl_manager.go#L459)
  4. as a result, the dmls issued after ddl2 may be replicated before ddl2 is executed.
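For illustration only, here is a hypothetical Go sketch of the per-partition barrier-ts selection described in steps 2-4. The types and function names are invented, this is not the actual cdc/owner/ddl_manager.go code, and taking the minimum FinishTS per partition is only an assumed correction, not necessarily what #10669 implements.

package main

import "fmt"

type ddlJob struct {
	name          string
	finishTs      uint64
	prePartitions []int64 // physical partition ids in the job's PreTableInfo
}

// buggyBarrier walks the pending DDLs in order and overwrites the barrier of
// every partition in PreTableInfo, so a partition shared by ddl2 and ddl3
// ends up with ddl3's FinishTS (step 3 above).
func buggyBarrier(pending []ddlJob) map[int64]uint64 {
	barrier := map[int64]uint64{}
	for _, job := range pending {
		for _, p := range job.prePartitions {
			barrier[p] = job.finishTs
		}
	}
	return barrier
}

// fixedBarrier keeps the smallest FinishTS per partition, so DMLs issued
// after ddl2 are held back until ddl2 has been executed.
func fixedBarrier(pending []ddlJob) map[int64]uint64 {
	barrier := map[int64]uint64{}
	for _, job := range pending {
		for _, p := range job.prePartitions {
			if ts, ok := barrier[p]; !ok || job.finishTs < ts {
				barrier[p] = job.finishTs
			}
		}
	}
	return barrier
}

func main() {
	// ddl2 and ddl3 both list partitions 226 (p0) and 227 (p1) in PreTableInfo.
	pending := []ddlJob{
		{name: "ddl2", finishTs: 100, prePartitions: []int64{226, 227, 228, 223, 213}},
		{name: "ddl3", finishTs: 200, prePartitions: []int64{226, 227, 230, 231, 232, 233}},
	}
	fmt.Println(buggyBarrier(pending)[226]) // 200: too large, post-ddl2 DMLs slip through
	fmt.Println(fixedBarrier(pending)[226]) // 100: blocks DMLs until ddl2 is executed
}

With the last-writer-wins variant, partitions 226/227 get ddl3's FinishTS, so DMLs between ddl2 and ddl3 are not held back; keeping the smallest pending FinishTS per physical partition would block them until ddl2 has been applied.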

lidezhu (Collaborator, Author) commented Feb 28, 2024

/label severity/major

ti-chi-bot bot (Contributor) commented Feb 28, 2024

@lidezhu: The label(s) severity/major cannot be applied. These labels are supported: duplicate, bug-from-internal-test, bug-from-user, ok-to-test, needs-ok-to-test, affects-5.4, affects-6.1, affects-6.5, affects-7.1, affects-7.5, affects-7.6, may-affects-5.4, may-affects-6.1, may-affects-6.5, may-affects-7.1, may-affects-7.5, may-affects-7.6, needs-cherry-pick-release-5.4, needs-cherry-pick-release-6.1, needs-cherry-pick-release-6.5, needs-cherry-pick-release-7.1, needs-cherry-pick-release-7.5, needs-cherry-pick-release-7.6, question, release-blocker, wontfix, MariaDB.

In response to this:

/label severity/major

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

lidezhu (Collaborator, Author) commented Feb 28, 2024

/severity major

lidezhu (Collaborator, Author) commented Feb 28, 2024

/assign @lidezhu

lidezhu (Collaborator, Author) commented Feb 28, 2024

/label affects-7.5

lidezhu (Collaborator, Author) commented Feb 28, 2024

/label affects-7.1

lidezhu (Collaborator, Author) commented Feb 28, 2024

/label affects-6.5

lidezhu (Collaborator, Author) commented Feb 28, 2024

/remove-label may-affects-6.1

lidezhu (Collaborator, Author) commented Feb 28, 2024

/remove-label may-affects-5.4

ti-chi-bot bot (Contributor) commented Feb 28, 2024

@lidezhu: These labels are not set on the issue: may-affects-7.1.

In response to this:

/remove-label may-affects-7.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
