
drop index after add index finished on primary cluster: drop index was retried repeatedly by TiCDC because add index was still running on secondary, and after a while the changefeed status became failed #10682

Closed
Lily2025 opened this issue Feb 29, 2024 · 6 comments · Fixed by #11476

Lily2025 commented Feb 29, 2024

What did you do?

1. Restore data on both the primary and the secondary cluster.
2. Create changefeeds and set the BDR role for the primary and the secondary.
3. Run sysbench against both the primary and the secondary.
4. Add an index, then drop that index once the add index has finished on the primary (see the DDL sketch below).
5. Inject a network partition between one TiKV pod and all other pods, lasting 3 minutes, then recover.
Chaos recover time: 2024-02-29 17:29:24
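
For illustration only, step 4 boils down to a DDL sequence like the following minimal Go sketch; the DSN, table name `sbtest1`, and index name `idx_k` are placeholders, not the actual test objects:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Connect to the primary cluster (placeholder DSN).
	db, err := sql.Open("mysql", "root:@tcp(primary-tidb:4000)/sbtest")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Step 4a: add an index on the primary. The changefeed replicates this
	// DDL to the secondary, where it may run for a long time.
	if _, err := db.Exec("ALTER TABLE sbtest1 ADD INDEX idx_k (k)"); err != nil {
		log.Fatal(err)
	}

	// Step 4b: once ADD INDEX has finished on the primary, drop the index.
	// On the secondary the replicated ADD INDEX may still be running, so
	// this DROP INDEX queues behind it there.
	if _, err := db.Exec("ALTER TABLE sbtest1 DROP INDEX idx_k"); err != nil {
		log.Fatal(err)
	}
}
```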

ticdc logs:
endless-ha-test-ticdc-tps-7080582-1-976-tc-ticdc-0.tar.gz

What did you expect to see?

The changefeed status stays normal.
After the fault recovers, the DDL can be replicated successfully and the changefeed lag returns to normal.

What did you see instead?

1. The drop index DDL was retried repeatedly even though it was queueing on the secondary because an add index DDL was still running there.
Primary:
(screenshot attached)

Secondary:
(screenshot attached)

2. After a while, the changefeed status became failed.
Changefeed status:
(screenshot attached)

Versions of the cluster

./cdc version
Release Version: v8.0.0-alpha
Git Commit Hash: 25ce29c
Git Branch: heads/refs/tags/v8.0.0-alpha
UTC Build Time: 2024-02-27 11:37:29
Go Version: go version go1.21.6 linux/amd64
Failpoint Build: false

current status of DM cluster (execute query-status <task-name> in dmctl)

No response

Lily2025 added the area/dm and type/bug labels on Feb 29, 2024.
@Lily2025
Author

/remove-area dm
/area ticdc

ti-chi-bot added the area/ticdc label and removed the area/dm label on Feb 29, 2024.
@Lily2025
Author

/assign sdojjy

Lily2025 revised the issue title on Feb 29 and Mar 1, 2024.
sdojjy (Member) commented Mar 1, 2024

From the TiCDC point of view, this is not a bug.

In this case, we found that the add index DDL hangs on the TiDB side, and the drop index DDL must wait for it.
TiCDC sets readTimeout to 2m by default, so the MySQL connection reports invalid connection after waiting for 2m, and the changefeed fails after backing off 20 times.

[mysql] 2024/03/01 12:23:11 packets.go:37: read tcp 192.168.13.151:54646->10.2.12.46:30386: i/o timeout

Users can increase the read timeout value in the changefeed config to work around this issue:

[sink.mysql-config]
read-timeout=2m
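
For context, the timeout itself can be observed with a bare go-sql-driver/mysql connection, which is the driver the TiCDC MySQL sink is built on: with readTimeout set in the DSN, any statement that waits longer than that (e.g. a DROP INDEX queued behind a long ADD INDEX) makes the driver log an i/o timeout and return "invalid connection". A minimal sketch, with placeholder DSN and object names:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// readTimeout bounds how long the driver waits for the server's reply;
	// TiCDC's 2m default plays the same role for the changefeed's sink.
	db, err := sql.Open("mysql", "root:@tcp(secondary-tidb:4000)/sbtest?readTimeout=2m")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// While the downstream is still busy with ADD INDEX, this DROP INDEX
	// sits in the DDL queue. After 2 minutes without a reply the driver
	// logs "i/o timeout" and returns "invalid connection", which is what
	// pushes the changefeed into its retry/backoff loop.
	if _, err := db.Exec("ALTER TABLE sbtest1 DROP INDEX idx_k"); err != nil {
		log.Printf("DDL failed: %v", err)
	}
}
```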

Lily2025 changed the title to its current form on Mar 1, 2024.
Lily2025 (Author) commented Apr 8, 2024

This issue happens in the following scenario:
1. The downstream is adding an index, which will take a long time to complete.
2. The upstream drops that index and the DDL is replicated to the downstream, where the drop index is retried repeatedly.
3. The changefeed may fail after 20 retries.

fubinzh commented Apr 9, 2024

This might result in changefeed failure whenever an add index is followed by another DDL (not only drop index) and the add index cannot finish in the downstream within 20 retries.
/severity moderate

kennytm (Contributor) commented Jun 4, 2024

I suggest that, after a DDL retry has failed because of a timeout, we check whether there are any running DDL jobs in the downstream. If there are, instead of scheduling the 3rd retry we poll until the DDL job queue is empty.

Polling after a retry failure allows us to proceed after the "successful" retry of ADD INDEX, while preventing the DROP INDEX from being retried 20 times.

This should be considered an optimization and does not necessarily work in all cases, including:

  • when ADMIN SHOW DDL; cannot be executed (e.g. a permission issue)
  • when the downstream is not TiDB (i.e. MySQL / MariaDB, unless they also have an equivalent of ADMIN SHOW DDL)
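
A rough sketch of that polling idea (not TiCDC's actual code): before scheduling another retry, query the downstream DDL job queue and wait until it drains. It assumes the downstream is TiDB, that ADMIN SHOW DDL JOBS accepts a WHERE clause, and that the changefeed user is allowed to run it:

```go
package ddlretry

import (
	"context"
	"database/sql"
	"fmt"
	"time"
)

// hasPendingDDL reports whether the downstream TiDB still has queueing or
// running DDL jobs. This only works when the downstream is TiDB; MySQL and
// MariaDB have no equivalent of ADMIN SHOW DDL JOBS.
func hasPendingDDL(ctx context.Context, db *sql.DB) (bool, error) {
	rows, err := db.QueryContext(ctx,
		"ADMIN SHOW DDL JOBS 20 WHERE state IN ('queueing', 'running')")
	if err != nil {
		return false, err
	}
	defer rows.Close()
	return rows.Next(), rows.Err()
}

// waitForDDLQueueDrain polls the downstream until its DDL queue is empty,
// instead of spending the changefeed's fixed retry budget (20 attempts) on a
// DDL that cannot proceed until the ADD INDEX ahead of it finishes.
func waitForDDLQueueDrain(ctx context.Context, db *sql.DB, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		pending, err := hasPendingDDL(ctx, db)
		if err != nil {
			return fmt.Errorf("check downstream DDL jobs: %w", err)
		}
		if !pending {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}
```

Hooking something like this into the existing retry loop would let the changefeed keep waiting on a slow ADD INDEX without consuming its retry budget, subject to the caveats listed above.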
