[FLINK-36081][cdc-base][mysql] Fix that MySQL source connector missing some columns data of newly added tables #3548

kevinwangcs
Contributor

Problem Description:

When a new table is added to a job, the Flink CDC MySQL source connector can lose data for some columns of the newly added table.

Reproduction Scenario:

  1. Remove a table from a normally running CDC job, then restart the job by resuming from its previous state.
  2. Add a column to the removed table.
  3. Add the table back to the job. The job keeps running without interruption after the table is added, but the synchronized data is missing the values of the newly added column.

Cause Analysis:

The MySQL CDC source keeps table schemas in its state. When a table is added, the source restores that table's schema from the previous state. Because a schema for the table already exists there, and it reflects the structure before the column was added, the source emits data downstream based on the cached schema. As a result, the records sent downstream lack the fields for the newly added columns.
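To illustrate the cause, here is a minimal Python model of the behavior described above. It is only a sketch, not the connector's code: `emit_record`, the `state` dictionary, and the table and column names are all hypothetical stand-ins for the schema cache held in state.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the schemas kept in state:
# table id -> ordered column names.
SchemaState = Dict[str, List[str]]


def emit_record(
    state: SchemaState, table_id: str, row: Tuple, live_columns: List[str]
) -> dict:
    """Serialize a row with the cached schema if one exists, else the live one."""
    # setdefault() only stores the live schema when nothing is cached yet.
    columns = state.setdefault(table_id, list(live_columns))
    # zip() stops at the shorter sequence, so values beyond the cached
    # column list are silently dropped.
    return dict(zip(columns, row))


# Schema as it was before the table was removed from the job.
state = {"db.orders": ["id", "amount"]}
# Schema after a column was added while the table was not captured.
live = ["id", "amount", "currency"]

# Re-adding the table with the stale entry still in state loses "currency".
stale = emit_record(state, "db.orders", (1, 9.5, "EUR"), live)

# With no cached entry, the live schema is used and nothing is lost.
fresh = emit_record({}, "db.orders", (1, 9.5, "EUR"), live)
```

In this model the stale entry wins over the live schema, which mirrors why the added column never reaches downstream.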

Proposed Solution:

When a table is removed from the CDC job, it must also be removed from the MySqlBinlogSplit.
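The idea can be sketched in Python as a pruning step applied to the restored state. This is a simplified model under stated assumptions: `prune_removed_tables`, `restored_schemas`, and `captured_tables` are hypothetical names, and the actual change in this PR is made in the connector's Java code.

```python
from typing import Dict, List, Set


def prune_removed_tables(
    table_schemas: Dict[str, List[str]], captured_tables: Set[str]
) -> Dict[str, List[str]]:
    """Keep only the schemas of tables the job still captures."""
    return {
        table_id: columns
        for table_id, columns in table_schemas.items()
        if table_id in captured_tables
    }


# Schemas restored from the previous state (hypothetical tables).
restored_schemas = {
    "db.orders": ["id", "amount"],
    "db.users": ["id", "name"],
}

# "db.orders" has been removed from the job configuration.
captured_tables = {"db.users"}

pruned = prune_removed_tables(restored_schemas, captured_tables)
```

Once the entry for the removed table is gone, re-adding the table later has no stale schema to fall back on, so its current structure has to be read again.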

云时 added 2 commits August 20, 2024 12:00
Contributor

@ruanhang1993 ruanhang1993 left a comment

LGTM

@ruanhang1993 ruanhang1993 merged commit 77c6338 into apache:master Aug 21, 2024
22 checks passed
yuxiqian pushed a commit to yuxiqian/flink-cdc that referenced this pull request Aug 22, 2024
…bles in the BinlogSplit when restart (apache#3548)

Co-authored-by: 云时 <mingya.wmy@alibaba-inc.com>
(cherry picked from commit 77c6338)
leonardBang pushed a commit to yuxiqian/flink-cdc that referenced this pull request Aug 27, 2024
…bles in the BinlogSplit when restart (apache#3548)

Co-authored-by: 云时 <mingya.wmy@alibaba-inc.com>
(cherry picked from commit 77c6338)
leonardBang pushed a commit that referenced this pull request Aug 27, 2024
…bles in the BinlogSplit when restart (#3548)

Co-authored-by: 云时 <mingya.wmy@alibaba-inc.com>
(cherry picked from commit 77c6338)
qiaozongmi pushed a commit to qiaozongmi/flink-cdc that referenced this pull request Sep 23, 2024
…bles in the BinlogSplit when restart (apache#3548)


Co-authored-by: 云时 <mingya.wmy@alibaba-inc.com>
ChaomingZhangCN pushed a commit to ChaomingZhangCN/flink-cdc that referenced this pull request Jan 13, 2025
…bles in the BinlogSplit when restart (apache#3548)


Co-authored-by: 云时 <mingya.wmy@alibaba-inc.com>
2 participants