[2DC] CDC state checkpoint gets reset to 0 in case of leader change #2897

Closed
ndeodhar opened this issue Nov 9, 2019 · 0 comments
Labels
area/cdc Change Data Capture kind/bug This issue is a bug
ndeodhar commented Nov 9, 2019

When the leader of a producer tablet changes, the checkpoint in the CDC state table is incorrectly reset to 0. This happens because the producer looks up the commit checkpoint in its cache and defaults to 0.0 when it cannot find one.

In this situation, the producer should not overwrite the cdc_state table with an incorrect checkpoint.
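
The failure mode can be illustrated with a minimal Python sketch. This is not the YugabyteDB code: `OpId`, `buggy_update_checkpoint`, and the dict-based stand-ins for the cache and the cdc_state table are all hypothetical names chosen for illustration.

```python
from typing import Dict, NamedTuple, Optional


class OpId(NamedTuple):
    """Hypothetical (term, index) checkpoint; 0.0 is modeled as OpId(0, 0)."""
    term: int
    index: int


ZERO = OpId(0, 0)


def buggy_update_checkpoint(
    cache: Dict[str, OpId],
    cdc_state: Dict[str, OpId],
    tablet_id: str,
    consumer_checkpoint: Optional[OpId] = None,
) -> OpId:
    # Bug being illustrated: a cache miss is treated as "checkpoint is 0.0"
    # and that default is then written back to the persistent table.
    if consumer_checkpoint is not None:
        checkpoint = consumer_checkpoint
    else:
        checkpoint = cache.get(tablet_id, ZERO)
    cdc_state[tablet_id] = checkpoint
    return checkpoint


# The table holds real progress, but the new leader starts with an empty cache
# and the consumer (whose leader also changed) sends no commit checkpoint.
cdc_state = {"tablet-1": OpId(3, 120)}
new_leader_cache: Dict[str, OpId] = {}
buggy_update_checkpoint(new_leader_cache, cdc_state, "tablet-1")
print(cdc_state["tablet-1"])
```

In this sketch the persisted checkpoint is lost because the default value is indistinguishable from a real checkpoint once it has been written.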

@ndeodhar ndeodhar added the area/cdc Change Data Capture label Nov 9, 2019
@ndeodhar ndeodhar added this to the v2.1 milestone Nov 9, 2019
@ndeodhar ndeodhar self-assigned this Nov 9, 2019
@ndeodhar ndeodhar added the kind/bug This issue is a bug label Nov 9, 2019
ndeodhar added a commit that referenced this issue Nov 12, 2019
Summary:
When the producer does not have a checkpoint for a tablet in its local cache, it incorrectly resets the tablet's checkpoint to 0.0 in the cdc_state table. This can happen after a leader change, when the producer tablet server has no information about the tablet in its cache.
Note that this bug is rare: it occurs only when both the producer and the consumer tablet leadership have changed. In that case, the consumer does not send a commit checkpoint, and the producer ends up overwriting the checkpoint in the cdc_state table.

Fix:
If the tablet cannot be found in the cache, or if the cached data is stale (which can happen during frequent leader changes), read the latest data from the cdc_state table and do not update the table to 0 or to a stale value.
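
The intent of the fix can be sketched as follows. This is an illustrative Python model, not the actual change in D7541: `OpId`, `resolve_checkpoint`, and the dict-based cache and table are hypothetical, and treating "stale" as "cached value is behind the persisted value" is an assumption made for the sketch.

```python
from typing import Dict, NamedTuple, Optional


class OpId(NamedTuple):
    """Hypothetical (term, index) checkpoint, compared lexicographically."""
    term: int
    index: int


def resolve_checkpoint(
    cache: Dict[str, OpId],
    cdc_state: Dict[str, OpId],
    tablet_id: str,
    consumer_checkpoint: Optional[OpId] = None,
) -> Optional[OpId]:
    persisted = cdc_state.get(tablet_id)
    cached = cache.get(tablet_id)

    # Cache miss or stale cache (assumption: stale means behind the table):
    # fall back to the persisted value instead of defaulting to 0.0.
    if cached is None or (persisted is not None and cached < persisted):
        cached = persisted

    candidate = consumer_checkpoint if consumer_checkpoint is not None else cached
    if candidate is None:
        # Nothing is known about this tablet; leave the table untouched.
        return None

    # Assumption: only write when the checkpoint moves forward, so the table
    # is never overwritten with 0 or an older value.
    if persisted is None or candidate > persisted:
        cdc_state[tablet_id] = candidate
        effective = candidate
    else:
        effective = persisted

    cache[tablet_id] = effective
    return effective


cdc_state = {"tablet-1": OpId(3, 120)}
empty_cache: Dict[str, OpId] = {}
after_leader_change = resolve_checkpoint(empty_cache, cdc_state, "tablet-1")
```

With an empty cache and no consumer checkpoint, the sketch returns the persisted value and leaves the table unchanged, which is the behavior the fix description asks for.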

Test Plan:
Jenkins
Added unit test.

Reviewers: rahuldesirazu, hector, nicolas

Reviewed By: nicolas

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D7541