Tables from Pg/Mysql sources have duplicate data after a backup+rollback #29548

Closed
def- opened this issue Sep 15, 2024 · 3 comments · Fixed by #29571
Labels
C-bug Category: something is broken

Comments

Contributor

def- commented Sep 15, 2024

What version of Materialize are you using?

v0.117.0

What is the issue?

Seen in Checks + backup + rollback to previous 1 and Checks + backup + rollback to previous 2:

8:1: error: non-matching rows: expected:
[["1", "1234", "<null>"], ["2", "0", "x1"], ["3", "2345", "x2"], ["4", "3456", "x2"]]
got:
[["1", "1234", "<null>"], ["2", "0", "x1"], ["3", "2345", "x2"], ["4", "3456", "x2"], ["4", "3456", "x2"]]
Poor diff:
+ 4 3456 x2

     |
   7 | 
   8 | > SELECT * FROM pg_table_1b;
     | ^
+++ !!! Error Report
1 errors were encountered during execution
source: /var/lib/buildkite-agent/builds/hetzner-aarch64-16cpu-32gb-c9578c7e/materialize/release-qualification/misc/python/materialize/checks/all_checks/source_tables.py:102

I wouldn't consider this a release blocker since it's a new feature and not enabled by default (or by any customers).

Reproduces locally with bin/mzcompose --find platform-checks run default --scenario=BackupAndRestoreToPreviousState --check=TableFromPgSource --check=TableFromMySqlSource

def- added the C-bug (Category: something is broken) and T-correctness (Theme: relates to consistency and correctness of results) labels on Sep 15, 2024
def- changed the title from "Tables from Pg/Mysql sources have duplicate data after a backup/restore cycle" to "Tables from Pg/Mysql sources have duplicate data after a backup+rollback" on Sep 15, 2024
Contributor

rjobanp commented Sep 16, 2024

Worth noting that my local reproduction failed when querying the old-syntax subsource instead:

> SELECT * FROM pg_table_1;
rows match; continuing at ts 1726500162.492181
> SELECT * FROM pg_table_1b;
rows match; continuing at ts 1726500162.504979
> SELECT * FROM pg_table_2;
rows match; continuing at ts 1726500162.51927
> SELECT * FROM pg_table_1_old_syntax;
rows didn't match; sleeping to see if dataflow catches up 50ms 75ms 113ms 169ms 253ms 380ms 570ms 854ms 1s 2s 3s 4s 6s 10s 15s 22s 33s 49s 74s 78s
^^^ +++
19:1: error: non-matching rows: expected:
[["1", "1234", "<null>"], ["2", "0", "x1"], ["3", "2345", "x2"], ["4", "3456", "x2"]]
got:
[["1", "1234", "<null>"], ["2", "0", "x1"], ["3", "2345", "x2"], ["4", "3456", "x2"], ["4", "3456", "x2"]]
Poor diff:
+ 4 3456 x2

     |
  18 | 
  19 | > SELECT * FROM pg_table_1_old_syntax;
     | ^
+++ !!! Error Report
1 errors were encountered during execution

still investigating...

Contributor

rjobanp commented Sep 16, 2024

@def- @nrainer-materialize I'm wondering if this is actually something that should be passing at all? This scenario says that the steps in manipulate need to be idempotent in the external system:

class BackupAndRestoreToPreviousState(Scenario):
    """Backup, run more workloads, and then Restore to a previous state."""

    def requires_external_idempotence(self) -> bool:
        # This scenario will run manipulate(#2) twice, so only compatible
        # Checks are allowed to participate
        return True

    def actions(self) -> list[Action]:
        return [
            StartMz(self),
            Initialize(self),
            Manipulate(self, phase=1),
            Backup(),
            Manipulate(self, phase=2),  # Those updates will be lost here ..
            KillMz(),
            Restore(),
            Manipulate(self, phase=2),  # ... and redone here
            Validate(self),
        ]

but doing an INSERT INTO pg_table_1 VALUES (4, 3456, 'x2'); is not an idempotent operation in postgres. How should this be expected to work?
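
To make the idempotence point concrete, here is a minimal sketch of what happens when the same phase-2 statement is replayed against an external database that was not rolled back. It uses Python's built-in sqlite3 as a stand-in for Postgres, and the table names, column names, and the `replay_phase` helper are all hypothetical; the real schema of pg_table_1 is not shown in this thread. It is an illustration of the general problem, not a description of how the check was eventually fixed.

```python
import sqlite3


def replay_phase(conn: sqlite3.Connection, statement: str, times: int = 2) -> None:
    """Run the same statement several times, as the scenario does with phase 2."""
    for _ in range(times):
        conn.execute(statement)
    conn.commit()


def row_count(conn: sqlite3.Connection, table: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# In-memory SQLite stands in for the external Postgres instance (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table without a key: a replayed INSERT simply adds another row.
conn.execute("CREATE TABLE plain_table (f1 INTEGER, f2 INTEGER, f3 TEXT)")

# Hypothetical table with a primary key, so conflicts can be detected.
conn.execute("CREATE TABLE keyed_table (f1 INTEGER PRIMARY KEY, f2 INTEGER, f3 TEXT)")

# Not idempotent: running it twice leaves two identical rows.
replay_phase(conn, "INSERT INTO plain_table VALUES (4, 3456, 'x2')")

# Idempotent variant: the second run hits the key conflict and does nothing.
replay_phase(
    conn,
    "INSERT INTO keyed_table VALUES (4, 3456, 'x2') ON CONFLICT (f1) DO NOTHING",
)

plain_rows = row_count(conn, "plain_table")
keyed_rows = row_count(conn, "keyed_table")
print(plain_rows, keyed_rows)  # prints "2 1"
```

The duplicated (4, 3456, 'x2') row in the plain table mirrors the extra row in the failing output above: the restore rolls back Materialize, but the upstream database still holds the first phase-2 insert when the second one arrives.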

Contributor Author

def- commented Sep 16, 2024

Let me check! This might be a test issue after all.

def- assigned def- and unassigned rjobanp on Sep 16, 2024
def- removed the T-correctness (Theme: relates to consistency and correctness of results) label on Sep 16, 2024
def- closed this as completed in 792a46e on Sep 16, 2024