Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Jepsen][YSQL] Two concurrent complex update-insert transactions may both be successful, while only one will be presented #11165

Closed
qvad opened this issue Jan 21, 2022 · 4 comments
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/high High Priority

Comments

@qvad
Copy link
Contributor

qvad commented Jan 21, 2022

Jira Link: DB-8240

Description

Issue appeared after Jepsen version was updated to 0.2.5, not sure how this change was related so this issue appears, nevertheless we observe problem in transaction history on server side.

All current versions are affected, it may be an old issue. Issue disappears if we relax consistency model check from :strict-serialisable to :serialisable. Link to Jepsen consistency models decription

Issue is well reproducible, but provides a lot of noise in the results. Automated description may contain 100 transactions. Trying to fix this in future jepsen tests.

2 transactions in history, both - first insert into a table key 37

2021-12-19 22:52:57,377{GMT}	INFO	[jepsen worker 19] jepsen.util: 499	:ok	:txn	[[:append 37 7]]
...
2021-12-19 22:52:57,404{GMT}	INFO	[jepsen worker 18] jepsen.util: 618	:ok	:txn	[[:append 37 1]]

Code that acts here, both update and insert are evaluated.

(defn append-primary!
  "Writes a key based on primary key."
  [conn table row col v]
  (let [r (c/execute! conn [(str "update " table
                                 " set " col " = CONCAT(" col ", ',', ?) "
                                 "where k = ?") v row])]
    (when (= [0] r)
      ; No rows updated
      (c/execute! conn
                  [(str "insert into " table
                        " (k, k2, " col ") values (?, ?, ?)") row row v]))
    v))

Both transactions are committed, while we observe only one value after - note that we have read [:r 37 [1 10 11 ...

 T17 = {:type :ok, :f :txn, :value [[:r 37 [1 10 11 23 34 27 46 53 40 41 42 61 70 71 44 66 82 59 100 106 113 107 108 51 120 60 125 126 129 121 127 128 139 75 143 144 147 152 153 154 162 165 173 174 171 172 180 183 182 194 202 170 208 209 215 220 221 222 228 229 230 219 260 244 288 235 290 294 298 299 301 291 305 284 307 308 321 326 330 333 335 336 331 332 340 319 320 342 341 270 350 352 355 327 367 364 369 370 376 379 382 373 388 390 391 394]]], :time 130304584572, :process 515, :op-timing ["2021-12-19_22:53:08.087" "2021-12-19_22:53:08.110"], :index 14200}

20211219T225033.000Z.zip

@qvad qvad added the area/ysql Yugabyte SQL (YSQL) label Jan 21, 2022
@qvad
Copy link
Contributor Author

qvad commented Feb 13, 2022

Reviewed other run with trading all queries option enabled

Here we also have key = 37 and 2 values 8 and 11
pg log 0

2022-02-13 13:36:39.500 UTC [19544] LOG:  execute S_1: COMMIT
2022-02-13 13:36:39.504 UTC [19544] LOG:  execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL
2022-02-13 13:36:39.504 UTC [19544] LOG:  execute S_3: BEGIN
2022-02-13 13:36:39.504 UTC [19544] LOG:  execute <unnamed>: update append4 set v1 = CONCAT(v1, ',', $1) where k = $2
2022-02-13 13:36:39.504 UTC [19544] DETAIL:  parameters: $1 = '11', $2 = '18'
2022-02-13 13:36:39.510 UTC [19544] LOG:  execute S_1: COMMIT
2022-02-13 13:36:39.515 UTC [19544] LOG:  execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL
2022-02-13 13:36:39.517 UTC [19544] LOG:  execute S_3: BEGIN
2022-02-13 13:36:39.517 UTC [19544] LOG:  execute <unnamed>: insert into append4 (k, k2, v1) values ($1, $2, $3)
2022-02-13 13:36:39.517 UTC [19544] DETAIL:  parameters: $1 = '18', $2 = '18', $3 = '11'
2022-02-13 13:36:39.524 UTC [19630] LOG:  execute <unnamed>: select (v1) from append1 where k = $1
2022-02-13 13:36:39.524 UTC [19630] DETAIL:  parameters: $1 = '13'
2022-02-13 13:36:39.529 UTC [19544] LOG:  execute S_1: COMMIT
2022-02-13 13:36:39.530 UTC [19630] LOG:  execute S_1: COMMIT

pg log 1

2022-02-13 13:36:39.451 UTC [21640] LOG:  execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL
2022-02-13 13:36:39.451 UTC [21640] LOG:  execute <unnamed>: BEGIN
2022-02-13 13:36:39.452 UTC [21640] LOG:  execute <unnamed>: update append4 set v1 = CONCAT(v1, ',', $1) where k = $2
2022-02-13 13:36:39.452 UTC [21640] DETAIL:  parameters: $1 = '8', $2 = '18'
2022-02-13 13:36:39.463 UTC [21640] LOG:  execute S_1: COMMIT
2022-02-13 13:36:39.466 UTC [21640] LOG:  execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL
2022-02-13 13:36:39.467 UTC [21640] LOG:  execute <unnamed>: BEGIN
2022-02-13 13:36:39.518 UTC [21640] LOG:  execute <unnamed>: insert into append4 (k, k2, v1) values ($1, $2, $3)
2022-02-13 13:36:39.518 UTC [21640] DETAIL:  parameters: $1 = '18', $2 = '18', $3 = '8'
2022-02-13 13:36:39.530 UTC [21640] LOG:  execute S_1: COMMIT
2022-02-13 13:36:39.541 UTC [21640] LOG:  execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL
2022-02-13 13:36:39.542 UTC [21640] LOG:  execute <unnamed>: BEGIN

For tserver that has faulty transaction new segment allocation messages exists.

I0213 13:36:39.121099 17158 tablet_service.cc:334] T d5ac976fa0a7470eb48c820b411b6ff0 P a3a83685e62249cb9285000f1d656154 [state=RUNNING]: Write failed: Operation failed. Try again (yb/docdb/conflict_resolution.cc:89): 08e88f47-7d03-4083-9a3b-90378bb39d0f Conflicts with higher priority transaction: 07ef9776-721b-4375-b5dc-52aaffd3d5bf (transaction error 3)
I0213 13:36:39.126605 17353 tablet_service.cc:334] T d5ac976fa0a7470eb48c820b411b6ff0 P a3a83685e62249cb9285000f1d656154 [state=RUNNING]: Write failed: Operation failed. Try again (yb/docdb/conflict_resolution.cc:89): dc533958-ef14-4681-a9bc-b0d6c56148de Conflicts with higher priority transaction: 07ef9776-721b-4375-b5dc-52aaffd3d5bf (transaction error 3)
I0213 13:36:39.199928 17389 tablet_service.cc:334] T d5ac976fa0a7470eb48c820b411b6ff0 P a3a83685e62249cb9285000f1d656154 [state=RUNNING]: Write failed: Operation failed. Try again (yb/tablet/running_transaction.cc:465): Transaction aborted: ed4e3ebe-cdea-4bc2-a42d-11dcd65caae2 (pgsql error 40001)
I0213 13:36:39.211046 17355 transaction_coordinator.cc:1220] T 1ee9551fcff440bba7dfc4d4247cfa53 P a3a83685e62249cb9285000f1d656154: Request to unknown transaction 7dffceba-3540-4dbe-a07e-5447cb31867f: transaction_id: "}\377\316\2725@M\276\240~TG\3131\206\177" status: COMMITTED tablets: "817f09ba27a9491986d10f7d878b175b" tablets: "e3ccd9c4fcba4b3aab99d42ef5887165" tablets: "1493614872074231b58fd16b319688c6"
I0213 13:36:39.229215 17354 transaction_coordinator.cc:1220] T b28a4067961d4aa0975d78d4873a7a80 P a3a83685e62249cb9285000f1d656154: Request to unknown transaction d7029964-6d76-45a4-9386-a471ea286975: transaction_id: "\327\002\231dmvE\244\223\206\244q\352(iu" status: COMMITTED tablets: "0e127273f4454b259720239eab55bcce" tablets: "1493614872074231b58fd16b319688c6" tablets: "d5ac976fa0a7470eb48c820b411b6ff0"
I0213 13:36:39.272332 18926 log.cc:808] T d5ac976fa0a7470eb48c820b411b6ff0 P a3a83685e62249cb9285000f1d656154: Max segment size 1048576 reached. Starting new segment allocation.
I0213 13:36:39.274272 18926 log.cc:683] T d5ac976fa0a7470eb48c820b411b6ff0 P a3a83685e62249cb9285000f1d656154: Last appended OpId in segment /home/yugabyte/data/yb-data/tserver/wals/table-000033df00003000800000000000400a/tablet-d5ac976fa0a7470eb48c820b411b6ff0/wal-000000002: 2.3132
I0213 13:36:39.279155 18926 log.cc:692] T d5ac976fa0a7470eb48c820b411b6ff0 P a3a83685e62249cb9285000f1d656154: Rolled over to a new segment: /home/yugabyte/data/yb-data/tserver/wals/table-000033df00003000800000000000400a/tablet-d5ac976fa0a7470eb48c820b411b6ff0/wal-000000003
I0213 13:36:39.882437 18885 transaction_coordinator.cc:1220] T b2f4a78b3a034b3d890693c7e7590726 P a3a83685e62249cb9285000f1d656154: Request to unknown transaction aa1786be-9e3e-4e93-bba5-22d497cc653f: transaction_id: "\252\027\206\276\236>N\223\273\245\"\324\227\314e?" status: COMMITTED tablets: "98ad2f9e07ef4f7ca8ccc9fdad66bbb1" tablets: "e818b51eb9b3414b91ed5235a11ee335"
I0213 13:36:39.907943 17158 transaction_coordinator.cc:1220] T cad1fa6619f54e3fbe41d760d7278f33 P a3a83685e62249cb9285000f1d656154: Request to unknown transaction 3140b9ac-ba49-4de7-b969-47655ecb3cf0: transaction_id: "1@\271\254\272IM\347\271iGe^\313<\360" status: COMMITTED tablets: "e3ccd9c4fcba4b3aab99d42ef5887165" tablets: "1493614872074231b58fd16b319688c6"
I0213 13:36:39.938064 17318 write_operation.cc:93] Apply operation failed: Operation failed. Try again (yb/tablet/tablet.cc:1169): Transaction was recently aborted: 04c07832-6842-4051-b158-d3794382bc69: 40001 (pgsql error 40001) (transaction error 1)

@rthallamko3
Copy link
Contributor

@es1024 , Can you help take a look?

@qvad
Copy link
Contributor Author

qvad commented Mar 28, 2022

Got another reproducer I think that can be evaluated against local cluster.
So scenario is classic bank workload, but with INSERTs and DELETEs.

UPDATEs.

firstOpResult = statement.executeUpdate(String.format("UPDATE %s SET balance = balance - %s WHERE id = %s;",
                            getTableName(), balance, accFrom));
secondOpResult = statement.executeUpdate(String.format("UPDATE %s SET balance = balance + %s WHERE id = %s;",
                            getTableName(), balance, accTo));

INSERTs:

accTo = endKey.get();

firstOpResult = statement.executeUpdate(String.format("UPDATE %s SET balance = balance - %s WHERE id = %s;",
           getTableName(), balance, accFrom));
secondOpResult = statement.executeUpdate(String.format("INSERT INTO %s (id, balance) VALUES (%s, %s);",
           getTableName(), accTo, balance));

counterToIncrement = endKey;

DELETEs

accFrom = startKey.get();

while (accTo == accFrom) {
    accTo = ThreadLocalRandom.current().nextLong(startKey.get(), endKey.get());
}

statement.execute(String.format("SELECT balance FROM %s WHERE id = %s;", getTableName(), accFrom));
try (ResultSet rs = statement.getResultSet()) {
    rs.next();
    balance = rs.getInt(1);
}

firstOpResult = statement.executeUpdate(String.format("UPDATE %s SET balance = balance + %s WHERE id = %s;",
        getTableName(), balance, accTo));
secondOpResult = statement.executeUpdate(String.format("DELETE FROM %s WHERE id = %s;",
        getTableName(), accFrom));

counterToIncrement = startKey;

So during this test there is a possibility that on INSERT operation few different threads will try to commit the same key.

Try to start this jar file with following parameters and then check history table: select * from bankhistory order by id;. There should be 2 or more rows in the end with op == 2 (INSERT) and accto should be the same. Which is exactly the case we observe here - 2 successful insert transactions into the same key. Ideally it might happen in seconds.

Arguments to launch jar:
yb-sample-apps.jar.zip

yb-sample-apps.jar com.yugabyte.sample.Main --workload SqlBankTransfer --nodes 192.168.64.6:5433 --num_threads_write 10 --num_threads_read 10 --bank_enable_logging true --bank_exit_on_fail true
serial op accFrom accTo balance
1201 2 2 22 3
1301 2 8 22 1

@es1024
Copy link
Contributor

es1024 commented Mar 29, 2022

Simple case to reproduce this bug:
Schema:

CREATE TABLE test(k INTEGER PRIMARY KEY, v VARCHAR);

Using two connections to the database:

ysqlsh #1> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
ysqlsh #1> INSERT INTO test VALUES(1, 't1');
ysqlsh #2> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
ysqlsh #2> INSERT INTO test VALUES(1, 't2');
ysqlsh #1> COMMIT;
ysqlsh #2> COMMIT;

Both transactions will commit, even though they're writing to the same row. This is likely because we're doing primary key checks only when processing INSERT and not at commit time, and not acquiring the right locks to cause the conflict to be detected. This is a regression introduced by 83a150b and only affects SERIALIZABLE isolation.

@rthallamko3 rthallamko3 assigned d-uspenskiy and unassigned es1024 Apr 1, 2022
@rthallamko3 rthallamko3 added the priority/high High Priority label Apr 1, 2022
d-uspenskiy added a commit that referenced this issue Apr 2, 2022
…txns

Summary:
The D11239 / 83a150b introduces the optimization for UPDATE statement based on relaxing lock intent strength. After the change UPDATE of particular row takes `weak read` intent instead of `strong read` intent. This is achieved by creating `strong read` intent on row's `kLivenessColumn` column instead of row key itself.
But this change introduces the bug that INSERT/UPSERT tries to create `strong read` intent on `kLivenessColumn` as well.
Solution is to limit intent relaxing optimization for UPDATE operation only.

Test Plan:
Jenkins

New unit tests were introduced

```
./yb_build.sh --gtest_filter PgMiniTest.DuplicateInsert*
./yb_build.sh --gtest_filter PgMiniTest.DuplicateUniqueIndexInsert*
./yb_build.sh --gtest_filter PgMiniTest.DuplicateNonUniqueIndexInsert*
```

Reviewers: mbautin, sergei, pjain, mihnea

Reviewed By: sergei, mihnea

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D16278
@tverona1 tverona1 added the status/awaiting-triage Issue awaiting triage label Apr 6, 2022
@d-uspenskiy d-uspenskiy moved this to Done in YQL-beta Apr 7, 2022
d-uspenskiy added a commit that referenced this issue Apr 7, 2022
…m 2 SERIALIZABLE txns

Summary:
The D11239 / 83a150b introduces the optimization for UPDATE statement based on relaxing lock intent strength. After the change UPDATE of particular row takes `weak read` intent instead of `strong read` intent. This is achieved by creating `strong read` intent on row's `kLivenessColumn` column instead of row key itself.
But this change introduces the bug that INSERT/UPSERT tries to create `strong read` intent on `kLivenessColumn` as well.
Solution is to limit intent relaxing optimization for UPDATE operation only.

Original commit: 7333ee7 / D16278

Test Plan:
Jenkins

New unit tests were introduced

```
./yb_build.sh --gtest_filter PgMiniTest.DuplicateInsert*
./yb_build.sh --gtest_filter PgMiniTest.DuplicateUniqueIndexInsert*
./yb_build.sh --gtest_filter PgMiniTest.DuplicateNonUniqueIndexInsert*
```

Reviewers: mbautin, sergei, pjain, mihnea

Reviewed By: mihnea

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D16425
d-uspenskiy added a commit that referenced this issue Apr 8, 2022
… SERIALIZABLE txns

Summary:
The D11239 / 83a150b introduces the optimization for UPDATE statement based on relaxing lock intent strength. After the change UPDATE of particular row takes `weak read` intent instead of `strong read` intent. This is achieved by creating `strong read` intent on row's `kLivenessColumn` column instead of row key itself.
But this change introduces the bug that INSERT/UPSERT tries to create `strong read` intent on `kLivenessColumn` as well.
Solution is to limit intent relaxing optimization for UPDATE operation only.

Original commit: 7333ee7 / D16278

Test Plan:
Jenkins

New unit tests were introduced

```
./yb_build.sh --gtest_filter PgMiniTest.DuplicateInsert*
./yb_build.sh --gtest_filter PgMiniTest.DuplicateUniqueIndexInsert*
./yb_build.sh --gtest_filter PgMiniTest.DuplicateNonUniqueIndexInsert*
```

Reviewers: mbautin, sergei, pjain, mihnea

Reviewed By: mihnea

Differential Revision: https://phabricator.dev.yugabyte.com/D16433
d-uspenskiy added a commit that referenced this issue Apr 8, 2022
…2 SERIALIZABLE txns

Summary:
The D11239 / 83a150b introduces the optimization for UPDATE statement based on relaxing lock intent strength. After the change UPDATE of particular row takes `weak read` intent instead of `strong read` intent. This is achieved by creating `strong read` intent on row's `kLivenessColumn` column instead of row key itself.
But this change introduces the bug that INSERT/UPSERT tries to create `strong read` intent on `kLivenessColumn` as well.
Solution is to limit intent relaxing optimization for UPDATE operation only.

Original commit: 7333ee7 / D16278

Test Plan:
Jenkins

New unit tests were introduced

```
./yb_build.sh --gtest_filter PgMiniTest.DuplicateInsert*
./yb_build.sh --gtest_filter PgMiniTest.DuplicateUniqueIndexInsert*
./yb_build.sh --gtest_filter PgMiniTest.DuplicateNonUniqueIndexInsert*
```

Reviewers: mbautin, sergei, pjain, mihnea

Reviewed By: mihnea

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D16434
@rthallamko3 rthallamko3 removed the status/awaiting-triage Issue awaiting triage label Oct 9, 2023
@yugabyte-ci yugabyte-ci added the kind/enhancement This is an enhancement of an existing feature label Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/high High Priority
Projects
Status: Done
Development

No branches or pull requests

7 participants