…erations

Summary:
**Item 1:** An `UPDATE` operation is a `read-modify-write` operation: DocDB finds the row first and then performs the update. In a `SERIALIZABLE ISOLATION` transaction, DocDB takes a read lock on the row being updated to prevent a concurrent transaction from removing it. This is done by creating a `strong read` intent for the row. But this kind of intent conflicts with updates to that row's columns by another transaction.

```
CREATE TABLE t(k INT PRIMARY KEY, v1 INT, v2 INT);
INSERT INTO t VALUES(1, 2, 3);

-- Connection 1:
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE t SET v1 = 20 WHERE k = 1;
-- created intents: strong read for (1), strong write for (1, v1), weak write for (1)

-- Connection 2:
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE t SET v2 = 30 WHERE k = 1;
-- created intents: strong read for (1), strong write for (1, v2), weak write for (1)
-- Where strong read for (1) conflicts with weak write for (1) in Connection 1
```

To avoid the conflict in this scenario, row locking (which prevents the row from being removed) must use a `weak read` intent instead of a `strong read`.

The `DocOperation::GetDocPaths` function, called with a `GetDocPathsMode::kLock` argument, determines which intents to use for row locking. `Strong read` intents are created for all the paths returned by this call, and `weak read` intents are created for all prefixes of the returned paths. As a result, to get a `weak read` intent on a row, `GetDocPaths` may return the path of an arbitrary column of that row instead of the row itself (a `strong read` intent will then be created on that column). The path of the liveness column can be used for this purpose.

**Note:** If the `UPDATE` command uses an expression instead of an explicit value for some of the columns, `GetDocPaths` will still cause a `strong read` intent to be created for the whole row, because such expressions may read column values, and determining the exact set of columns read may be too expensive:

```
CREATE TABLE t(k INT PRIMARY KEY, v INT);
INSERT INTO t VALUES(1, 1);

-- Connection 1
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE t SET v = v + 1 WHERE k = 1;

-- Connection 2
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE t SET v = v * 3 WHERE k = 1;
```

**Item 2:** For a `single row UPDATE`, pggate sends a write operation that includes only the modified columns. For a `non-single row UPDATE`, pggate sends a write operation with all columns (including unchanged ones). This causes DocDB to take redundant locks on unchanged columns. Pggate works this way because `BEFORE UPDATE` triggers may change additional columns in the row. A single-row UPDATE guarantees that the current table has no triggers at all, so no extra columns are changed.

To avoid sending unchanged columns to DocDB for a `non-single row UPDATE` (in practice, when a `BEFORE UPDATE` trigger is present), the column values in the `old` and `new` tuples are compared. Extra columns are added to the write operation only if their values in the `old` and `new` tuples differ.

**Note:** This part of the diff also fixes the problem with key column changes made by a `BEFORE UPDATE` trigger. Unit tests for this case are added.

**Item 3:** To prevent a row from being deleted by another transaction, a foreign key check sends a read command with a `ROW_MARK_KEYSHARE` row mark. But `ROW_MARK_KEYSHARE` creates a `strong read` intent (same as `ROW_MARK_SHARE`). As a result, an update of columns in the referenced table by another transaction conflicts with the current transaction. To avoid this conflict, `ROW_MARK_KEYSHARE` has to create a `weak read` intent instead of a `strong read`.
But without extra changes, the scenario with `FOR KEY SHARE` and an incomplete row doc key would be broken:

```
CREATE TABLE t(h INT, r1 INT, r2 INT, v INT, PRIMARY KEY(h, r1 ASC, r2 ASC));
INSERT INTO t VALUES(1, 2, 3, 4);

-- Connection 1:
BEGIN;
SELECT * FROM t WHERE h = 1 FOR KEY SHARE;
-- Another transaction should not be able to remove any of the returned rows

-- Connection 2:
DELETE FROM t WHERE h = 1 AND r1 = 2 AND r2 = 3;
-- Creates strong write intent for (1, 2, 3) and weak write intents for (1, 2) and (1),
-- but Connection 1 created weak read intent for (1), so no conflict here and row will be deleted
```

To avoid this behavior, the internal client should use `ROW_MARK_SHARE` instead of `ROW_MARK_KEYSHARE` in scenarios where not all key components are specified in the read operation.

**Item 4:** To fix issue #2922 completely, the following mapping between row marks and intents is implemented:

```
switch (row_mark) {
  case RowMarkType::ROW_MARK_EXCLUSIVE:
    // FOR UPDATE: strong read + strong write lock on the DocKey,
    // as if we're replacing or deleting the entire row in DocDB.
    return IntentTypeSet({IntentType::kStrongRead, IntentType::kStrongWrite});
  case RowMarkType::ROW_MARK_NOKEYEXCLUSIVE:
    // FOR NO KEY UPDATE: strong read + weak write lock on the DocKey, as if we're reading
    // the entire row and then writing only a subset of columns in DocDB.
    return IntentTypeSet({IntentType::kStrongRead, IntentType::kWeakWrite});
  case RowMarkType::ROW_MARK_SHARE:
    // FOR SHARE: strong read on the DocKey, as if we're reading the entire row in DocDB.
    return IntentTypeSet({IntentType::kStrongRead});
  case RowMarkType::ROW_MARK_KEYSHARE:
    // FOR KEY SHARE: weak read lock on the DocKey, preventing the entire row from being
    // replaced / deleted, as if we're simply reading some of the columns.
    // This is the type of locking that is used by foreign keys, so this will
    // prevent the referenced row from disappearing. The reason it does not
    // conflict with the FOR NO KEY UPDATE above is conceptually the following:
    // an operation that reads the entire row and then writes a subset of columns
    // (FOR NO KEY UPDATE) does not have to conflict with an operation that could
    // be reading a different subset of columns (FOR KEY SHARE).
    return IntentTypeSet({IntentType::kWeakRead});
  default:
    break;
}
```

Test Plan:
New test cases were introduced:

```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressTrigger'
./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.ReferencedTableUpdate*
./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.SameColumnUpdate*
./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.RowKeyShareLock*
./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.RowLockConflictMatrix*
```

Reviewers: mihnea, alex, hbhanawat, mbautin, sergei

Reviewed By: mbautin, sergei

Subscribers: kannan, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D11239
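
The conflict claims in Items 1, 3 and 4 can be checked against a small standalone model. The sketch below is not the DocDB code. It assumes a conflict rule that is consistent with the examples in this message: two intents on the same key conflict when at least one of them is strong and exactly one of them is a write. The bit encoding, the `RowMark` enum, and the helper names (`IntentsConflict`, `SetsConflict`, `RowMarksConflict`) are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// Hypothetical, simplified model; NOT the DocDB implementation.
// Assumed encoding: bit 0 = write (vs. read), bit 1 = strong (vs. weak).
enum class IntentType : uint8_t {
  kWeakRead = 0,
  kWeakWrite = 1,
  kStrongRead = 2,
  kStrongWrite = 3,
};

constexpr unsigned kWriteFlag = 1;
constexpr unsigned kStrongFlag = 2;

constexpr unsigned Bits(IntentType t) { return static_cast<unsigned>(t); }

// Assumed rule: two intents on the same key conflict when at least one
// is strong and exactly one of them is a write.
constexpr bool IntentsConflict(IntentType a, IntentType b) {
  return ((Bits(a) | Bits(b)) & kStrongFlag) != 0 &&
         (Bits(a) & kWriteFlag) != (Bits(b) & kWriteFlag);
}

// A set of intent types as a bitmask: bit i is set when the intent
// whose enum value is i belongs to the set.
using IntentTypeSet = unsigned;

constexpr IntentTypeSet Mask(IntentType t) { return 1u << Bits(t); }

// Illustrative stand-in for the four row marks discussed in Item 4.
enum class RowMark { kExclusive, kNoKeyExclusive, kShare, kKeyShare };

// Mirrors the row mark to intent mapping listed in Item 4.
constexpr IntentTypeSet IntentsFor(RowMark mark) {
  switch (mark) {
    case RowMark::kExclusive:  // FOR UPDATE
      return Mask(IntentType::kStrongRead) | Mask(IntentType::kStrongWrite);
    case RowMark::kNoKeyExclusive:  // FOR NO KEY UPDATE
      return Mask(IntentType::kStrongRead) | Mask(IntentType::kWeakWrite);
    case RowMark::kShare:  // FOR SHARE
      return Mask(IntentType::kStrongRead);
    case RowMark::kKeyShare:  // FOR KEY SHARE
      return Mask(IntentType::kWeakRead);
  }
  return 0;
}

// Two intent sets on the same key conflict if any pair of members conflicts.
constexpr bool SetsConflict(IntentTypeSet a, IntentTypeSet b) {
  for (unsigned i = 0; i < 4; ++i) {
    for (unsigned j = 0; j < 4; ++j) {
      if (((a >> i) & 1u) != 0 && ((b >> j) & 1u) != 0 &&
          IntentsConflict(static_cast<IntentType>(i), static_cast<IntentType>(j))) {
        return true;
      }
    }
  }
  return false;
}

constexpr bool RowMarksConflict(RowMark a, RowMark b) {
  return SetsConflict(IntentsFor(a), IntentsFor(b));
}

// Item 1: a strong read on the row conflicts with another transaction's
// weak write on the same row; a weak read does not.
static_assert(IntentsConflict(IntentType::kStrongRead, IntentType::kWeakWrite), "");
static_assert(!IntentsConflict(IntentType::kWeakRead, IntentType::kWeakWrite), "");

// Item 4: FOR KEY SHARE coexists with FOR NO KEY UPDATE but not with FOR UPDATE.
static_assert(!RowMarksConflict(RowMark::kKeyShare, RowMark::kNoKeyExclusive), "");
static_assert(RowMarksConflict(RowMark::kKeyShare, RowMark::kExclusive), "");
```

The checks are `static_assert`s, so compiling the file with C++14 or later is enough to evaluate them. Under the assumed rule, the resulting row mark matrix has the same shape as PostgreSQL's row-level lock conflict table, which is the behavior the `RowLockConflictMatrix` test name suggests is being verified.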