
[#5036, #2922] YSQL: Avoid redundant key locking in case of update operations

Summary:
**Item 1:**
An `UPDATE` operation is a `read-modify-write` operation: DocDB first finds the row and then performs the update.
In a `SERIALIZABLE ISOLATION` transaction, DocDB takes a read lock on the row being updated to prevent it from being removed by a concurrent transaction. This is done by creating a `strong read` intent for the row.
But this kind of intent conflicts with updates to that row's columns by another transaction:
```
CREATE TABLE t(k INT PRIMARY KEY, v1 INT, v2 INT);
INSERT INTO t VALUES(1, 2, 3);

--Connection 1:
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE t SET v1 = 20 WHERE k = 1; -- created intents: strong read for (1), strong write for (1, v1), weak write for (1)

--Connection 2:
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE t SET v2 = 30 WHERE k = 1; -- created intents: strong read for (1), strong write for (1, v2), weak write for (1)
                                  -- Where strong read for (1) conflicts with weak write for (1) in Connection 1
```

To avoid the conflict in the described scenario, we must use a `weak read` intent for row locking (to prevent the row from being removed) instead of a `strong read` one.

The `DocOperation::GetDocPaths` function, called with the `GetDocPathsMode::kLock` argument, determines which intents to use for row locking.
`Strong read` intents are created for all the paths returned by this call, and `weak read` intents are created for all prefixes of the returned paths.

As a result, to take only a `weak read` intent on a row, the `GetDocPaths` method may return the path of an arbitrary column of that row instead of the row itself (a strong read intent is then created on that column).
The path of the liveness column can be used for this purpose.
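The prefix-based intent creation described above can be sketched as follows (a minimal model with illustrative names and types, not the actual DocDB API):

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Model: for every path returned by GetDocPaths in kLock mode, a strong
// read intent is taken on the full path, and weak read intents are taken
// on all of its proper prefixes.
using Path = std::vector<std::string>;  // e.g. {"row(1)", "liveness"}

struct ReadIntent {
  Path path;
  bool strong;
  bool operator<(const ReadIntent& other) const {
    return std::tie(path, strong) < std::tie(other.path, other.strong);
  }
};

std::set<ReadIntent> LockIntentsFor(const std::vector<Path>& doc_paths) {
  std::set<ReadIntent> intents;
  for (const Path& path : doc_paths) {
    // Strong read intent on the returned path itself.
    intents.insert({path, /*strong=*/true});
    // Weak read intents on every proper prefix of the path.
    for (std::size_t len = 1; len < path.size(); ++len) {
      intents.insert({Path(path.begin(), path.begin() + len), /*strong=*/false});
    }
  }
  return intents;
}
```

In this model, returning the liveness-column path instead of the row path yields only a weak read intent on the row itself, which no longer conflicts with the weak write intent taken by a concurrent update of another column.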

**Note:** In case the `UPDATE` command uses an expression instead of an explicit value for some of the columns, the `GetDocPaths` method creates a `strong read` intent for the whole row, because such expressions may read column values, and determining the exact set of read columns may be too expensive:

```
CREATE TABLE t(k INT PRIMARY KEY, v INT);
INSERT INTO t values(1, 1);

-- Connection 1
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE t SET v = v + 1 WHERE k = 1;

-- Connection 2
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE t SET v = v * 3 WHERE k = 1;
```

**Item 2:**
In case of a `single row UPDATE`, pggate sends a write operation that includes only the modified columns. In case of a `non-single row UPDATE`, pggate sends a write operation with all columns (including unchanged ones).
This causes DocDB to take redundant locks on the unchanged columns. Pggate works this way because `BEFORE UPDATE` triggers may change additional columns in the row; a `single row UPDATE` guarantees that the target table has no triggers at all, so no extra columns can change.
To avoid sending unchanged columns to DocDB in case of a `non-single row UPDATE` (specifically, when the row has a `BEFORE UPDATE` trigger), the column values in the `old` and `new` tuples are compared. Extra columns are added to the write operation only when their values in the `old` and `new` tuples don't match.
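The comparison step can be sketched as follows (hypothetical simplified types; the actual implementation is `YBBuildExtraUpdatedCols` in `nodeModifyTable.c`, which compares datums via the type's comparison function):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Model a tuple as a column-name -> value map.
using Tuple = std::map<std::string, int>;

// Columns outside the statement's updated-columns set are added to the
// write operation only when the BEFORE UPDATE trigger actually changed
// their value between the old and new tuple.
std::set<std::string> ExtraUpdatedCols(const Tuple& old_tuple,
                                       const Tuple& new_tuple,
                                       const std::set<std::string>& updated_cols) {
  std::set<std::string> extra;
  for (const auto& [column, old_value] : old_tuple) {
    if (updated_cols.count(column) > 0) {
      continue;  // already part of the write operation
    }
    if (new_tuple.at(column) != old_value) {
      extra.insert(column);  // changed by the trigger, must be sent to DocDB
    }
  }
  return extra;
}
```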

**Note:** This part of the diff also fixes a problem with key columns being changed by a `BEFORE UPDATE` trigger. Unit tests for this case are added.

**Item 3:**
To prevent a row from being deleted by another transaction, a foreign key check sends a read command with a `ROW_MARK_KEYSHARE` row mark.
But `ROW_MARK_KEYSHARE` creates a `strong read` intent (same as `ROW_MARK_SHARE`). As a result, updates to columns of the referenced table by another transaction conflict with the current transaction.

To avoid this conflict, `ROW_MARK_KEYSHARE` has to create a `weak read` intent instead of a `strong read` one.
But without extra changes, the scenario with `FOR KEY SHARE` and an incomplete row doc key would be broken:

```
CREATE TABLE t(h INT, r1 INT, r2 INT, v INT, PRIMARY KEY(h, r1 ASC, r2 ASC));
INSERT INTO t VALUES(1, 2, 3, 4);

-- Connection 1:
BEGIN;
SELECT * FROM t WHERE h = 1 FOR KEY SHARE; -- Another transaction should not be able to remove any of returned rows

-- Connection 2:
DELETE FROM t WHERE h = 1 AND r1 = 2 AND r2 = 3; -- Creates strong write intent for (1, 2, 3) and weak write intents for (1, 2) and (1), but Connection 1 created weak read intent for (1), so no conflict here and row will be deleted
```

To avoid this behavior, the internal client should use `ROW_MARK_SHARE` instead of `ROW_MARK_KEYSHARE` in scenarios where not all key components are specified in the read operation.
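The adjustment can be sketched as follows (illustrative names, not the actual client API): `FOR KEY SHARE` is upgraded to `FOR SHARE` when the read does not pin down a full row key, so a strong read intent is taken on the key prefix and a concurrent `DELETE`'s write intents still conflict with it.

```cpp
#include <cassert>

enum class RowMark { kExclusive, kNoKeyExclusive, kShare, kKeyShare };

// When the read operation does not specify all key components, upgrade
// FOR KEY SHARE (weak read) to FOR SHARE (strong read) so that rows under
// the key prefix cannot be deleted by a concurrent transaction.
RowMark AdjustRowMarkForIncompleteKey(RowMark mark,
                                      bool all_key_components_specified) {
  if (mark == RowMark::kKeyShare && !all_key_components_specified) {
    return RowMark::kShare;
  }
  return mark;
}
```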

**Item 4:**
To fix issue #2922 completely, the following mapping between row marks and intents is implemented:

```
    switch (row_mark) {
      case RowMarkType::ROW_MARK_EXCLUSIVE:
        // FOR UPDATE: strong read + strong write lock on the DocKey,
        //             as if we're replacing or deleting the entire row in DocDB.
        return IntentTypeSet({IntentType::kStrongRead, IntentType::kStrongWrite});
      case RowMarkType::ROW_MARK_NOKEYEXCLUSIVE:
        // FOR NO KEY UPDATE: strong read + weak write lock on the DocKey, as if we're reading
        //                    the entire row and then writing only a subset of columns in DocDB.
        return IntentTypeSet({IntentType::kStrongRead, IntentType::kWeakWrite});
      case RowMarkType::ROW_MARK_SHARE:
        // FOR SHARE: strong read on the DocKey, as if we're reading the entire row in DocDB.
        return IntentTypeSet({IntentType::kStrongRead});
      case RowMarkType::ROW_MARK_KEYSHARE:
        // FOR KEY SHARE: weak read lock on the DocKey, preventing the entire row from being
        //               replaced / deleted, as if we're simply reading some of the columns.
        //               This is the type of locking that is used by foreign keys, so this will
        //               prevent the referenced row from disappearing. The reason it does not
        //               conflict with the FOR NO KEY UPDATE above is conceptually the following:
        //               an operation that reads the entire row and then writes a subset of columns
        //               (FOR NO KEY UPDATE) does not have to conflict with an operation that could
        //               be reading a different subset of columns (FOR KEY SHARE).
        return IntentTypeSet({IntentType::kWeakRead});
      default:
        break;
    }
```
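Under the conflict rule implied by the scenarios above — two intents conflict iff at least one of them is a write and at least one is strong (an assumption of this sketch, not a quote of the DocDB code) — this mapping reproduces PostgreSQL's row-lock conflict matrix, where `FOR KEY SHARE` conflicts only with `FOR UPDATE`:

```cpp
#include <cassert>
#include <vector>

enum class Intent { kWeakRead, kWeakWrite, kStrongRead, kStrongWrite };

bool IsWrite(Intent t) {
  return t == Intent::kWeakWrite || t == Intent::kStrongWrite;
}
bool IsStrong(Intent t) {
  return t == Intent::kStrongRead || t == Intent::kStrongWrite;
}

// Assumed rule: two intents conflict iff at least one is a write and at
// least one is strong. Reads never conflict with reads; weak never
// conflicts with weak.
bool Conflicts(Intent a, Intent b) {
  return (IsWrite(a) || IsWrite(b)) && (IsStrong(a) || IsStrong(b));
}

using IntentSet = std::vector<Intent>;

// Two intent sets conflict if any pair of their members conflicts.
bool SetsConflict(const IntentSet& a, const IntentSet& b) {
  for (Intent x : a) {
    for (Intent y : b) {
      if (Conflicts(x, y)) return true;
    }
  }
  return false;
}

// Intent sets from the mapping above.
const IntentSet kForUpdate      = {Intent::kStrongRead, Intent::kStrongWrite};
const IntentSet kForNoKeyUpdate = {Intent::kStrongRead, Intent::kWeakWrite};
const IntentSet kForShare       = {Intent::kStrongRead};
const IntentSet kForKeyShare    = {Intent::kWeakRead};
```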

Test Plan:
New test cases were introduced:

```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressTrigger'
./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.ReferencedTableUpdate*
./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.SameColumnUpdate*
./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.RowKeyShareLock*
./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.RowLockConflictMatrix*
```

Reviewers: mihnea, alex, hbhanawat, mbautin, sergei

Reviewed By: mbautin, sergei

Subscribers: kannan, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D11239
d-uspenskiy committed Aug 9, 2021
1 parent 4e72880 commit 83a150b
Showing 17 changed files with 744 additions and 209 deletions.
100 changes: 98 additions & 2 deletions src/postgres/src/backend/executor/nodeModifyTable.c
@@ -68,6 +68,7 @@
#include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/tqual.h"
#include "utils/typcache.h"

/* YB includes. */
#include "access/sysattr.h"
@@ -1009,6 +1010,83 @@ ldelete:;
return NULL;
}

/* ----------------------------------------------------------------
* YBEqualDatums
*
* Function compares values of lhs and rhs datums with respect to value type and collation.
*
* Returns true in case value of lhs and rhs datums match.
* ----------------------------------------------------------------
*/
static bool
YBEqualDatums(Datum lhs, Datum rhs, Oid atttypid, Oid collation)
{
TypeCacheEntry *typentry = lookup_type_cache(atttypid, TYPECACHE_CMP_PROC_FINFO);
if (!OidIsValid(typentry->cmp_proc_finfo.fn_oid))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_FUNCTION),
errmsg("could not identify a comparison function for type %s",
format_type_be(typentry->type_id))));

FunctionCallInfoData locfcinfo;
InitFunctionCallInfoData(locfcinfo, &typentry->cmp_proc_finfo, 2, collation, NULL, NULL);
locfcinfo.arg[0] = lhs;
locfcinfo.arg[1] = rhs;
locfcinfo.argnull[0] = false;
locfcinfo.argnull[1] = false;
return DatumGetInt32(FunctionCallInvoke(&locfcinfo)) == 0;
}

/* ----------------------------------------------------------------
* YBBuildExtraUpdatedCols
*
* Function compares attribute value in oldtuple and newtuple for attributes which are not in the
* updatedCols set. Returns set of changed attributes or NULL.
* ----------------------------------------------------------------
*/
static Bitmapset*
YBBuildExtraUpdatedCols(Relation rel,
HeapTuple oldtuple,
HeapTuple newtuple,
Bitmapset *updatedCols)
{
if (bms_is_member(InvalidAttrNumber, updatedCols))
/* No extra work required in case the whole row is changed */
return NULL;

Bitmapset *result = NULL;
AttrNumber firstLowInvalidAttributeNumber = YBGetFirstLowInvalidAttributeNumber(rel);
TupleDesc tupleDesc = RelationGetDescr(rel);
for (int idx = 0; idx < tupleDesc->natts; ++idx)
{
FormData_pg_attribute *att_desc = TupleDescAttr(tupleDesc, idx);

AttrNumber attnum = att_desc->attnum;

/* Skip virtual (system) and dropped columns */
if (!IsRealYBColumn(rel, attnum))
continue;

int bms_idx = attnum - firstLowInvalidAttributeNumber;
if (bms_is_member(bms_idx, updatedCols))
continue;

bool old_is_null = false;
bool new_is_null = false;
Datum old_value = heap_getattr(oldtuple, attnum, tupleDesc, &old_is_null);
Datum new_value = heap_getattr(newtuple, attnum, tupleDesc, &new_is_null);
if (old_is_null != new_is_null ||
(!new_is_null && !YBEqualDatums(old_value,
new_value,
att_desc->atttypid,
att_desc->attcollation)))
{
result = bms_add_member(result, bms_idx);
}
}
return result;
}

/* ----------------------------------------------------------------
* ExecUpdate
*
@@ -1067,6 +1145,8 @@ ExecUpdate(ModifyTableState *mtstate,
resultRelInfo = estate->es_result_relation_info;
resultRelationDesc = resultRelInfo->ri_RelationDesc;

bool beforeRowUpdateTriggerFired = false;

/* BEFORE ROW UPDATE Triggers */
if (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_update_before_row)
@@ -1079,6 +1159,7 @@

/* trigger might have changed tuple */
tuple = ExecMaterializeSlot(slot);
beforeRowUpdateTriggerFired = true;
}

/* INSTEAD OF ROW UPDATE Triggers */
@@ -1149,8 +1230,21 @@

bool row_found = false;

Bitmapset *actualUpdatedCols = rte->updatedCols;
Bitmapset *extraUpdatedCols = NULL;
if (beforeRowUpdateTriggerFired)
{
/* trigger might have changed tuple */
extraUpdatedCols = YBBuildExtraUpdatedCols(
resultRelationDesc, oldtuple, tuple, rte->updatedCols);
if (extraUpdatedCols)
{
extraUpdatedCols = bms_add_members(extraUpdatedCols, rte->updatedCols);
actualUpdatedCols = extraUpdatedCols;
}
}
bool is_pk_updated =
bms_overlap(YBGetTablePrimaryKeyBms(resultRelationDesc), rte->updatedCols);
bms_overlap(YBGetTablePrimaryKeyBms(resultRelationDesc), actualUpdatedCols);

if (is_pk_updated)
{
@@ -1159,9 +1253,11 @@
}
else
{
row_found = YBCExecuteUpdate(resultRelationDesc, planSlot, tuple, estate, mtstate, rte->updatedCols);
row_found = YBCExecuteUpdate(
resultRelationDesc, planSlot, tuple, estate, mtstate, actualUpdatedCols);
}

bms_free(extraUpdatedCols);
if (!row_found)
{
/*
8 changes: 1 addition & 7 deletions src/postgres/src/backend/executor/ybcModifyTable.c
@@ -716,14 +716,8 @@ bool YBCExecuteUpdate(Relation rel,
if (!IsRealYBColumn(rel, attnum))
continue;

/*
* Skip unmodified columns if possible.
* Note: we only do this for the single-row case, as otherwise there
* might be triggers that modify the heap tuple to set (other) columns
* (e.g. using the SPI module functions).
*/
int bms_idx = attnum - YBGetFirstLowInvalidAttributeNumber(rel);
if (isSingleRow && !whole_row && !bms_is_member(bms_idx, updatedCols))
if (!whole_row && !bms_is_member(bms_idx, updatedCols))
continue;

/* Assign this attr's value, handle expression pushdown if needed. */
52 changes: 52 additions & 0 deletions src/postgres/src/test/regress/expected/yb_triggers.out
@@ -372,3 +372,55 @@ select * from self_ref_trigger;
drop table self_ref_trigger;
drop function self_ref_trigger_ins_func();
drop function self_ref_trigger_del_func();
CREATE TABLE incremental_key(h INT, r INT, v1 INT, v2 INT, PRIMARY KEY(h, r ASC));
CREATE OR REPLACE FUNCTION increment_key() RETURNS TRIGGER
LANGUAGE PLPGSQL AS $$
BEGIN
NEW.r = NEW.r + 1;
RETURN NEW;
END;
$$;
CREATE TRIGGER increment_key_trigger BEFORE UPDATE ON incremental_key
FOR EACH ROW EXECUTE PROCEDURE increment_key();
INSERT INTO incremental_key VALUES(1, 1, 1, 1);
SELECT * FROM incremental_key;
h | r | v1 | v2
---+---+----+----
1 | 1 | 1 | 1
(1 row)

UPDATE incremental_key SET v1 = 10 WHERE h = 1;
SELECT * FROM incremental_key;
h | r | v1 | v2
---+---+----+----
1 | 2 | 10 | 1
(1 row)

DROP TABLE incremental_key;
DROP FUNCTION increment_key;
CREATE TABLE incremental_value(h INT, r INT, v1 INT, v2 INT, PRIMARY KEY(h, r ASC));
CREATE OR REPLACE FUNCTION increment_value() RETURNS TRIGGER
LANGUAGE PLPGSQL AS $$
BEGIN
NEW.v2 = NEW.v2 + 1;
RETURN NEW;
END;
$$;
CREATE TRIGGER increment_value_trigger BEFORE UPDATE ON incremental_value
FOR EACH ROW EXECUTE PROCEDURE increment_value();
INSERT INTO incremental_value VALUES(1, 1, 1, 1);
SELECT * FROM incremental_value;
h | r | v1 | v2
---+---+----+----
1 | 1 | 1 | 1
(1 row)

UPDATE incremental_value SET v1 = 10 WHERE h = 1;
SELECT * FROM incremental_value;
h | r | v1 | v2
---+---+----+----
1 | 1 | 10 | 2
(1 row)

DROP TABLE incremental_value;
DROP FUNCTION increment_value;
42 changes: 42 additions & 0 deletions src/postgres/src/test/regress/sql/yb_triggers.sql
@@ -250,3 +250,45 @@ select * from self_ref_trigger;
drop table self_ref_trigger;
drop function self_ref_trigger_ins_func();
drop function self_ref_trigger_del_func();

CREATE TABLE incremental_key(h INT, r INT, v1 INT, v2 INT, PRIMARY KEY(h, r ASC));

CREATE OR REPLACE FUNCTION increment_key() RETURNS TRIGGER
LANGUAGE PLPGSQL AS $$
BEGIN
NEW.r = NEW.r + 1;
RETURN NEW;
END;
$$;

CREATE TRIGGER increment_key_trigger BEFORE UPDATE ON incremental_key
FOR EACH ROW EXECUTE PROCEDURE increment_key();

INSERT INTO incremental_key VALUES(1, 1, 1, 1);
SELECT * FROM incremental_key;
UPDATE incremental_key SET v1 = 10 WHERE h = 1;
SELECT * FROM incremental_key;

DROP TABLE incremental_key;
DROP FUNCTION increment_key;

CREATE TABLE incremental_value(h INT, r INT, v1 INT, v2 INT, PRIMARY KEY(h, r ASC));

CREATE OR REPLACE FUNCTION increment_value() RETURNS TRIGGER
LANGUAGE PLPGSQL AS $$
BEGIN
NEW.v2 = NEW.v2 + 1;
RETURN NEW;
END;
$$;

CREATE TRIGGER increment_value_trigger BEFORE UPDATE ON incremental_value
FOR EACH ROW EXECUTE PROCEDURE increment_value();

INSERT INTO incremental_value VALUES(1, 1, 1, 1);
SELECT * FROM incremental_value;
UPDATE incremental_value SET v1 = 10 WHERE h = 1;
SELECT * FROM incremental_value;

DROP TABLE incremental_value;
DROP FUNCTION increment_value;
42 changes: 0 additions & 42 deletions src/yb/common/row_mark.cc
@@ -21,24 +21,6 @@

namespace yb {

bool AreConflictingRowMarkTypes(
const RowMarkType row_mark_type_a,
const RowMarkType row_mark_type_b) {
constexpr int kConflictThreshold = 4;
const unsigned int value_a = static_cast<unsigned int>(row_mark_type_a);
const unsigned int value_b = static_cast<unsigned int>(row_mark_type_b);

// TODO: remove this when issue #2922 is fixed.
if ((row_mark_type_a == RowMarkType::ROW_MARK_NOKEYEXCLUSIVE &&
row_mark_type_b == RowMarkType::ROW_MARK_KEYSHARE) ||
(row_mark_type_a == RowMarkType::ROW_MARK_KEYSHARE &&
row_mark_type_b == RowMarkType::ROW_MARK_NOKEYEXCLUSIVE)) {
return true;
}

return (value_a + value_b < kConflictThreshold);
}

RowMarkType GetStrongestRowMarkType(std::initializer_list<RowMarkType> row_mark_types) {
RowMarkType strongest_row_mark_type = RowMarkType::ROW_MARK_ABSENT;
for (RowMarkType row_mark_type : row_mark_types) {
@@ -54,10 +36,8 @@ bool IsValidRowMarkType(RowMarkType row_mark_type) {
case RowMarkType::ROW_MARK_SHARE: FALLTHROUGH_INTENDED;
case RowMarkType::ROW_MARK_KEYSHARE:
return true;
break;
default:
return false;
break;
}
}

@@ -71,26 +51,4 @@ bool RowMarkNeedsPessimisticLock(RowMarkType row_mark_type) {
row_mark_type != RowMarkType::ROW_MARK_KEYSHARE;
}

std::string RowMarkTypeToPgsqlString(const RowMarkType row_mark_type) {
switch (row_mark_type) {
case RowMarkType::ROW_MARK_EXCLUSIVE:
return "UPDATE";
break;
case RowMarkType::ROW_MARK_NOKEYEXCLUSIVE:
return "NO KEY UPDATE";
break;
case RowMarkType::ROW_MARK_SHARE:
return "SHARE";
break;
case RowMarkType::ROW_MARK_KEYSHARE:
return "KEY SHARE";
break;
default:
// We shouldn't get here because other row lock types are disabled at the postgres level.
LOG(DFATAL) << "Unsupported row lock of type " << RowMarkType_Name(row_mark_type);
return "";
break;
}
}

} // namespace yb
6 changes: 0 additions & 6 deletions src/yb/common/row_mark.h
@@ -20,9 +20,6 @@

namespace yb {

// Determine whether two row mark types conflict.
bool AreConflictingRowMarkTypes(RowMarkType row_mark_type_a, RowMarkType row_mark_type_b);

template <typename PB>
RowMarkType GetRowMarkTypeFromPB(const PB& pb) {
if (pb.has_row_mark_type()) {
@@ -49,9 +46,6 @@ bool IsValidRowMarkType(RowMarkType row_mark_type);
*/
bool RowMarkNeedsPessimisticLock(RowMarkType row_mark_type);

// Convert a row mark type to a string to use in a PostgreSQL query.
std::string RowMarkTypeToPgsqlString(const RowMarkType row_mark_type);

} // namespace yb

#endif // YB_COMMON_ROW_MARK_H
1 change: 0 additions & 1 deletion src/yb/docdb/conflict_resolution.cc
@@ -197,7 +197,6 @@ class ConflictResolver : public std::enable_shared_from_this<ConflictResolver> {
existing_value.consume_byte();
auto existing_intent = VERIFY_RESULT(
docdb::ParseIntentKey(intent_iter_.key(), existing_value));

const auto intent_mask = kIntentTypeSetMask[existing_intent.types.ToUIntPtr()];
if ((conflicting_intent_types & intent_mask) != 0) {
auto transaction_id = VERIFY_RESULT(FullyDecodeTransactionId(
28 changes: 21 additions & 7 deletions src/yb/docdb/intent.cc
@@ -92,16 +92,30 @@ IntentTypeSet GetStrongIntentTypeSet(
OperationKind operation_kind,
RowMarkType row_mark) {
if (IsValidRowMarkType(row_mark)) {
// TODO: possibly adjust this when issue #2922 is fixed.
// Mapping of postgres locking levels to DocDB intent types is described in details by the
// following comment https://github.com/yugabyte/yugabyte-db/issues/1199#issuecomment-501041018
switch (row_mark) {
case RowMarkType::ROW_MARK_EXCLUSIVE: FALLTHROUGH_INTENDED;
case RowMarkType::ROW_MARK_NOKEYEXCLUSIVE:
case RowMarkType::ROW_MARK_EXCLUSIVE:
// FOR UPDATE: strong read + strong write lock on the DocKey,
// as if we're replacing or deleting the entire row in DocDB.
return IntentTypeSet({IntentType::kStrongRead, IntentType::kStrongWrite});
break;
case RowMarkType::ROW_MARK_SHARE: FALLTHROUGH_INTENDED;
case RowMarkType::ROW_MARK_KEYSHARE:
case RowMarkType::ROW_MARK_NOKEYEXCLUSIVE:
// FOR NO KEY UPDATE: strong read + weak write lock on the DocKey, as if we're reading
// the entire row and then writing only a subset of columns in DocDB.
return IntentTypeSet({IntentType::kStrongRead, IntentType::kWeakWrite});
case RowMarkType::ROW_MARK_SHARE:
// FOR SHARE: strong read on the DocKey, as if we're reading the entire row in DocDB.
return IntentTypeSet({IntentType::kStrongRead});
break;
case RowMarkType::ROW_MARK_KEYSHARE:
// FOR KEY SHARE: weak read lock on the DocKey, preventing the entire row from being
//               replaced / deleted, as if we're simply reading some of the columns.
// This is the type of locking that is used by foreign keys, so this will
// prevent the referenced row from disappearing. The reason it does not
// conflict with the FOR NO KEY UPDATE above is conceptually the following:
// an operation that reads the entire row and then writes a subset of columns
// (FOR NO KEY UPDATE) does not have to conflict with an operation that could
// be reading a different subset of columns (FOR KEY SHARE).
return IntentTypeSet({IntentType::kWeakRead});
default:
// We shouldn't get here because other row lock types are disabled at the postgres level.
LOG(DFATAL) << "Unsupported row lock of type " << RowMarkType_Name(row_mark);