sql, kv: FOR UPDATE NOWAIT can starve or block under serializable #129145

michae2 · 2024-08-16T18:56:23Z

Consider a SFU query like the following, using v24.2.0-rc.1:

CREATE TABLE xy (x INT PRIMARY KEY, y INT);
INSERT INTO xy VALUES (1, 1), (2, 2);

-- lock the x=1 row in the first session
BEGIN;
SELECT * FROM xy WHERE x = 1 FOR UPDATE;

-- try to lock the y=2 row in the second session
SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;

I think in this case, ideally the second SFU query would be able to lock the second row without blocking. But under serializable isolation, this currently does not happen. With default behavior (locking below the filter) the plan tries to lock all rows in xy, thus failing the NOWAIT check because of the lock on x=1:

demo@localhost:26257/demoapp/defaultdb> EXPLAIN SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
                            info
------------------------------------------------------------
  distribution: local
  vectorized: true

  • filter
  │ filter: y = 2
  │
  └── • scan
        missing stats
        table: xy@xy_pkey
        spans: FULL SCAN
        locking strength: for update
        locking wait policy: nowait

  index recommendations: 1
  1. type: index creation
     SQL command: CREATE INDEX ON defaultdb.public.xy (y);
(16 rows)

Time: 5ms total (execution 4ms / network 1ms)

demo@localhost:26257/demoapp/defaultdb> SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
ERROR: could not obtain lock on row (x)=(1) in xy@xy_pkey
SQLSTATE: 55P03

With the new optimizer_use_lock_op_for_serializable behavior (locking above the filter) the plan only tries to lock matching rows, but blocks when trying to read the x=1 row due to serializable isolation:

demo@localhost:26257/demoapp/defaultdb> SET optimizer_use_lock_op_for_serializable = on;
SET

Time: 1ms total (execution 1ms / network 0ms)

demo@localhost:26257/demoapp/defaultdb> EXPLAIN SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
                            info
------------------------------------------------------------
  distribution: local
  vectorized: true

  • lookup join (semi)
  │ table: xy@xy_pkey
  │ equality: (x) = (x)
  │ equality cols are key
  │ locking strength: for update
  │ locking wait policy: nowait
  │
  └── • filter
      │ filter: y = 2
      │
      └── • scan
            missing stats
            table: xy@xy_pkey
            spans: FULL SCAN

  index recommendations: 1
  1. type: index creation
     SQL command: CREATE INDEX ON defaultdb.public.xy (y);
(21 rows)

Time: 3ms total (execution 3ms / network 0ms)

demo@localhost:26257/demoapp/defaultdb> SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
(blocks)

It seems like neither behavior is desirable for NOWAIT.

Jira issue: CRDB-41396

Epic CRDB-40199

The text was updated successfully, but these errors were encountered:

michae2 · 2024-08-16T18:58:53Z

Under read committed isolation this works fine, because the initial scan of xy can read underneath the lock:

demo@localhost:26257/demoapp/defaultdb> BEGIN ISOLATION LEVEL READ COMMITTED;
BEGIN

Time: 0ms total (execution 1ms / network 0ms)

demo@localhost:26257/demoapp/defaultdb  OPEN> EXPLAIN SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
                            info
------------------------------------------------------------
  distribution: local
  vectorized: true

  • lookup join (semi)
  │ table: xy@xy_pkey
  │ equality: (x) = (x)
  │ equality cols are key
  │ locking strength: for update
  │ locking wait policy: nowait
  │ locking durability: guaranteed
  │
  └── • filter
      │ filter: y = 2
      │
      └── • scan
            missing stats
            table: xy@xy_pkey
            spans: FULL SCAN

  index recommendations: 1
  1. type: index creation
     SQL command: CREATE INDEX ON defaultdb.public.xy (y);
(22 rows)

Time: 5ms total (execution 4ms / network 1ms)

demo@localhost:26257/demoapp/defaultdb  OPEN> SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
  x | y
----+----
  2 | 2
(1 row)

Time: 4ms total (execution 4ms / network 1ms)

michae2 · 2024-08-16T20:30:10Z

For SKIP LOCKED we now flag the initial unlocked scans as skipped locked, as of e74202c. But this strategy won't work for NOWAIT, because we might skip over a lock that should cause the "could not obtain lock" error.

I think the only fix is to read underneath the lock in the initial scan, like we do for read committed. That's why I've tagged this as T-KV.

michae2 · 2024-09-03T19:42:16Z

After talking with @nvanbenschoten and @mw5h we think the initial unlocked read should be flagged with NOWAIT under serializable.

(This is follow-up from cockroachdb#127718, basically a refinement of commit e74202c.) When using the new Lock operator to implement FOR UPDATE and FOR SHARE, we build unlocked scans that are then followed by a final Lock operator. If the Lock operator uses NOWAIT, however, similar to SKIP LOCKED, these unlocked scans still need to have the NOWAIT behavior to avoid blocking, even if they do not take locks themselves. Unlike SKIP LOCKED, however, we do _not_ keep the NOWAIT behavior on unlocked scans under weaker isolation levels. This is because NOWAIT false positives produce false positive locking errors, whereas SKIP LOCKED false positives are benign (and in fact a minor performance optimization in some cases). Fixes: cockroachdb#129145 Release note (bug fix): Fix a bug in which some SELECT FOR UPDATE or SELECT FOR SHARE queries using NOWAIT could still block on locked rows when using `optimizer_use_lock_op_for_serializable` under Serializable isolation. This bug was present when `optimizer_use_lock_op_for_serializable` was introduced in v23.2.0.

130103: opt: add NOWAIT to unlocked scans below NOWAIT r=DrewKimball,mgartner a=michae2 (This is follow-up from #127718, basically a refinement of commit e74202c.) When using the new Lock operator to implement FOR UPDATE and FOR SHARE, we build unlocked scans that are then followed by a final Lock operator. If the Lock operator uses NOWAIT, however, similar to SKIP LOCKED, these unlocked scans still need to have the NOWAIT behavior to avoid blocking, even if they do not take locks themselves. Unlike SKIP LOCKED, however, we do _not_ keep the NOWAIT behavior on unlocked scans under weaker isolation levels. This is because NOWAIT false positives produce false positive locking errors, whereas SKIP LOCKED false positives are benign (and in fact a minor performance optimization in some cases). For example, consider the following scenario: ```sql CREATE TABLE xy (x INT PRIMARY KEY, y INT); INSERT INTO xy VALUES (1, 1), (2, 2); -- lock the x=1 row in the first session BEGIN; SELECT * FROM xy WHERE x = 1 FOR UPDATE; -- try to lock the y=2 row in the second session BEGIN; SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT; ``` Under serializable isolation, the final SELECT FOR UPDATE NOWAIT statement uses NOWAIT behavior in its initial full-table scan. This full-table scan first encounters the lock on x=1 which is a "false positive" in the sense that it is not the intended row to lock. This causes the statement to fail immediately. This might seem surprising, but under serializable isolation our only alternative is to instead block on the x=1 lock until the first transaction finishes. We choose to fail immediately (the NOWAIT behavior) instead of blocking because (a) it matches our behavior before this change, (b) it conceptually fits with the read-lock semantics of serializable isolation, in which we consider every read to acquire a shared lock, and (c) it preserves the expected serializable behavior in case the first transaction updates the x=1 row to have y=2. Under weaker isolation levels we do not use NOWAIT in the initial full-table scan to avoid this false positive error, instead reading "underneath" the lock and only trying to lock the y=2 row. If, instead of NOWAIT, we consider the final statement to use SKIP LOCKED, we can see that it doesn't hurt anything for us to always use the SKIP LOCKED behavior in the initial full table scan, even under weaker isolation levels. And sometimes it helps. Fixes: #129145 Release note (bug fix): Fix a bug in which some SELECT FOR UPDATE or SELECT FOR SHARE queries using NOWAIT could still block on locked rows when using `optimizer_use_lock_op_for_serializable` under Serializable isolation. This bug was present when `optimizer_use_lock_op_for_serializable` was introduced in v23.2.0. Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>

(This is follow-up from #127718, basically a refinement of commit e74202c.) When using the new Lock operator to implement FOR UPDATE and FOR SHARE, we build unlocked scans that are then followed by a final Lock operator. If the Lock operator uses NOWAIT, however, similar to SKIP LOCKED, these unlocked scans still need to have the NOWAIT behavior to avoid blocking, even if they do not take locks themselves. Unlike SKIP LOCKED, however, we do _not_ keep the NOWAIT behavior on unlocked scans under weaker isolation levels. This is because NOWAIT false positives produce false positive locking errors, whereas SKIP LOCKED false positives are benign (and in fact a minor performance optimization in some cases). For example, consider the following scenario: ```sql CREATE TABLE xy (x INT PRIMARY KEY, y INT); INSERT INTO xy VALUES (1, 1), (2, 2); -- lock the x=1 row in the first session BEGIN; SELECT * FROM xy WHERE x = 1 FOR UPDATE; -- try to lock the y=2 row in the second session BEGIN; SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT; ``` Under serializable isolation, the final SELECT FOR UPDATE NOWAIT statement uses NOWAIT behavior in its initial full-table scan. This full-table scan first encounters the lock on x=1 which is a "false positive" in the sense that it is not the intended row to lock. This causes the statement to fail immediately. This might seem surprising, but under serializable isolation our only alternative is to instead block on the x=1 lock until the first transaction finishes. We choose to fail immediately (the NOWAIT behavior) instead of blocking because (a) it matches our behavior before this change, (b) it conceptually fits with the read-lock semantics of serializable isolation, in which we consider every read to acquire a shared lock, and (c) it preserves the expected serializable behavior in case the first transaction updates the x=1 row to have y=2. Under weaker isolation levels we do not use NOWAIT in the initial full-table scan to avoid this false positive error, instead reading "underneath" the lock and only trying to lock the y=2 row. If, instead of NOWAIT, we consider the final statement to use SKIP LOCKED, we can see that it doesn't hurt anything for us to always use the SKIP LOCKED behavior in the initial full table scan, even under weaker isolation levels. And sometimes it helps. Fixes: #129145 Release note (bug fix): Fix a bug in which some SELECT FOR UPDATE or SELECT FOR SHARE queries using NOWAIT could still block on locked rows when using `optimizer_use_lock_op_for_serializable` under Serializable isolation. This bug was present when `optimizer_use_lock_op_for_serializable` was introduced in v23.2.0.

michae2 added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team T-sql-queries SQL Queries Team A-read-committed Related to the introduction of Read Committed labels Aug 16, 2024

This comment was marked as resolved.

Sign in to view

michae2 added branch-master Failures and bugs on the master branch. branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 labels Aug 16, 2024

exalate-issue-sync bot removed the T-sql-queries SQL Queries Team label Aug 22, 2024

michae2 removed the T-kv KV Team label Sep 3, 2024

michae2 self-assigned this Sep 3, 2024

exalate-issue-sync bot unassigned michae2 Sep 3, 2024

michae2 self-assigned this Sep 4, 2024

michae2 mentioned this issue Sep 4, 2024

opt: add NOWAIT to unlocked scans below NOWAIT #130103

Merged

craig bot closed this as completed in 544d2c6 Sep 10, 2024

blathers-crl bot mentioned this issue Sep 10, 2024

release-24.1: opt: add NOWAIT to unlocked scans below NOWAIT #130429

Open

blathers-crl bot mentioned this issue Sep 10, 2024

release-24.2: opt: add NOWAIT to unlocked scans below NOWAIT #130430

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql, kv: FOR UPDATE NOWAIT can starve or block under serializable #129145

sql, kv: FOR UPDATE NOWAIT can starve or block under serializable #129145

michae2 commented Aug 16, 2024 •

edited by exalate-issue-sync bot

Loading

This comment was marked as resolved.

michae2 commented Aug 16, 2024

michae2 commented Aug 16, 2024

michae2 commented Sep 3, 2024

sql, kv: FOR UPDATE NOWAIT can starve or block under serializable #129145

sql, kv: FOR UPDATE NOWAIT can starve or block under serializable #129145

Comments

michae2 commented Aug 16, 2024 • edited by exalate-issue-sync bot Loading

This comment was marked as resolved.

michae2 commented Aug 16, 2024

michae2 commented Aug 16, 2024

michae2 commented Sep 3, 2024

michae2 commented Aug 16, 2024 •

edited by exalate-issue-sync bot

Loading