Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql, kv: FOR UPDATE NOWAIT can starve or block under serializable #129145

Closed
michae2 opened this issue Aug 16, 2024 · 4 comments · Fixed by #130103
Closed

sql, kv: FOR UPDATE NOWAIT can starve or block under serializable #129145

michae2 opened this issue Aug 16, 2024 · 4 comments · Fixed by #130103
Assignees
Labels
A-read-committed Related to the introduction of Read Committed branch-master Failures and bugs on the master branch. branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@michae2
Copy link
Collaborator

michae2 commented Aug 16, 2024

Consider a SFU query like the following, using v24.2.0-rc.1:

CREATE TABLE xy (x INT PRIMARY KEY, y INT);
INSERT INTO xy VALUES (1, 1), (2, 2);

-- lock the x=1 row in the first session
BEGIN;
SELECT * FROM xy WHERE x = 1 FOR UPDATE;

-- try to lock the y=2 row in the second session
SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;

I think in this case, ideally the second SFU query would be able to lock the second row without blocking. But under serializable isolation, this currently does not happen. With default behavior (locking below the filter) the plan tries to lock all rows in xy, thus failing the NOWAIT check because of the lock on x=1:

demo@localhost:26257/demoapp/defaultdb> EXPLAIN SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
                            info
------------------------------------------------------------
  distribution: local
  vectorized: true

  • filter
  │ filter: y = 2
  │
  └── • scan
        missing stats
        table: xy@xy_pkey
        spans: FULL SCAN
        locking strength: for update
        locking wait policy: nowait

  index recommendations: 1
  1. type: index creation
     SQL command: CREATE INDEX ON defaultdb.public.xy (y);
(16 rows)

Time: 5ms total (execution 4ms / network 1ms)

demo@localhost:26257/demoapp/defaultdb> SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
ERROR: could not obtain lock on row (x)=(1) in xy@xy_pkey
SQLSTATE: 55P03

With the new optimizer_use_lock_op_for_serializable behavior (locking above the filter) the plan only tries to lock matching rows, but blocks when trying to read the x=1 row due to serializable isolation:

demo@localhost:26257/demoapp/defaultdb> SET optimizer_use_lock_op_for_serializable = on;
SET

Time: 1ms total (execution 1ms / network 0ms)

demo@localhost:26257/demoapp/defaultdb> EXPLAIN SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
                            info
------------------------------------------------------------
  distribution: local
  vectorized: true

  • lookup join (semi)
  │ table: xy@xy_pkey
  │ equality: (x) = (x)
  │ equality cols are key
  │ locking strength: for update
  │ locking wait policy: nowait
  │
  └── • filter
      │ filter: y = 2
      │
      └── • scan
            missing stats
            table: xy@xy_pkey
            spans: FULL SCAN

  index recommendations: 1
  1. type: index creation
     SQL command: CREATE INDEX ON defaultdb.public.xy (y);
(21 rows)

Time: 3ms total (execution 3ms / network 0ms)

demo@localhost:26257/demoapp/defaultdb> SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
(blocks)

It seems like neither behavior is desirable for NOWAIT.

Jira issue: CRDB-41396

Epic CRDB-40199

@michae2 michae2 added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team T-sql-queries SQL Queries Team A-read-committed Related to the introduction of Read Committed labels Aug 16, 2024

This comment was marked as resolved.

@michae2 michae2 added branch-master Failures and bugs on the master branch. branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 labels Aug 16, 2024
@michae2
Copy link
Collaborator Author

michae2 commented Aug 16, 2024

Under read committed isolation this works fine, because the initial scan of xy can read underneath the lock:

demo@localhost:26257/demoapp/defaultdb> BEGIN ISOLATION LEVEL READ COMMITTED;
BEGIN

Time: 0ms total (execution 1ms / network 0ms)

demo@localhost:26257/demoapp/defaultdb  OPEN> EXPLAIN SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
                            info
------------------------------------------------------------
  distribution: local
  vectorized: true

  • lookup join (semi)
  │ table: xy@xy_pkey
  │ equality: (x) = (x)
  │ equality cols are key
  │ locking strength: for update
  │ locking wait policy: nowait
  │ locking durability: guaranteed
  │
  └── • filter
      │ filter: y = 2
      │
      └── • scan
            missing stats
            table: xy@xy_pkey
            spans: FULL SCAN

  index recommendations: 1
  1. type: index creation
     SQL command: CREATE INDEX ON defaultdb.public.xy (y);
(22 rows)

Time: 5ms total (execution 4ms / network 1ms)

demo@localhost:26257/demoapp/defaultdb  OPEN> SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
  x | y
----+----
  2 | 2
(1 row)

Time: 4ms total (execution 4ms / network 1ms)

@michae2
Copy link
Collaborator Author

michae2 commented Aug 16, 2024

For SKIP LOCKED we now flag the initial unlocked scans as skipped locked, as of e74202c. But this strategy won't work for NOWAIT, because we might skip over a lock that should cause the "could not obtain lock" error.

I think the only fix is to read underneath the lock in the initial scan, like we do for read committed. That's why I've tagged this as T-KV.

@exalate-issue-sync exalate-issue-sync bot removed the T-sql-queries SQL Queries Team label Aug 22, 2024
@michae2
Copy link
Collaborator Author

michae2 commented Sep 3, 2024

After talking with @nvanbenschoten and @mw5h we think the initial unlocked read should be flagged with NOWAIT under serializable.

@michae2 michae2 removed the T-kv KV Team label Sep 3, 2024
@michae2 michae2 self-assigned this Sep 3, 2024
@michae2 michae2 self-assigned this Sep 4, 2024
michae2 added a commit to michae2/cockroach that referenced this issue Sep 4, 2024
(This is follow-up from cockroachdb#127718, basically a refinement of commit
e74202c.)

When using the new Lock operator to implement FOR UPDATE and FOR SHARE,
we build unlocked scans that are then followed by a final Lock
operator. If the Lock operator uses NOWAIT, however, similar to SKIP
LOCKED, these unlocked scans still need to have the NOWAIT behavior to
avoid blocking, even if they do not take locks themselves.

Unlike SKIP LOCKED, however, we do _not_ keep the NOWAIT behavior on
unlocked scans under weaker isolation levels. This is because NOWAIT
false positives produce false positive locking errors, whereas SKIP
LOCKED false positives are benign (and in fact a minor performance
optimization in some cases).

Fixes: cockroachdb#129145

Release note (bug fix): Fix a bug in which some SELECT FOR UPDATE or
SELECT FOR SHARE queries using NOWAIT could still block on locked rows
when using `optimizer_use_lock_op_for_serializable` under Serializable
isolation. This bug was present when
`optimizer_use_lock_op_for_serializable` was introduced in v23.2.0.
michae2 added a commit to michae2/cockroach that referenced this issue Sep 5, 2024
(This is follow-up from cockroachdb#127718, basically a refinement of commit
e74202c.)

When using the new Lock operator to implement FOR UPDATE and FOR SHARE,
we build unlocked scans that are then followed by a final Lock
operator. If the Lock operator uses NOWAIT, however, similar to SKIP
LOCKED, these unlocked scans still need to have the NOWAIT behavior to
avoid blocking, even if they do not take locks themselves.

Unlike SKIP LOCKED, however, we do _not_ keep the NOWAIT behavior on
unlocked scans under weaker isolation levels. This is because NOWAIT
false positives produce false positive locking errors, whereas SKIP
LOCKED false positives are benign (and in fact a minor performance
optimization in some cases).

Fixes: cockroachdb#129145

Release note (bug fix): Fix a bug in which some SELECT FOR UPDATE or
SELECT FOR SHARE queries using NOWAIT could still block on locked rows
when using `optimizer_use_lock_op_for_serializable` under Serializable
isolation. This bug was present when
`optimizer_use_lock_op_for_serializable` was introduced in v23.2.0.
michae2 added a commit to michae2/cockroach that referenced this issue Sep 10, 2024
(This is follow-up from cockroachdb#127718, basically a refinement of commit
e74202c.)

When using the new Lock operator to implement FOR UPDATE and FOR SHARE,
we build unlocked scans that are then followed by a final Lock
operator. If the Lock operator uses NOWAIT, however, similar to SKIP
LOCKED, these unlocked scans still need to have the NOWAIT behavior to
avoid blocking, even if they do not take locks themselves.

Unlike SKIP LOCKED, however, we do _not_ keep the NOWAIT behavior on
unlocked scans under weaker isolation levels. This is because NOWAIT
false positives produce false positive locking errors, whereas SKIP
LOCKED false positives are benign (and in fact a minor performance
optimization in some cases).

Fixes: cockroachdb#129145

Release note (bug fix): Fix a bug in which some SELECT FOR UPDATE or
SELECT FOR SHARE queries using NOWAIT could still block on locked rows
when using `optimizer_use_lock_op_for_serializable` under Serializable
isolation. This bug was present when
`optimizer_use_lock_op_for_serializable` was introduced in v23.2.0.
craig bot pushed a commit that referenced this issue Sep 10, 2024
130103: opt: add NOWAIT to unlocked scans below NOWAIT r=DrewKimball,mgartner a=michae2

(This is follow-up from #127718, basically a refinement of commit e74202c.)

When using the new Lock operator to implement FOR UPDATE and FOR SHARE, we build unlocked scans that are then followed by a final Lock operator. If the Lock operator uses NOWAIT, however, similar to SKIP LOCKED, these unlocked scans still need to have the NOWAIT behavior to avoid blocking, even if they do not take locks themselves.

Unlike SKIP LOCKED, however, we do _not_ keep the NOWAIT behavior on unlocked scans under weaker isolation levels. This is because NOWAIT false positives produce false positive locking errors, whereas SKIP LOCKED false positives are benign (and in fact a minor performance optimization in some cases).

For example, consider the following scenario:

```sql
CREATE TABLE xy (x INT PRIMARY KEY, y INT);
INSERT INTO xy VALUES (1, 1), (2, 2);

-- lock the x=1 row in the first session
BEGIN;
SELECT * FROM xy WHERE x = 1 FOR UPDATE;

-- try to lock the y=2 row in the second session
BEGIN;
SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
```

Under serializable isolation, the final SELECT FOR UPDATE NOWAIT statement uses NOWAIT behavior in its initial full-table scan. This full-table scan first encounters the lock on x=1 which is a "false positive" in the sense that it is not the intended row to lock. This causes the statement to fail immediately. This might seem surprising, but under serializable isolation our only alternative is to instead block on the x=1 lock until the first transaction finishes. We choose to fail immediately (the NOWAIT behavior) instead of blocking because (a) it matches our behavior before this change, (b) it conceptually fits with the read-lock semantics of serializable isolation, in which we consider every read to acquire a shared lock, and (c) it preserves the expected serializable behavior in case the first transaction updates the x=1 row to have y=2.

Under weaker isolation levels we do not use NOWAIT in the initial full-table scan to avoid this false positive error, instead reading "underneath" the lock and only trying to lock the y=2 row.

If, instead of NOWAIT, we consider the final statement to use SKIP LOCKED, we can see that it doesn't hurt anything for us to always use the SKIP LOCKED behavior in the initial full table scan, even under weaker isolation levels. And sometimes it helps.

Fixes: #129145

Release note (bug fix): Fix a bug in which some SELECT FOR UPDATE or SELECT FOR SHARE queries using NOWAIT could still block on locked rows when using `optimizer_use_lock_op_for_serializable` under Serializable isolation. This bug was present when `optimizer_use_lock_op_for_serializable` was introduced in v23.2.0.

Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
@craig craig bot closed this as completed in 544d2c6 Sep 10, 2024
blathers-crl bot pushed a commit that referenced this issue Sep 10, 2024
(This is follow-up from #127718, basically a refinement of commit
e74202c.)

When using the new Lock operator to implement FOR UPDATE and FOR SHARE,
we build unlocked scans that are then followed by a final Lock
operator. If the Lock operator uses NOWAIT, however, similar to SKIP
LOCKED, these unlocked scans still need to have the NOWAIT behavior to
avoid blocking, even if they do not take locks themselves.

Unlike SKIP LOCKED, however, we do _not_ keep the NOWAIT behavior on
unlocked scans under weaker isolation levels. This is because NOWAIT
false positives produce false positive locking errors, whereas SKIP
LOCKED false positives are benign (and in fact a minor performance
optimization in some cases).

For example, consider the following scenario:

```sql
CREATE TABLE xy (x INT PRIMARY KEY, y INT);
INSERT INTO xy VALUES (1, 1), (2, 2);

-- lock the x=1 row in the first session
BEGIN;
SELECT * FROM xy WHERE x = 1 FOR UPDATE;

-- try to lock the y=2 row in the second session
BEGIN;
SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
```

Under serializable isolation, the final SELECT FOR UPDATE NOWAIT
statement uses NOWAIT behavior in its initial full-table scan. This
full-table scan first encounters the lock on x=1 which is a "false
positive" in the sense that it is not the intended row to lock. This
causes the statement to fail immediately. This might seem surprising,
but under serializable isolation our only alternative is to instead
block on the x=1 lock until the first transaction finishes. We choose to
fail immediately (the NOWAIT behavior) instead of blocking because (a)
it matches our behavior before this change, (b) it conceptually fits
with the read-lock semantics of serializable isolation, in which we
consider every read to acquire a shared lock, and (c) it preserves the
expected serializable behavior in case the first transaction updates the
x=1 row to have y=2.

Under weaker isolation levels we do not use NOWAIT in the initial
full-table scan to avoid this false positive error, instead reading
"underneath" the lock and only trying to lock the y=2 row.

If, instead of NOWAIT, we consider the final statement to use SKIP
LOCKED, we can see that it doesn't hurt anything for us to always use
the SKIP LOCKED behavior in the initial full table scan, even under
weaker isolation levels. And sometimes it helps.

Fixes: #129145

Release note (bug fix): Fix a bug in which some SELECT FOR UPDATE or
SELECT FOR SHARE queries using NOWAIT could still block on locked rows
when using `optimizer_use_lock_op_for_serializable` under Serializable
isolation. This bug was present when
`optimizer_use_lock_op_for_serializable` was introduced in v23.2.0.
blathers-crl bot pushed a commit that referenced this issue Sep 10, 2024
(This is follow-up from #127718, basically a refinement of commit
e74202c.)

When using the new Lock operator to implement FOR UPDATE and FOR SHARE,
we build unlocked scans that are then followed by a final Lock
operator. If the Lock operator uses NOWAIT, however, similar to SKIP
LOCKED, these unlocked scans still need to have the NOWAIT behavior to
avoid blocking, even if they do not take locks themselves.

Unlike SKIP LOCKED, however, we do _not_ keep the NOWAIT behavior on
unlocked scans under weaker isolation levels. This is because NOWAIT
false positives produce false positive locking errors, whereas SKIP
LOCKED false positives are benign (and in fact a minor performance
optimization in some cases).

For example, consider the following scenario:

```sql
CREATE TABLE xy (x INT PRIMARY KEY, y INT);
INSERT INTO xy VALUES (1, 1), (2, 2);

-- lock the x=1 row in the first session
BEGIN;
SELECT * FROM xy WHERE x = 1 FOR UPDATE;

-- try to lock the y=2 row in the second session
BEGIN;
SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
```

Under serializable isolation, the final SELECT FOR UPDATE NOWAIT
statement uses NOWAIT behavior in its initial full-table scan. This
full-table scan first encounters the lock on x=1 which is a "false
positive" in the sense that it is not the intended row to lock. This
causes the statement to fail immediately. This might seem surprising,
but under serializable isolation our only alternative is to instead
block on the x=1 lock until the first transaction finishes. We choose to
fail immediately (the NOWAIT behavior) instead of blocking because (a)
it matches our behavior before this change, (b) it conceptually fits
with the read-lock semantics of serializable isolation, in which we
consider every read to acquire a shared lock, and (c) it preserves the
expected serializable behavior in case the first transaction updates the
x=1 row to have y=2.

Under weaker isolation levels we do not use NOWAIT in the initial
full-table scan to avoid this false positive error, instead reading
"underneath" the lock and only trying to lock the y=2 row.

If, instead of NOWAIT, we consider the final statement to use SKIP
LOCKED, we can see that it doesn't hurt anything for us to always use
the SKIP LOCKED behavior in the initial full table scan, even under
weaker isolation levels. And sometimes it helps.

Fixes: #129145

Release note (bug fix): Fix a bug in which some SELECT FOR UPDATE or
SELECT FOR SHARE queries using NOWAIT could still block on locked rows
when using `optimizer_use_lock_op_for_serializable` under Serializable
isolation. This bug was present when
`optimizer_use_lock_op_for_serializable` was introduced in v23.2.0.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-read-committed Related to the introduction of Read Committed branch-master Failures and bugs on the master branch. branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

1 participant