Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: add NOWAIT to unlocked scans below NOWAIT #130103

Merged
merged 1 commit into from
Sep 10, 2024

Conversation

michae2
Copy link
Collaborator

@michae2 michae2 commented Sep 4, 2024

(This is follow-up from #127718, basically a refinement of commit e74202c.)

When using the new Lock operator to implement FOR UPDATE and FOR SHARE, we build unlocked scans that are then followed by a final Lock operator. If the Lock operator uses NOWAIT, however, similar to SKIP LOCKED, these unlocked scans still need to have the NOWAIT behavior to avoid blocking, even if they do not take locks themselves.

Unlike SKIP LOCKED, however, we do not keep the NOWAIT behavior on unlocked scans under weaker isolation levels. This is because NOWAIT false positives produce false positive locking errors, whereas SKIP LOCKED false positives are benign (and in fact a minor performance optimization in some cases).

For example, consider the following scenario:

CREATE TABLE xy (x INT PRIMARY KEY, y INT);
INSERT INTO xy VALUES (1, 1), (2, 2);

-- lock the x=1 row in the first session
BEGIN;
SELECT * FROM xy WHERE x = 1 FOR UPDATE;

-- try to lock the y=2 row in the second session
BEGIN;
SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;

Under serializable isolation, the final SELECT FOR UPDATE NOWAIT statement uses NOWAIT behavior in its initial full-table scan. This full-table scan first encounters the lock on x=1 which is a "false positive" in the sense that it is not the intended row to lock. This causes the statement to fail immediately. This might seem surprising, but under serializable isolation our only alternative is to instead block on the x=1 lock until the first transaction finishes. We choose to fail immediately (the NOWAIT behavior) instead of blocking because (a) it matches our behavior before this change, (b) it conceptually fits with the read-lock semantics of serializable isolation, in which we consider every read to acquire a shared lock, and (c) it preserves the expected serializable behavior in case the first transaction updates the x=1 row to have y=2.

Under weaker isolation levels we do not use NOWAIT in the initial full-table scan to avoid this false positive error, instead reading "underneath" the lock and only trying to lock the y=2 row.

If, instead of NOWAIT, we consider the final statement to use SKIP LOCKED, we can see that it doesn't hurt anything for us to always use the SKIP LOCKED behavior in the initial full table scan, even under weaker isolation levels. And sometimes it helps.

Fixes: #129145

Release note (bug fix): Fix a bug in which some SELECT FOR UPDATE or SELECT FOR SHARE queries using NOWAIT could still block on locked rows when using optimizer_use_lock_op_for_serializable under Serializable isolation. This bug was present when optimizer_use_lock_op_for_serializable was introduced in v23.2.0.

@michae2 michae2 added backport-23.2.x Flags PRs that need to be backported to 23.2. backport-24.1.x Flags PRs that need to be backported to 24.1. backport-24.2.x Flags PRs that need to be backported to 24.2 labels Sep 4, 2024
@michae2 michae2 requested review from DrewKimball, mw5h and a team September 4, 2024 18:33
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@michae2 michae2 removed the backport-23.2.x Flags PRs that need to be backported to 23.2. label Sep 4, 2024
Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice fix!

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @mw5h)


pkg/sql/opt/optbuilder/select.go line 548 at r1 (raw file):

					WaitPolicy: waitPolicy,
				},
			}}

[nit] Now that this logic has become fairly complex, do you think it's worth pulling it out into a separate method that can be called here and above?

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! :lgtm:

Reviewed 1 of 3 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @michae2 and @mw5h)


-- commits line 15 at r2:
Where can I learn more about "NOWAIT false positives"?

(This is follow-up from cockroachdb#127718, basically a refinement of commit
e74202c.)

When using the new Lock operator to implement FOR UPDATE and FOR SHARE,
we build unlocked scans that are then followed by a final Lock
operator. If the Lock operator uses NOWAIT, however, similar to SKIP
LOCKED, these unlocked scans still need to have the NOWAIT behavior to
avoid blocking, even if they do not take locks themselves.

Unlike SKIP LOCKED, however, we do _not_ keep the NOWAIT behavior on
unlocked scans under weaker isolation levels. This is because NOWAIT
false positives produce false positive locking errors, whereas SKIP
LOCKED false positives are benign (and in fact a minor performance
optimization in some cases).

For example, consider the following scenario:

```sql
CREATE TABLE xy (x INT PRIMARY KEY, y INT);
INSERT INTO xy VALUES (1, 1), (2, 2);

-- lock the x=1 row in the first session
BEGIN;
SELECT * FROM xy WHERE x = 1 FOR UPDATE;

-- try to lock the y=2 row in the second session
BEGIN;
SELECT * FROM xy WHERE y = 2 FOR UPDATE NOWAIT;
```

Under serializable isolation, the final SELECT FOR UPDATE NOWAIT
statement uses NOWAIT behavior in its initial full-table scan. This
full-table scan first encounters the lock on x=1 which is a "false
positive" in the sense that it is not the intended row to lock. This
causes the statement to fail immediately. This might seem surprising,
but under serializable isolation our only alternative is to instead
block on the x=1 lock until the first transaction finishes. We choose to
fail immediately (the NOWAIT behavior) instead of blocking because (a)
it matches our behavior before this change, (b) it conceptually fits
with the read-lock semantics of serializable isolation, in which we
consider every read to acquire a shared lock, and (c) it preserves the
expected serializable behavior in case the first transaction updates the
x=1 row to have y=2.

Under weaker isolation levels we do not use NOWAIT in the initial
full-table scan to avoid this false positive error, instead reading
"underneath" the lock and only trying to lock the y=2 row.

If, instead of NOWAIT, we consider the final statement to use SKIP
LOCKED, we can see that it doesn't hurt anything for us to always use
the SKIP LOCKED behavior in the initial full table scan, even under
weaker isolation levels. And sometimes it helps.

Fixes: cockroachdb#129145

Release note (bug fix): Fix a bug in which some SELECT FOR UPDATE or
SELECT FOR SHARE queries using NOWAIT could still block on locked rows
when using `optimizer_use_lock_op_for_serializable` under Serializable
isolation. This bug was present when
`optimizer_use_lock_op_for_serializable` was introduced in v23.2.0.
Copy link
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs! Test failure is unrelated.

bors r=DrewKimball,mgartner

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @mgartner, and @mw5h)


-- commits line 15 at r2:

Previously, mgartner (Marcus Gartner) wrote…

Where can I learn more about "NOWAIT false positives"?

Ah, sorry, I didn't explain this clearly. I've added to the commit message, hopefully it's more understandable now.


pkg/sql/opt/optbuilder/select.go line 548 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] Now that this logic has become fairly complex, do you think it's worth pulling it out into a separate method that can be called here and above?

I thought you would say this. 🙂 Done.

@craig craig bot merged commit 8da10b0 into cockroachdb:master Sep 10, 2024
22 of 23 checks passed
@michae2 michae2 deleted the b129145 branch September 10, 2024 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-24.1.x Flags PRs that need to be backported to 24.1. backport-24.2.x Flags PRs that need to be backported to 24.2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql, kv: FOR UPDATE NOWAIT can starve or block under serializable
4 participants