-
Notifications
You must be signed in to change notification settings - Fork 11.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check for lock existence before attempting tx execution. #6129
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still thinking about removing shared lock. Left some comments on other parts of the change.
@@ -1024,6 +1029,9 @@ impl AuthorityState { | |||
let (gas_status, input_objects) = | |||
transaction_input_checker::check_certificate_input(&self.database, certificate).await?; | |||
|
|||
let owned_object_refs = input_objects.filter_owned_objects(); | |||
self.check_owned_locks(&owned_object_refs).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do transactions set None locks or write output objects first? TransactionManager publishes transactions with input objects, but not checking corresponding locks. Maybe worth a comment on what is required for this check to be success after TransactionManager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We write objects first, then set None locks, which is the source of the race I am fixing here.
We don't strictly need this check here for transactions executed from TransactionManager, because it already verifies this now. But we could still hit the race from the authority server path. I hope to clean up and simplify a lot of this once we are able to remove the sequence tables.
missing.push(ObjectKey::from(objref)); | ||
} | ||
} | ||
}; | ||
} | ||
|
||
if !probe_lock_exists.is_empty() { | ||
match self.lock_service.locks_exist(probe_lock_exists).await { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to use lock service in get_missing_input_objects()
? Can the validation in lock service be done on the missing objects after get_missing_input_objects()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because if the lock does not yet exist, the transaction can't be executed yet, so effectively it is the same as if the object didn't exist.
Can the validation in lock service be done on the missing objects after get_missing_input_objects()?
I don't think I understand this question
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically I want to keep the semantics of get_missing_input_objects() simple. Intuitively missing locks are different from missing objects. If sui should guarantee that lock exists if object exists, treating lock not exists as a missing input object can complicate other parts of the system. For example, missing input objects can only be notified when the object commits. Is there the same guarantee for locks?
I know we have talked about the order of writing locks vs objects after a transaction is finished. If you think missing locks are expected behavior, and treating them as missing input objects are correct (they will be eventually notified), can you leave a detailed comment in the code?
The 2nd question is that to keep the logic simple, it seems better to call get_missing_input_objects()
first, then scan objects that are not missing for locks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified that objects exist notifications are sent after writing locks. So I think what we need here is a comment why locks may not exist, and after locks are written TransactionManager will get notified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it - left a comment!
379afbe
to
0003ad1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR looks good overall, after test failures are fixed.
For setting and removing shared locks, I believe shared locks now must exist when trying to remove them, which is great. We can also add back the logic to skip writing shared locks when effects exist, in both functions that acquire shared locks.
0003ad1
to
f6ecb68
Compare
f6ecb68
to
3f12d8d
Compare
…)" This reverts commit 616d228.
This fixes a race where this error could trigger:
https://github.com/MystenLabs/sui/blob/main/crates/sui-storage/src/lock_service.rs#L185-L188
The race, briefly explained:
Also included are changes I used to repro the problem in the simulator.