-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enforce DB transaction's order to prevent deadlocks #2619
Conversation
68be807
to
3d9002d
Compare
bdcd40f
to
ea5795b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 it is on stg server btw.
not is_nil(t.block_number) | ||
) | ||
end) | ||
transactions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style comment. the only change in this expression is a pipeline operator. What do you think about leaving the original version since a pipeline operator is much more proper for multiple operations? IMHO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a comment that might be the cause of confusion, but following the original call to Enum.reduce
there is more piping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some formatting/style comments. looks good
ea5795b
to
6555b33
Compare
4847469
to
3795658
Compare
bea2017
to
896ff9a
Compare
049de56
to
cfeb3db
Compare
where_forked: where_forked | ||
}) | ||
|> Multi.run(:acquire_blocks, fn repo, _ -> | ||
acquire_blocks(repo, hashes, consensus_block_numbers, where_invalid_neighbour) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this step? it just fetches blocks but this data is not used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it does only that, the reason is that it acquires all the blocks that need to be modified. If we acquire locks in two different steps of the transactions there is no way to guarantee they are in order, and since they are in the same DB transaction the locks are never released.
For instance if lose_consensus
acquires blocks 10
to 15
and later on lose_invalid_neighbour_consensus
acquires block 8
the order is wrong.
Instead they are all acquired here and later each step can modify them as it sees fit.
I added a note in the documentation file on ShareLocks, maybe it needs to be explained better?
end) | ||
|> Enum.sort() | ||
|> Enum.filter(& &1.consensus) | ||
|> Enum.map(& &1.number) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to sort the result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need, it is only used as part of queries with the proper order_by
{_, result} = | ||
repo.update_all( | ||
from(block in Block, where: block.number in ^consensus_block_number, select: block.number), | ||
[set: [consensus: false, updated_at: updated_at]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why compilation doesn't fail because updated_at
is not injected with ^
? 🤔
defp remove_nonconsensus_logs(repo, nonconsensus_block_numbers, %{timeout: timeout}) do | ||
defp remove_nonconsensus_logs(repo, nonconsensus_block_numbers, forked_transactions, %{timeout: timeout}) do | ||
forked_transaction_hashes = Enum.map(forked_transactions, & &1.hash) | ||
|
||
transaction_query = | ||
from(transaction in Transaction, | ||
where: transaction.block_number in ^nonconsensus_block_numbers, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this step won't be necessary because forked transactions won't have block_number
. it's deleted by fork_transactions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the hash
is not, and it is by the hash
that we find the Log
s to remove.
We need to remove the logs of forked transactions, if you try to remove this line (and the or_where
in the query) the test you wrote will fail, for instance:
test/explorer/chain/import/runner/blocks_test.exs:151
|> Multi.run(:internal_transaction_transaction_block_number, fn repo, %{blocks: blocks} when is_list(blocks) -> | ||
update_internal_transaction_block_number(repo, hashes) | ||
end) | ||
|> Multi.run(:acquire_contract_address_tokens, fn repo, _ -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move acuire...
steps to the steps that require them so they'll go together and won't be accidentally deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't do that, tables need to be acquired in a specific order, the update can happen later, but the locks need to be taken here (see docs)
}) | ||
end) | ||
|> Multi.run(:remove_nonconsensus_logs, fn repo, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move this step before fork_transactions
? so transactions
still have block_number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, see comments on order above. Logs
need to follow Transaction.Fork
|> Multi.run(:acquire_contract_address_tokens, fn repo, _ -> | ||
acquire_contract_address_tokens(repo, consensus_block_numbers) | ||
end) | ||
|> Multi.run(:remove_nonconsensus_token_transfers, fn repo, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move this step before fork_transactions
? so transactions
still have block_number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code for remove_nonconsensus_token_transfers
does not make use of the transactions
table, this should fine where it is, right?
@pasqu4le fixes are very fragile. One query can break it all. Can we create some kind of common module that will be reused across all queries? |
|> Enum.map(fn %{number: number} -> number end) | ||
|> Enum.sort() | ||
when is_list(changes_list) do | ||
block_numbers = Enum.map(changes_list, fn %{number: number} -> number end) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why aren't we sorting them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because it makes no difference, the sorting needed is done by the query just below, with its order_by
@ayrat555 you are right, it is very fragile and a single query could break it. But I don't know of any other way to enforce the order between tables. The only way to have this work generally would require a complete change of the whole |
cfeb3db
to
b62d75d
Compare
I had forgot to update the order between tables in the docs, but it is corrected now |
Problem: due to the ShareLock mechanism used by PostgreSQL when there are several DB transactions are acting on multiple rows of the same table, it's possible to incur in a deadlock and so into an error. Solution: enforce a consistent order of ShareLock acquisition in all the relevant transaction. Also add clear documentation that should help in the future.
b62d75d
to
029deae
Compare
029deae
to
810dc48
Compare
fixes: #2587
Motivation
Due to the ShareLock mechanism used by PostgreSQL when there are several DB transactions are acting on multiple rows of the same table, it's possible to incur in a deadlock and so into an error.
Changelog
This PR enforces a consistent order of ShareLock acquisition in all the relevant transaction.
It also adds a clear documentation that should help in the future.
Bug Fixes
This is the list of functions that had to be modified and for what reason:
Updates that were performed without ordering:
Updates that were incorrectly ordered (e.g. postgres does not respect the order of a list to match)
Deletes without order
Ordering that did not exist at all (and the functions that now use them):
SQL.query calls where refactored for clarity and ordering was added where necessary:
Modified ordering
These are orderings that were found to be inconsistent. All the functions that use them (including those corrected in the list above), have been changed to use the new ones.
The modified ones are:
Block
, fromnumber, hash
tohash
Transaction.Fork
, fromuncle_hash, hash
touncle_hash, index
Address.Name
, fromaddress_hash
toaddress_hash, name
Checklist for your PR
CHANGELOG.md
with this PRmaster
in theVersion
column.