Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce DB transaction's order to prevent deadlocks #2619

Merged
merged 4 commits into from
Sep 18, 2019

Conversation

pasqu4le
Copy link
Contributor

@pasqu4le pasqu4le commented Aug 23, 2019

fixes: #2587

Motivation

Due to the ShareLock mechanism used by PostgreSQL when there are several DB transactions are acting on multiple rows of the same table, it's possible to incur in a deadlock and so into an error.

Changelog

This PR enforces a consistent order of ShareLock acquisition in all the relevant transaction.
It also adds a clear documentation that should help in the future.

Bug Fixes

This is the list of functions that had to be modified and for what reason:

Updates that were performed without ordering:

  • Explorer.Chain.find_and_update_replaced_transactions
  • Explorer.Chain.update_replaced_transactions
  • Explorer.Chain.Import.Runner.Addresses.update_transactions
  • Explorer.Chain.Import.Runner.Blocks.update_internal_transaction_block_number
  • Explorer.Chain.Import.Runner.Blocks.fork_transactions
  • Explorer.Chain.Import.Runner.Blocks.lose_consensus
  • Explorer.Chain.Import.Runner.Blocks.lose_invalid_neighbour_consensus
  • Explorer.Chain.Import.Runner.Blocks.update_block_second_degree_relations
  • Indexer.Temporary.BlocksTransactionsMismatch.run_blocks

Updates that were incorrectly ordered (e.g. postgres does not respect the order of a list to match)

  • Explorer.Chain.Import.Runner.InternalTransactions.update_transactions
  • Explorer.Chain.Import.Runner.InternalTransactionsIndexedAtBlocks.update_blocks
  • Explorer.Chain.Import.Runner.Transactions.discard_blocks_for_recollated_transactions
  • Explorer.Chain.Import.Runner.Blocks.derive_transaction_forks

Deletes without order

  • Explorer.ChainSpec.Parity.Importer.import_emission_rewards
  • Explorer.Chain.Import.Runner.Blocks.delete_rewards

Ordering that did not exist at all (and the functions that now use them):

  • StakingPool - staking_address_hash
    • Explorer.Chain.Import.Runner.StakingPools.mark_as_deleted
    • Explorer.Chain.Import.Runner.StakingPools.insert
  • EmissionReward - block_range
    • Explorer.ChainSpec.Parity.Importer.import_emission_rewards
  • StakingPoolsDelegators delegator_address_hash, pool_address_hash
    • Explorer.Chain.Import.Runner.StakingPoolsDelegators.insert
  • MarketHistory - date
    • Explorer.Market.bulk_insert_history
  • Address.Name - address_hash
    • Explorer.Validator.MetadataImporter.import_metadata
  • ContractMethod - {identifier, abi}
    • Explorer.Chain.ContractMethod.upsert_from_abi

SQL.query calls where refactored for clarity and ordering was added where necessary:

  • Explorer.Chain.Import.Runner.Blocks.derive_transaction_forks
  • Explorer.Chain.Import.Runner.Blocks.derive_address_current_token_balances
  • Explorer.Chain.Import.Runner.Tokens.update_holder_counts_with_deltas

Modified ordering

These are orderings that were found to be inconsistent. All the functions that use them (including those corrected in the list above), have been changed to use the new ones.
The modified ones are:

  • for Block, from number, hash to hash
  • for Transaction.Fork, from uncle_hash, hash to uncle_hash, index
  • for Address.Name, from address_hash to address_hash, name

Checklist for your PR

@pasqu4le pasqu4le self-assigned this Aug 23, 2019
@pasqu4le pasqu4le force-pushed the pp-enforce-sharelock-order branch from 68be807 to 3d9002d Compare August 23, 2019 16:13
@coveralls
Copy link

coveralls commented Aug 23, 2019

Pull Request Test Coverage Report for Build 7fabe2db-da28-470b-8812-e9ac5852e5c5

  • 123 of 142 (86.62%) changed or added relevant lines in 20 files are covered.
  • 25 unchanged lines in 9 files lost coverage.
  • Overall coverage increased (+1.0%) to 79.419%

Changes Missing Coverage Covered Lines Changed/Added Lines %
apps/explorer/lib/explorer/chain.ex 10 12 83.33%
apps/explorer/lib/explorer/chain/import/runner/blocks.ex 60 62 96.77%
apps/explorer/lib/explorer/chain/import/runner/internal_transactions_indexed_at_blocks.ex 0 4 0.0%
apps/explorer/lib/explorer/chain_spec/poa/importer.ex 0 5 0.0%
apps/indexer/lib/indexer/temporary/blocks_transactions_mismatch.ex 0 6 0.0%
Files with Coverage Reduction New Missed Lines %
apps/block_scout_web/lib/block_scout_web/controllers/tokens/read_contract_controller.ex 1 85.71%
apps/block_scout_web/lib/block_scout_web/controllers/tokens/token_controller.ex 1 0.0%
apps/explorer/lib/explorer/chain/import.ex 1 84.48%
apps/explorer/lib/explorer/chain/import/runner/addresses.ex 1 84.62%
apps/block_scout_web/lib/block_scout_web/controllers/tokens/holder_controller.ex 2 88.89%
apps/block_scout_web/lib/block_scout_web/controllers/tokens/inventory_controller.ex 2 90.48%
apps/explorer/lib/explorer/chain/address.ex 2 93.94%
apps/block_scout_web/lib/block_scout_web/notifier.ex 3 94.34%
apps/block_scout_web/lib/block_scout_web/controllers/tokens/transfer_controller.ex 12 0.0%
Totals Coverage Status
Change from base Build 302c9711-6397-4c47-bfe6-94c328ae3500: 1.0%
Covered Lines: 32055
Relevant Lines: 40362

💛 - Coveralls

@pasqu4le pasqu4le force-pushed the pp-enforce-sharelock-order branch 4 times, most recently from bdcd40f to ea5795b Compare August 23, 2019 19:48
@pasqu4le pasqu4le added ready for review This PR is ready for reviews. and removed in progress labels Aug 23, 2019
Copy link
Member

@vbaranov vbaranov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 it is on stg server btw.

not is_nil(t.block_number)
)
end)
transactions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style comment. the only change in this expression is a pipeline operator. What do you think about leaving the original version since a pipeline operator is much more proper for multiple operations? IMHO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a comment that might be the cause of confusion, but following the original call to Enum.reduce there is more piping.

Copy link
Contributor

@ayrat555 ayrat555 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some formatting/style comments. looks good

@pasqu4le pasqu4le force-pushed the pp-enforce-sharelock-order branch from ea5795b to 6555b33 Compare August 26, 2019 09:37
@pasqu4le pasqu4le force-pushed the pp-enforce-sharelock-order branch 4 times, most recently from 4847469 to 3795658 Compare September 3, 2019 16:54
@vbaranov vbaranov requested review from vbaranov and removed request for saneery and Heimdell September 10, 2019 15:24
@pasqu4le pasqu4le force-pushed the pp-enforce-sharelock-order branch 2 times, most recently from bea2017 to 896ff9a Compare September 10, 2019 16:26
@pasqu4le pasqu4le requested a review from ayrat555 September 10, 2019 16:33
@pasqu4le pasqu4le force-pushed the pp-enforce-sharelock-order branch 4 times, most recently from 049de56 to cfeb3db Compare September 11, 2019 19:01
where_forked: where_forked
})
|> Multi.run(:acquire_blocks, fn repo, _ ->
acquire_blocks(repo, hashes, consensus_block_numbers, where_invalid_neighbour)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this step? it just fetches blocks but this data is not used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it does only that, the reason is that it acquires all the blocks that need to be modified. If we acquire locks in two different steps of the transactions there is no way to guarantee they are in order, and since they are in the same DB transaction the locks are never released.

For instance if lose_consensus acquires blocks 10 to 15 and later on lose_invalid_neighbour_consensus acquires block 8 the order is wrong.
Instead they are all acquired here and later each step can modify them as it sees fit.

I added a note in the documentation file on ShareLocks, maybe it needs to be explained better?

end)
|> Enum.sort()
|> Enum.filter(& &1.consensus)
|> Enum.map(& &1.number)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to sort the result

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need, it is only used as part of queries with the proper order_by

{_, result} =
repo.update_all(
from(block in Block, where: block.number in ^consensus_block_number, select: block.number),
[set: [consensus: false, updated_at: updated_at]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why compilation doesn't fail because updated_at is not injected with ^? 🤔

defp remove_nonconsensus_logs(repo, nonconsensus_block_numbers, %{timeout: timeout}) do
defp remove_nonconsensus_logs(repo, nonconsensus_block_numbers, forked_transactions, %{timeout: timeout}) do
forked_transaction_hashes = Enum.map(forked_transactions, & &1.hash)

transaction_query =
from(transaction in Transaction,
where: transaction.block_number in ^nonconsensus_block_numbers,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this step won't be necessary because forked transactions won't have block_number. it's deleted by fork_transactions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the hash is not, and it is by the hash that we find the Logs to remove.
We need to remove the logs of forked transactions, if you try to remove this line (and the or_where in the query) the test you wrote will fail, for instance:
test/explorer/chain/import/runner/blocks_test.exs:151

|> Multi.run(:internal_transaction_transaction_block_number, fn repo, %{blocks: blocks} when is_list(blocks) ->
update_internal_transaction_block_number(repo, hashes)
end)
|> Multi.run(:acquire_contract_address_tokens, fn repo, _ ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move acuire... steps to the steps that require them so they'll go together and won't be accidentally deleted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't do that, tables need to be acquired in a specific order, the update can happen later, but the locks need to be taken here (see docs)

})
end)
|> Multi.run(:remove_nonconsensus_logs, fn repo,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move this step before fork_transactions? so transactions still have block_number

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, see comments on order above. Logs need to follow Transaction.Fork

|> Multi.run(:acquire_contract_address_tokens, fn repo, _ ->
acquire_contract_address_tokens(repo, consensus_block_numbers)
end)
|> Multi.run(:remove_nonconsensus_token_transfers, fn repo,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move this step before fork_transactions? so transactions still have block_number

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code for remove_nonconsensus_token_transfers does not make use of the transactions table, this should fine where it is, right?

@ayrat555
Copy link
Contributor

@pasqu4le fixes are very fragile. One query can break it all.

Can we create some kind of common module that will be reused across all queries?

|> Enum.map(fn %{number: number} -> number end)
|> Enum.sort()
when is_list(changes_list) do
block_numbers = Enum.map(changes_list, fn %{number: number} -> number end)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why aren't we sorting them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it makes no difference, the sorting needed is done by the query just below, with its order_by

@pasqu4le
Copy link
Contributor Author

@ayrat555 you are right, it is very fragile and a single query could break it.

But I don't know of any other way to enforce the order between tables.
BTW the Block runner is the only one that really is problematic on its own, because it modifies data on a lot of different tables, the other ones are all much more stable, but they still are a bit fragile in that they need to be in a valid Stage.

The only way to have this work generally would require a complete change of the whole Stages/Runners logic and implementation. I actually agree with it, but I'd say it is better if this is done later on.

@pasqu4le pasqu4le force-pushed the pp-enforce-sharelock-order branch from cfeb3db to b62d75d Compare September 12, 2019 10:40
@pasqu4le
Copy link
Contributor Author

I had forgot to update the order between tables in the docs, but it is corrected now

Problem: due to the ShareLock mechanism used by PostgreSQL when there are several DB transactions are acting on multiple rows of the same table, it's possible to incur in a deadlock and so into an error.

Solution: enforce a consistent order of ShareLock acquisition in all the relevant transaction.
Also add clear documentation that should help in the future.
@pasqu4le pasqu4le force-pushed the pp-enforce-sharelock-order branch from b62d75d to 029deae Compare September 16, 2019 17:30
@pasqu4le pasqu4le force-pushed the pp-enforce-sharelock-order branch from 029deae to 810dc48 Compare September 16, 2019 17:50
@vbaranov vbaranov merged commit c8531fa into master Sep 18, 2019
@vbaranov vbaranov deleted the pp-enforce-sharelock-order branch September 18, 2019 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready for review This PR is ready for reviews.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(Postgrex.Error) ERROR 40P01 (deadlock_detected) deadlock detected
4 participants