Improve aborted transactions cleanup #5669
spolitov added a commit that referenced this issue on Sep 18, 2020
Summary: When a transaction is aborted, its intents should be cleaned from the participating tablets, and the transaction itself should be unloaded from memory. This diff fixes several scenarios in which transactions were not cleaned up:
1) Added a cleanup cache to the transaction participant, so that it can clean up a transaction when the cleanup request is received before the transaction has been replicated to this node.
2) Fixed the state check for cleanup when the transaction heartbeat failed.
3) Clean up a transaction that failed to commit.
4) Attempt to clean up tablets that were not marked as having metadata, because when a child transaction fails we could have written intents to them while reporting the overall operation as failed.
5) Clean up the tablets involved in a child transaction when it fails.

Test Plan: ybd --gtest_filter CqlIndexTest.TxnCleanup
Reviewers: timur, mikhail
Reviewed By: mikhail
Subscribers: ybase, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D9358
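Item 1 of the summary above (a cleanup request arriving before the transaction has been replicated to the node) can be illustrated with a small sketch. This is not the code from the diff: `CleanupCache`, `Participant`, the string-typed `TxnId`, and the FIFO eviction policy are all hypothetical names and choices made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a transaction id; the real type is not shown in this issue.
using TxnId = std::string;

// Bounded FIFO cache of transaction ids for which a cleanup request was received.
// The eviction policy is an assumption made for this sketch.
class CleanupCache {
 public:
  explicit CleanupCache(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Remember that cleanup was requested for this transaction.
  void Insert(const TxnId& id) {
    if (!ids_.insert(id).second) return;  // already cached
    order_.push_back(id);
    if (order_.size() > capacity_) {  // evict the oldest entry
      ids_.erase(order_.front());
      order_.pop_front();
    }
  }

  bool Contains(const TxnId& id) const { return ids_.count(id) != 0; }

 private:
  std::size_t capacity_;
  std::deque<TxnId> order_;
  std::unordered_set<TxnId> ids_;
};

// Hypothetical, heavily simplified participant: it only tracks which
// transactions are loaded in memory.
class Participant {
 public:
  explicit Participant(std::size_t cache_capacity) : cache_(cache_capacity) {}

  // Cleanup request: unload the transaction if it is loaded, and remember
  // the id either way, in case the transaction is replicated later.
  void Cleanup(const TxnId& id) {
    running_.erase(id);
    cache_.Insert(id);
  }

  // Called when a transaction is replicated to this node.
  // Returns true if the transaction was loaded into memory.
  bool Add(const TxnId& id) {
    if (cache_.Contains(id)) return false;  // cleanup arrived first: do not load
    running_.insert(id);
    return true;
  }

  bool IsRunning(const TxnId& id) const { return running_.count(id) != 0; }

 private:
  CleanupCache cache_;
  std::unordered_set<TxnId> running_;
};
```

The cache is bounded so that remembered ids do not accumulate indefinitely; the price is that a very late replication of an already evicted id would be loaded again and would have to be caught by some other cleanup path.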
spolitov added a commit that referenced this issue on Oct 7, 2020
…up aborted ones

Summary: This diff adds a periodic status check of each running transaction to the transaction participant. This is needed to detect aborted and abandoned transactions more proactively. Such cases can happen when the transaction client has crashed, so that there is no one left to send a cleanup RPC to the transaction participant. Previously, we had to wait for a compaction before those transactions' intents were cleaned up.

The cleanup mechanism works as follows. Every running transaction now has an associated scheduled abort check hybrid time, abort_check_ht, which is set to start time + FLAGS_transaction_abort_check_interval_ms when the transaction starts. It is reset to current time + FLAGS_transaction_abort_check_interval_ms each time we receive a response saying that the transaction is still pending. As a result, in the normal situation with no network disconnections or slowness, we check the status of each pending transaction about once per FLAGS_transaction_abort_check_interval_ms milliseconds. If status request processing is slow, we wait for the previous status request to time out (as per the FLAGS_transaction_abort_check_timeout_ms flag) before scheduling a new status check for the same transaction.

To implement this polling mechanism efficiently, we use rpc::Poller and rpc::Scheduler to invoke a Poll function every FLAGS_transactions_status_poll_interval_ms milliseconds. This polling interval is much smaller than the per-transaction status check interval FLAGS_transaction_abort_check_interval_ms. The Poll function uses the new sequential index on abort_check_ht, which this diff adds to the transactions_ multi-index container in TransactionParticipant, to obtain the set of transactions that are due for a status check at this iteration.

Also, in this diff we extract into a new class, TransactionLoader, the code in TransactionParticipant that loads transaction metadata from the intents RocksDB and large-transaction "apply metadata" from the regular RocksDB into memory.

Test Plan: ybd --gtest_filter CqlIndexTest.TxnPollCleanup
Reviewers: mikhail
Reviewed By: mikhail
Subscribers: bogdan, ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D9427
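The abort check scheduling described in this commit message can be sketched with the standard library alone. This is a simplified model under stated assumptions, not the code from the diff: `AbortCheckSchedule` and its methods are hypothetical names, plain millisecond integers stand in for hybrid times, a `std::multimap` stands in for the ordered index on abort_check_ht, and re-keying an in-flight check to now + timeout is only one possible way to model waiting for the previous status request to time out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

using TxnId = std::string;     // hypothetical stand-in for a transaction id
using TimeMs = std::int64_t;   // plain milliseconds instead of a hybrid time

class AbortCheckSchedule {
 public:
  // interval_ms plays the role of FLAGS_transaction_abort_check_interval_ms,
  // timeout_ms the role of FLAGS_transaction_abort_check_timeout_ms.
  AbortCheckSchedule(TimeMs interval_ms, TimeMs timeout_ms)
      : interval_ms_(interval_ms), timeout_ms_(timeout_ms) {
    assert(interval_ms > 0 && timeout_ms > 0);
  }

  // New transaction: the first check is due at start time + interval.
  void Add(const TxnId& id, TimeMs start_ms) {
    Remove(id);
    by_id_[id] = by_time_.emplace(start_ms + interval_ms_, id);
  }

  void Remove(const TxnId& id) {
    auto it = by_id_.find(id);
    if (it == by_id_.end()) return;
    by_time_.erase(it->second);
    by_id_.erase(it);
  }

  // Invoked by the poller. Returns the transactions whose check time has
  // arrived; each one is pushed out to now + timeout, so it is not checked
  // again until a response arrives or the request has timed out.
  std::vector<TxnId> Poll(TimeMs now_ms) {
    std::vector<TxnId> due;
    auto end = by_time_.upper_bound(now_ms);
    for (auto it = by_time_.begin(); it != end; ++it) due.push_back(it->second);
    for (const TxnId& id : due) Reschedule(id, now_ms + timeout_ms_);
    return due;
  }

  // Status response: a pending transaction is checked again one interval
  // from now; any other status means it can be cleaned up and unloaded.
  void OnStatusResponse(const TxnId& id, bool still_pending, TimeMs now_ms) {
    if (still_pending) {
      Reschedule(id, now_ms + interval_ms_);
    } else {
      Remove(id);
    }
  }

  bool Contains(const TxnId& id) const { return by_id_.count(id) != 0; }
  std::size_t Size() const { return by_id_.size(); }

 private:
  using TimeIndex = std::multimap<TimeMs, TxnId>;

  void Reschedule(const TxnId& id, TimeMs when_ms) {
    auto it = by_id_.find(id);
    if (it == by_id_.end()) return;
    by_time_.erase(it->second);
    it->second = by_time_.emplace(when_ms, id);
  }

  TimeMs interval_ms_;
  TimeMs timeout_ms_;
  TimeIndex by_time_;  // ordered by scheduled check time
  std::unordered_map<TxnId, TimeIndex::iterator> by_id_;
};
```

Because the index is ordered by check time, each `Poll` call only touches the transactions that are actually due, which is what makes a short polling interval affordable even with many running transactions.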