[console wallet] Transaction service sql db queries must handle DieselError(DatabaseError(__Unknown, "database is locked")) #4731

Closed
hansieodendaal opened this issue Sep 23, 2022 · 1 comment

@hansieodendaal (Contributor)

See this related discussion as background information.

Proposal:

  • Combine find and update/write queries into a single query (for example, `fn reject_completed_transaction`).
  • Add SQL transactions around complex tasks (for example, the three sections of `fn apply_encryption`).
  • As a last resort, after implementing the first two suggestions, retry the function upon `DieselError(DatabaseError(__Unknown, "database is locked"))`.
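The retry step in the last point could look roughly like the following. This is a minimal std-only sketch, not the wallet's code: `retry_if_locked`, the attempt limit and the backoff values are hypothetical, and the `is_locked` predicate stands in for matching the real `DieselError(DatabaseError(__Unknown, "database is locked"))` value. It should only wrap work that is safe to re-run as a whole, such as a single query or a complete SQL transaction.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical helper: re-run `op` while it fails with an error that
/// `is_locked` classifies as "database is locked", up to `max_attempts`
/// calls in total, doubling the sleep between attempts. Any other error,
/// or the locked error from the final attempt, is returned unchanged.
fn retry_if_locked<T, E>(
    max_attempts: u32,
    initial_backoff: Duration,
    is_locked: impl Fn(&E) -> bool,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut backoff = initial_backoff;
    let mut attempt = 1;
    loop {
        match op() {
            Err(e) if is_locked(&e) && attempt < max_attempts => {
                thread::sleep(backoff);
                backoff *= 2; // exponential backoff between attempts
                attempt += 1;
            }
            result => return result,
        }
    }
}

fn main() {
    // Simulated operation: reports "database is locked" twice, then succeeds.
    let mut calls = 0;
    let result: Result<&str, String> = retry_if_locked(
        5,
        Duration::from_millis(1),
        |e: &String| e.contains("database is locked"),
        || {
            calls += 1;
            if calls < 3 {
                Err("database is locked".to_string())
            } else {
                Ok("done")
            }
        },
    );
    assert_eq!(result, Ok("done"));
    assert_eq!(calls, 3);
}
```

Retrying a whole transaction rather than an individual statement avoids re-running only half of a multi-step change.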
@hansieodendaal (Contributor, Author)

hansieodendaal commented Oct 7, 2022

With the code from #4775 running:

  • Performed a stress test using interactive transactions between peer wallets: 3x senders (16,000 transactions each) on one computer to 4x receivers (12,000 each) on another computer, to try to starve/lock up the wallet db (it easily handles smaller batches) and to stress the network overall.
  • The wallets performed admirably:
    • sender 01: 10/16,000 rejected, 15,990/16,000 mined confirmed
    • sender 02: 4/16,000 unaccounted for, 15,996/16,000 mined confirmed
    • sender 03: 5/16,000 unaccounted for, 15,995/16,000 mined confirmed
    • receiver 01: 1/12,000 pending, 11,999/12,000 mined confirmed
    • receiver 02: 1/12,000 unaccounted for, 1/12,000 rejected, 11,998/12,000 mined confirmed
    • receiver 03: 3/12,000 unaccounted for, 1/12,000 pending, 11,996/12,000 mined confirmed
    • receiver 04: 1/12,000 unaccounted for, 1/12,000 pending, 11,992/12,000 mined confirmed
  • Many of the same `DieselError(DatabaseError(__Unknown, "database is locked"))` errors were still observed, but judging by the stats above, the protocols seemed to recover. The errors occurred in:
    • Handling Service Request: UpdateOutputMetadataSignature
    • Transaction Broadcast protocol
  • Further improvements could be made, especially if the problem is solved at a low level:
    • Example 1: the db operation is not retried after "database is locked" (perhaps this is acceptable, as the operation is not mission critical?):
      2022-10-06 13:49:11.251508300 [wallet::output_manager_service] TRACE Handling Service Request: UpdateOutputMetadataSignature (eeab18ad469c3a41f3c9bf53fce1f2045fc31cf06632808baa119b147c73dc0c, 5fd10dffe7718838e55d826cf70f033993a7c2794a7715599e5c8a34dc40610e, e0dde4232ae3748b6c1605127599140725ea0457440b3cc1260b90ccdff82909)
      2022-10-06 13:49:11.251610600 [wallet::output_manager_service] WARN  Error handling request: OutputManagerStorageError(DieselError(DatabaseError(__Unknown, "database is locked")))
      2022-10-06 13:49:11.252925100 [wallet::transaction_service::protocols::receive_protocol] WARN  Could not update metadata signature (TxId: 2337743812186680647) for output 8c50a8b06e3a688a1269ab586e3ce5d023e89076f0f402cba54fad9a6112bb47 [OutputFeatures { version: V0, output_type: Standard, maturity: 0, metadata: [], sidechain_features: None }], Script: (Nop), Offset Pubkey: (ca218be3444becffb9baee347acffacabcbe0e3887d7876d44b3f047d8c07119), Metadata Signature: (5fd10dffe7718838e55d826cf70f033993a7c2794a7715599e5c8a34dc40610e, e0dde4232ae3748b6c1605127599140725ea0457440b3cc1260b90ccdff82909, eeab18ad469c3a41f3c9bf53fce1f2045fc31cf06632808baa119b147c73dc0c), Proof: 01a0a4b284fc3fe0..5960ca58d7831406 (2337743812186680647, Output manager error: `Output manager storage error: `Diesel error: `database is locked```)
    • Example 2: although this action is retried until it succeeds (in a slow loop), the db may be left in an incomplete state between retries:
      2022-10-06 13:49:22.683854700 [wallet::transaction_service::service] TRACE Transaction Broadcast protocol has ended with result Ok(Err(TransactionServiceProtocolError { id: TxId(18054193081190806108), error: TransactionStorageError(DieselError(DatabaseError(__Unknown, "database is locked"))) }))
      2022-10-06 13:49:22.683867000 [wallet::transaction_service::service] WARN  Error completing Transaction Broadcast Protocol (Id: 18054193081190806108): TransactionStorageError(DieselError(DatabaseError(__Unknown, "database is locked")))
      2022-10-06 13:57:41.879705500 [wallet::transaction_service::protocols::validation_protocol] WARN  Marking transaction 18054193081190806108 as unmined and confirmed 'false' with block 'false' (Operation ID: 1853808427834230318)
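As a quick cross-check of the figures above, the mined-confirmed counts can be totalled against the 48,000 transactions in each direction (3 x 16,000 sent, 4 x 12,000 received). This sketch only does arithmetic on the numbers reported in the list.

```rust
/// Mined-confirmed counts as reported in the stress test above.
const SENDERS_CONFIRMED: [u32; 3] = [15_990, 15_996, 15_995];
const RECEIVERS_CONFIRMED: [u32; 4] = [11_999, 11_998, 11_996, 11_992];

/// Returns (confirmed total, expected total, confirmed percentage).
fn summarise(confirmed: &[u32], per_wallet: u32) -> (u32, u32, f64) {
    let total: u32 = confirmed.iter().sum();
    let expected = per_wallet * confirmed.len() as u32;
    (total, expected, 100.0 * total as f64 / expected as f64)
}

fn main() {
    let (s, s_exp, s_pct) = summarise(&SENDERS_CONFIRMED, 16_000);
    let (r, r_exp, r_pct) = summarise(&RECEIVERS_CONFIRMED, 12_000);
    println!("senders:   {s}/{s_exp} mined confirmed ({s_pct:.2}%)");
    println!("receivers: {r}/{r_exp} mined confirmed ({r_pct:.2}%)");
}
```

Summing the lists gives 47,981 of 48,000 for the senders and 47,985 of 48,000 for the receivers.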

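Example 2 is the situation the second proposal (SQL transactions around complex tasks) is meant to prevent: when one step of a multi-step update fails, none of the steps should be visible. Diesel exposes this through `Connection::transaction`; the std-only sketch below merely imitates the all-or-nothing behaviour with a hypothetical in-memory `Store`, staging every step on a copy and committing the copy only if all steps succeed.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the wallet's transaction table.
#[derive(Clone, Debug, Default, PartialEq)]
struct Store {
    rows: HashMap<u64, &'static str>, // tx_id -> status
}

impl Store {
    /// Run `steps` against a staged copy; commit only if every step succeeds,
    /// otherwise leave `self` exactly as it was (all-or-nothing).
    fn transaction<F>(&mut self, steps: F) -> Result<(), String>
    where
        F: FnOnce(&mut Store) -> Result<(), String>,
    {
        let mut staged = self.clone();
        steps(&mut staged)?; // any error aborts before the commit below
        *self = staged; // commit
        Ok(())
    }
}

fn main() {
    let mut store = Store::default();
    store.rows.insert(1, "broadcast");

    // The second step fails (simulating "database is locked"), so the first
    // step's change must not be visible afterwards.
    let result = store.transaction(|s| {
        s.rows.insert(1, "mined_unconfirmed");
        Err("database is locked".to_string())
    });
    assert!(result.is_err());
    assert_eq!(store.rows.get(&1u64), Some(&"broadcast"));
}
```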
Repository owner moved this from In Progress to Done in Tari Esme Testnet Oct 7, 2022
stringhandler pushed a commit that referenced this issue Oct 11, 2022
Description
---
Transaction service SQL db queries must handle `DieselError(DatabaseError(__Unknown, "database is locked"))`. This PR attempts to remove situations where that error may occur under highly busy async circumstances, specifically:
- Combine find and update/write type queries into one.
- Add SQL transactions around complex tasks.
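The first point replaces a separate "find" query followed by an "update" query with a single statement, so no other task can take the write lock (or change the row) in between. As a hedged, std-only sketch, the hypothetical `TxStore` below holds its lock once for both the check and the write, which is roughly what a single `UPDATE ... WHERE` does in SQLite. In Diesel, `diesel::update(...).set(...).execute(conn)` returns the number of affected rows, which plays the role of the `usize` returned here. The function name mirrors the issue, but the body is illustrative only.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TxStatus {
    Completed,
    Rejected,
}

/// Hypothetical in-memory stand-in for the completed-transactions table.
struct TxStore {
    rows: Mutex<HashMap<u64, TxStatus>>,
}

impl TxStore {
    /// Check and update under a single lock acquisition, analogous to one
    /// `UPDATE ... SET status = 'rejected' WHERE tx_id = ? AND status = 'completed'`.
    /// Returns the number of rows changed (0 or 1), like an SQL affected-row count.
    fn reject_completed_transaction(&self, tx_id: u64) -> usize {
        let mut rows = self.rows.lock().expect("store mutex poisoned");
        match rows.get_mut(&tx_id) {
            Some(status) if *status == TxStatus::Completed => {
                *status = TxStatus::Rejected;
                1
            }
            _ => 0, // missing, or not in the Completed state: nothing changed
        }
    }
}

fn main() {
    let store = TxStore {
        rows: Mutex::new(HashMap::from([(42, TxStatus::Completed)])),
    };
    assert_eq!(store.reject_completed_transaction(42), 1);
    assert_eq!(store.reject_completed_transaction(42), 0); // already rejected
    assert_eq!(store.reject_completed_transaction(7), 0); // unknown tx_id
}
```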

_**Note:** Partial resolution for #4731._

Motivation and Context
---
See above.

How Has This Been Tested?
---
- Passed unit tests.
- Passed cucumber tests.
- ~~**TODO:**~~ System level tests under stress conditions.
CjS77 added a commit that referenced this issue Oct 19, 2022
* fix: batch rewind operations (#4752)

Description
---
Split rewind DbTx into smaller pieces.

How Has This Been Tested?
---
Performed a rewind of 20,000+ (empty) blocks.
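Splitting one large rewind database transaction into smaller ones can be sketched as below. The `rewind_in_batches` helper, the batch size and the `commit` callback are hypothetical; the point is that each chunk of heights is written and committed on its own, so no single transaction has to hold every change for 20,000+ blocks. The trade-off is that the rewind as a whole is no longer atomic.

```rust
/// Hypothetical sketch: rewind from `tip` down to `target` (heights
/// `target + 1..=tip` are removed), committing one database transaction per
/// `batch_size` heights. Returns the number of commits performed.
fn rewind_in_batches(tip: u64, target: u64, batch_size: u64, mut commit: impl FnMut(&[u64])) -> usize {
    assert!(batch_size > 0, "batch_size must be non-zero");
    // Highest heights first, as a rewind removes blocks from the tip down.
    let heights: Vec<u64> = (target + 1..=tip).rev().collect();
    let mut batches = 0;
    for chunk in heights.chunks(batch_size as usize) {
        commit(chunk); // one small transaction per chunk
        batches += 1;
    }
    batches
}

fn main() {
    let mut committed = Vec::new();
    // 20,000 heights (11..=20_010) in batches of 1,000.
    let batches = rewind_in_batches(20_010, 10, 1_000, |chunk| committed.push(chunk.len()));
    assert_eq!(batches, 20);
    assert_eq!(committed.iter().sum::<usize>(), 20_000);
}
```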

* fix: fix config.toml bug (#4780)

Description
---
The base node errored when reading the `block_sync_trigger = 5` setting:
```
ExitError { exit_code: ConfigError, details: Some("Invalid value for `base_node`: unknown field `block_sync_trigger`, expected one of `override_from`, `unconfirmed_pool`, `reorg_pool`, `service`") }
```

Motivation and Context
---
Reading default config settings should not cause an error

How Has This Been Tested?
---
System level testing

* fix(p2p/liveness): remove fallible unwrap (#4784)

Description
---
Removed stray unwrap in liveness service

Motivation and Context
---
Caused a base node to panic under stress-test conditions.

```
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: DhtOutboundError(RequesterReplyChannelClosed)', base_layer\p2p\src\services\liveness\service.rs:164:71
```
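The panic above came from calling `unwrap()` on a `Result` whose error only meant that a reply channel had closed. A minimal std-only sketch of the fix pattern, using `std::sync::mpsc` as a stand-in for the DHT requester (`send_ping` is hypothetical): return or log the error as a recoverable condition instead of panicking the worker.

```rust
use std::sync::mpsc::{channel, SendError, Sender};

/// Hypothetical stand-in for sending a liveness ping through an outbound
/// requester. Returns the error instead of unwrapping it, so a closed
/// channel cannot panic the worker thread.
fn send_ping(tx: &Sender<u64>, nonce: u64) -> Result<(), SendError<u64>> {
    tx.send(nonce)
}

fn main() {
    let (tx, rx) = channel::<u64>();
    drop(rx); // the other side went away, similar to `RequesterReplyChannelClosed`

    // Before: `send_ping(&tx, 1).unwrap()` would panic here.
    if let Err(e) = send_ping(&tx, 1) {
        eprintln!("liveness ping not sent: {e}");
    }
}
```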

How Has This Been Tested?
---
Tests pass

* fix(tari-script): use tari script encoding for execution stack serde de/serialization (#4791)

Description
---
- Uses Tari script encoding (equivalent to consensus encoding) for the `ExecutionStack` serde impl
- Renames `as_bytes` to `to_bytes`, as per Rust convention
- Adds a migration to fix the execution stack encoding in the db

Motivation and Context
---
Resolves #4790 

How Has This Been Tested?
---
Added a test that alerts if breaking changes occur in the serde serialization of the execution stack.
Manual testing in progress.
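The "alert if breaking changes occur" test is a golden-bytes check: serialize a fixed value and compare it with bytes pinned in the test, so any change to the encoding fails loudly. The sketch below uses a hypothetical two-variant `StackItem` with a made-up tag-plus-little-endian encoding; it is not Tari script's actual encoding.

```rust
/// Hypothetical stack item with a hand-written, stable byte encoding.
#[derive(Debug, PartialEq)]
enum StackItem {
    Number(i64),
    Hash([u8; 4]), // shortened for the example
}

impl StackItem {
    /// Encode as a one-byte tag followed by the payload.
    fn to_bytes(&self) -> Vec<u8> {
        match self {
            StackItem::Number(n) => {
                let mut out = vec![0x01];
                out.extend_from_slice(&n.to_le_bytes());
                out
            }
            StackItem::Hash(h) => {
                let mut out = vec![0x02];
                out.extend_from_slice(h);
                out
            }
        }
    }
}

fn main() {
    // Golden bytes: if the encoding ever changes, these comparisons fail, and
    // stored data may need a migration.
    assert_eq!(StackItem::Number(1).to_bytes(), vec![0x01, 1, 0, 0, 0, 0, 0, 0, 0]);
    assert_eq!(
        StackItem::Hash([0xaa, 0xbb, 0xcc, 0xdd]).to_bytes(),
        vec![0x02, 0xaa, 0xbb, 0xcc, 0xdd]
    );
}
```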

* feat: optimize transaction service queries (#4775)

* feat: move nonce to first in sha hash (#4778)

Description
---

This moves the nonce to the front of the hashing order when hashing for the SHA3 difficulty.
This is done so that miners cannot cache most of the header and only load the nonce in. This forces the miner to hash the complete header each time the nonce changes.
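Why the position of the nonce matters: with a streaming hash, a miner can absorb everything before the nonce once, keep that intermediate state (a "midstate") and only hash the remaining bytes for each nonce. The std-only sketch below uses `DefaultHasher` purely as a stand-in for SHA3 and made-up header bytes. It shows that a cached state gives identical results when the nonce is last, whereas with the nonce first the state differs from the very first input, so there is no prefix to reuse.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Nonce last, using a cached hasher state that already absorbed the header.
fn hash_nonce_last_cached(midstate: &DefaultHasher, nonce: u64) -> u64 {
    let mut h = midstate.clone(); // reuse the precomputed header state
    h.write(&nonce.to_le_bytes());
    h.finish()
}

/// Nonce last, hashing everything from scratch.
fn hash_nonce_last(header: &[u8], nonce: u64) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(header);
    h.write(&nonce.to_le_bytes());
    h.finish()
}

/// Nonce first: the whole header must be re-absorbed for every nonce.
fn hash_nonce_first(header: &[u8], nonce: u64) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(&nonce.to_le_bytes());
    h.write(header);
    h.finish()
}

fn main() {
    let header = b"illustrative header bytes (not a real Tari header)";
    let mut midstate = DefaultHasher::new();
    midstate.write(header); // absorbed once, outside the nonce loop
    for nonce in 0..4u64 {
        // The shortcut works when the nonce is last...
        assert_eq!(hash_nonce_last_cached(&midstate, nonce), hash_nonce_last(header, nonce));
        // ...but nonce-first ordering has to process the full header each time.
        let _ = hash_nonce_first(header, nonce);
    }
}
```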

Motivation and Context
---

Fixes: #4767 

How Has This Been Tested?
---
Unit tests all pass.

* fix(dht): remove some invalid saf failure cases (#4787)

Description
---
- Ignores nanos for the `stored_at` field in StoredMessages
- Uses direct u32 <-> i32 conversion
- Improves the error message when attempting to store an expired message
- Discards expired messages immediately
- Logs at debug level when the remote client closes the connection in the RPC server

Motivation and Context
---
- Nano conversion fails when the value is >= 2_000_000_000; nanos are not important to preserve, so we ignore them (set them to zero)
- u32 to/from i32 conversion does not lose any data, as both are 32-bit; the value is only used as i32 in the database
- 'The message was not valid for store and forward' occurs if the message has expired; this PR uses a more descriptive error message for this specific case
- Expired messages should be discarded immediately
- Early close "errors" on the RPC server simply indicate that the client went away, which is expected and not something the server controls, so they are logged at debug level
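The first two points can be checked with plain std Rust. In the sketch below (function names are hypothetical), a timestamp held as a `Duration` since the UNIX epoch is truncated to whole seconds before storing, and a `u32` is round-tripped through an `i32` column value with `as` casts, which reinterpret the same 32 bits, so values above `i32::MAX` come back unchanged.

```rust
use std::time::Duration;

/// Hypothetical helper: drop the sub-second part of a timestamp before
/// persisting it, so nanos never need converting.
fn truncate_to_secs(since_epoch: Duration) -> Duration {
    Duration::from_secs(since_epoch.as_secs())
}

/// `u32` -> `i32` via `as` keeps the same 32 bits (two's complement).
fn to_db(value: u32) -> i32 {
    value as i32
}

/// `i32` -> `u32` via `as` restores the original value.
fn from_db(stored: i32) -> u32 {
    stored as u32
}

fn main() {
    let ts = Duration::new(1_665_000_000, 999_999_999);
    assert_eq!(truncate_to_secs(ts).subsec_nanos(), 0);

    for v in [0, 1, i32::MAX as u32, u32::MAX] {
        assert_eq!(from_db(to_db(v)), v);
    }
    assert_eq!(to_db(u32::MAX), -1); // stored as a negative i32, still lossless
}
```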

How Has This Been Tested?
---
Manually.

* v0.38.6

* fix(core): only resize db if migration is required (#4792)

Description
---
Adds a conditional to only increase the database size if a migration is required

Motivation and Context
---
A new database (cucumber, functional tests) has no inputs and so migration is not required.
Ref #4791 

How Has This Been Tested?
---

* fix(miner): clippy error (#4793)

Description
---
Removes unused function in miner

Motivation and Context
---
Clippy

How Has This Been Tested?
---
No clippy error

* test: remove cucumber tests, simplify others (#4794)

Description
---
* remove auto update tests from cucumber
* rename some tests to be prefixed with `test_`
* simplified two cucumber tests by removing steps

Motivation and Context
---
The auto update tests have an external dependency, which makes them hard to test reliably. They were marked as broken, so I removed them instead.
There were two steps in the `list_height` and `list_headers` tests that created base nodes. Upon inspection of the logs, these base nodes never synced to the height of 5 and were not checked in the test, so they were of little use and only slowed the test down.

How Has This Been Tested?
---
npm test

* v0.38.7

* feat: add deepsource config

* fix(core): periodically commit large transaction in prune_to_height (#4805)

* fix(comms/rpc): measures client-side latency to first message received (#4817)

* fix(core): increase sync timeouts (#4800)

Co-authored-by: Cayle Sharrock <CjS77@users.noreply.github.com>

* feat: add multisig script that returns aggregate of signed public keys (#4742)

Description
---
Added an `m-of-n` multisig TariScript that returns the aggregate public key of the signatories if successful and fails otherwise. 

This is useful if the aggregate public key of the signatories is also the script public key, where signatories would work together to create an aggregate script signature using their individual script private keys.
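To make the description above concrete, here is a toy model of an `m-of-n` check that returns the aggregate of the signers' public keys. It is not the TariScript implementation: public keys are plain `u64`s, "aggregation" is wrapping addition standing in for elliptic-curve point addition, `verify` is a caller-supplied placeholder for real signature verification, and the key and signature ordering rules of the actual opcode are not modelled.

```rust
/// Toy `m-of-n` multisig: succeeds only if each of the `m` signatures
/// verifies against a distinct key in `pub_keys`, and then returns the
/// aggregate (here: wrapping sum) of the keys that signed.
fn multisig_aggregate<S>(
    m: usize,
    pub_keys: &[u64],
    signatures: &[S],
    verify: impl Fn(&S, u64) -> bool,
) -> Option<u64> {
    // Rejecting m == 0 is a choice made for this toy.
    if m == 0 || m > pub_keys.len() || signatures.len() != m {
        return None;
    }
    let mut used = vec![false; pub_keys.len()];
    let mut aggregate = 0u64;
    for sig in signatures {
        // Each signature must match a key that has not already been used.
        let idx = (0..pub_keys.len()).find(|&i| !used[i] && verify(sig, pub_keys[i]))?;
        used[idx] = true;
        aggregate = aggregate.wrapping_add(pub_keys[idx]);
    }
    Some(aggregate)
}

fn main() {
    let keys = [11u64, 22, 33];
    // Toy "signature": just the signer's key, so verification is equality.
    let verify = |sig: &u64, key: u64| *sig == key;
    assert_eq!(multisig_aggregate(2, &keys, &[33, 11], verify), Some(44));
    // The same signer twice does not count as two of the three keys.
    assert_eq!(multisig_aggregate(2, &keys, &[11, 11], verify), None);
}
```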

Motivation and Context
---
To enhance the practicality of the `m-of-n` multisig TariScript.

How Has This Been Tested?
---
Unit tests

Co-Authored-By: SW van Heerden swvheerden@gmail.com

* feat(comms): adds periodic socket-level liveness checks (#4819)

Description
---
- adds socket-level liveness checks
- adds configuration to enable liveness checks (currently enabled by default in base node, disabled in wallet)
- update status line to display liveness status

Motivation and Context
---
Allows us to gain visibility into the base latency of the transport, without including the overhead of the noise socket and yamux

How Has This Been Tested?
---
Manually

* fix(core): dont request full non-tip block if block is empty (#4802)

Description
---
- Checks for an edge case, preventing an unnecessary full candidate block request when the block is empty.

Motivation and Context
---
A full block request for an empty block is not necessary, as we already have all the information required to construct the candidate block. This check was missing from the branch where the candidate block is not the next tip block.

How Has This Been Tested?
---

Co-authored-by: Martin Stefcek <35243812+Cifko@users.noreply.github.com>
Co-authored-by: Hansie Odendaal <39146854+hansieodendaal@users.noreply.github.com>
Co-authored-by: SW van Heerden <swvheerden@gmail.com>
Co-authored-by: stringhandler <mikethetike@tari.com>
Co-authored-by: CjS77 <CjS77@users.noreply.github.com>
Labels: None yet
Projects: Archived in project
1 participant