
Achieve atomic block save #302

Closed
2 tasks done
karmacoma opened this issue Oct 29, 2016 · 14 comments

@karmacoma
Contributor

karmacoma commented Oct 29, 2016

Parent: #1169

If a node is interrupted during block processing, e.g. due to a reload, there is a high probability of partial writes to the database, leading to corruption of the memory tables and a break in consensus with the rest of the network.

Solution: Rewrite the database logic so that transaction application, block save, and round tick are all encompassed within a single atomic database transaction that can easily be rolled back upon interruption or error.

This should better ensure data integrity across nodes on the network.

The rewrite will also make better use of shared connections from the pool and greatly improve block processing performance.

Current behavior

BEGIN DB TRANSACTION
  Apply unconfirmed transaction
COMMIT

BEGIN DB TRANSACTION
  Apply confirmed transaction
COMMIT

BEGIN DB TRANSACTION
  Insert block
  Insert transaction
  Insert transaction type
COMMIT

BEGIN DB TRANSACTION
  Round tick
COMMIT

Bad: If a later DB transaction errors and rolls back, the earlier transactions have already been committed and cannot be undone, leaving the memory tables inconsistent.

Proposed behavior

BEGIN DB TRANSACTION
  Apply unconfirmed transaction
  Apply confirmed transaction
  Insert block
  Insert transaction
  Insert transaction type
  Round tick
COMMIT

Good: If any step errors, the whole transaction rolls back together and the memory tables remain consistent.
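For illustration, a minimal sketch of the proposed behaviour using pg-promise; the helper names below are hypothetical, not the actual Lisk modules:

```js
// Sketch only: applyUnconfirmed, applyConfirmed, insertBlock,
// insertTransactions and tickRound are hypothetical helpers that each
// accept the transaction context `t` and return a promise.
function saveBlockAtomically(db, block) {
  // db.tx() takes one connection from the pool, issues BEGIN, and COMMITs
  // only if the returned promise resolves; any rejection triggers ROLLBACK,
  // so the memory tables can never be left half-updated.
  return db.tx('atomic-block-save', function (t) {
    return applyUnconfirmed(t, block.transactions)
      .then(function () { return applyConfirmed(t, block.transactions); })
      .then(function () { return insertBlock(t, block); })
      .then(function () { return insertTransactions(t, block.transactions); })
      .then(function () { return tickRound(t, block); });
  });
}
```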

Code Target

An example of where we are already doing things correctly, and also the heart of where our database writes begin:

https://github.com/LiskHQ/lisk/blob/9598668a45e2f38f1f787b1bee393c39c2827066/modules/blocks/chain.js#L84-L109

Therefore, the actions we need to consider are: apply unconfirmed transaction, apply confirmed transaction, and round tick.

Task List

@karmacoma karmacoma added this to the Mainchain Stabilisation milestone Oct 29, 2016
@karmacoma karmacoma self-assigned this Oct 29, 2016
@karmacoma karmacoma removed this from the Mainchain Stabilisation milestone Nov 30, 2016
@karmacoma
Contributor Author

Work in progress. Will likely be included in the next minor release 0.6.0.

@karmacoma karmacoma added this to the Version 0.6.0 milestone Dec 12, 2016
@karmacoma karmacoma modified the milestones: Core Sprint 01, Core Sprint 02 Jan 19, 2017
@vitaly-t
Contributor

vitaly-t commented Jan 26, 2017

Stuck with this? Need help? 😉

The pattern suggested in pg-promise-demo can be achieved gradually, step by step 😉

The best first step is moving all of the SQL into the same kind of structure as in the demo.
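For anyone following along, a rough sketch of that structure, assuming pg-promise's QueryFile; the file names are illustrative, not the actual repository layout:

```js
const path = require('path');
const {QueryFile} = require('pg-promise');

// Load an external SQL file once at startup; pg-promise pre-parses,
// minifies and caches it, and reports the file path in query errors.
function sql(file) {
  return new QueryFile(path.join(__dirname, 'sql', file), {minify: true});
}

// Illustrative layout only: one namespace per table/feature,
// mirroring the structure used in pg-promise-demo.
module.exports = {
  blocks: {
    insert: sql('blocks/insert.sql'),
    getById: sql('blocks/get-by-id.sql')
  },
  rounds: {
    tick: sql('rounds/tick.sql')
  }
};
```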

@karmacoma
Contributor Author

Thanks @vitaly-t. Your help and advice are always appreciated 👍

@karmacoma
Contributor Author

This should largely be resolved by #543. We will then come back and review best practices.

@karmacoma
Contributor Author

Closed by #543

@karmacoma karmacoma reopened this Dec 20, 2017
@karmacoma
Contributor Author

Reopened by #1169

@karmacoma
Contributor Author

See https://github.com/vitaly-t/pg-promise/wiki/Chaining-Queries for the reasoning, especially regarding poor connection usage.
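To summarise the reasoning from that page with a hedged sketch (db and blockId are assumed to be in scope; the queries are illustrative): independent root-level calls each take their own connection from the pool, whereas a task runs all of its queries on one shared connection.

```js
// Poor connection usage: each root-level call allocates a pooled
// connection for a single query and releases it straight afterwards.
db.one('SELECT * FROM blocks WHERE "id" = $1', [blockId])
  .then(function (block) {
    return db.any('SELECT * FROM trs WHERE "blockId" = $1', [block.id]);
  });

// Better: db.task() allocates one connection and both queries reuse it.
db.task(function (t) {
  return t.one('SELECT * FROM blocks WHERE "id" = $1', [blockId])
    .then(function (block) {
      return t.any('SELECT * FROM trs WHERE "blockId" = $1', [block.id]);
    });
});
```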

@karmacoma
Contributor Author

The transaction opened by library.db.tx probably needs to move further up the stack in order to encompass the full set of operations: https://github.com/LiskHQ/lisk/blob/9598668a45e2f38f1f787b1bee393c39c2827066/modules/blocks/chain.js#L84-L87

@nazarhussain
Contributor

Yes, I will go through it all and see what the best approach is. There are two different things we should keep in mind:

  1. Acquiring a real physical connection to the database frequently
  2. Acquiring an existing connection from the pool to execute a query

The first we should avoid entirely; the second we should improve as much as we can.

@karmacoma
Contributor Author

  2. Acquiring an existing connection from the pool to execute a query

I think this can be achieved by sharing task and tx objects downwards from a higher level in the call stack.
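A rough sketch of that idea, with hypothetical function names and trimmed column lists: the higher level opens the transaction once, and the lower levels receive the context t instead of reaching for the shared db object.

```js
// Higher level (e.g. a blocks module): opens the transaction once.
function applyBlock(db, block) {
  return db.tx('apply-block', function (t) {
    return saveBlock(t, block).then(function () {
      return tickRound(t, block);
    });
  });
}

// Lower levels (logic / db layer): accept the shared context `t`, so their
// queries run on the caller's connection, inside the caller's transaction.
function saveBlock(t, block) {
  // Column list trimmed for brevity; names are illustrative.
  return t.none(
    'INSERT INTO blocks("id", "height", "payloadHash") VALUES(${id}, ${height}, ${payloadHash})',
    block
  );
}

function tickRound(t, block) {
  return t.none(
    'UPDATE mem_accounts SET "producedblocks" = "producedblocks" + 1 WHERE "address" = ${generatorId}',
    {generatorId: block.generatorId}
  );
}
```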

@karmacoma
Contributor Author

karmacoma commented Dec 22, 2017

The main focal point is block processing, which is where all of our database writes happen.

However, in order to review the rest of the application, mainly in terms of database reads, I scanned the 1.0.0 branch for existing database calls:

https://gist.github.com/karmacoma/80495431baa2a56c47298eeaa8cd2e8f

NOTE: This will change when we revert #597 and our round logic is put back into the application layer by #1174.

@karmacoma
Contributor Author

For a long time now, we have had https://github.com/vitaly-t/pg-monitor configured for DB event monitoring.

https://github.com/LiskHQ/lisk/blob/9598668a45e2f38f1f787b1bee393c39c2827066/config.json#L23-L25

By adding these events (a sketch of attaching them programmatically follows):

  • http://vitaly-t.github.io/pg-promise/global.html#event:connect
  • http://vitaly-t.github.io/pg-promise/global.html#event:query
  • http://vitaly-t.github.io/pg-promise/global.html#event:disconnect
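For reference, a hedged sketch of attaching these events with pg-monitor (the actual wiring in Lisk goes through config.json as linked above; connection details here are placeholders):

```js
const initOptions = {
  // pg-promise initialization options go here
};
const pgp = require('pg-promise')(initOptions);
const monitor = require('pg-monitor');

// Attach only the events of interest, so every connection acquisition,
// query and release is printed with a timestamp.
monitor.attach(initOptions, ['connect', 'query', 'disconnect']);

const db = pgp({host: 'localhost', database: 'lisk_test', user: 'Oliver'});
```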

We can reveal the problem we are speaking of more clearly:

  • Memory table update (transaction application - mem_accounts, mem_round):
22:16:10 connect(Oliver@lisk_test)
22:16:10 update "mem_accounts" set "balance" = "balance" + 100000000, "u_balance" = "u_balance" + 100000000, "blockId" = '1309855340732829944' where "address" = '16313739661670634666L';INSERT INTO mem_round ("address", "amount", "delegate", "blockId", "round") SELECT '16313739661670634666L', (100000000)::bigint, "dependentId", '1309855340732829944', 1 FROM mem_accounts2delegates WHERE "accountId" = '16313739661670634666L';
22:16:10 disconnect(Oliver@lisk_test)
  • Block + transaction insertion:
22:16:10 connect(Oliver@lisk_test)
22:16:10 tx: begin
22:16:10 tx: INSERT INTO "blocks"("id","version","timestamp","height","previousBlock","numberOfTransactions","totalAmount","totalFee","reward","payloadLength","payloadHash","generatorPublicKey","blockSignature") VALUES ('1309855340732829944',0,49868170,11,'8952117713527584814',1,100000000,10000000,0,117,'\x2d7ced9bd5961e2dbf74ac461f9902e3c24573e01efcc240ad1e53bf9e5ef10e','\x2f9b9a43b915bb8dcea45ea3b8552ebec202eb196a7889c2495d948e15f4a724','\xe53906b30a37ae61283ba041d96f0e6ad240bd29191c304c19267a7b9dd285d850ea01cf709f6e9fd795c34ac8888651e7a12dce3678f1847d167d55456d3a06')
22:16:10 tx: INSERT INTO "trs"("id","blockId","type","timestamp","senderPublicKey","requesterPublicKey","senderId","recipientId","amount","fee","signature","signSignature","signatures") VALUES ('3251201825196309549','1309855340732829944',0,49868167,'\xc094ebee7ec0c50ebee32918655e089f6e1a604b83bcaa760293c61e0f18ab6f',null,'16313739661670634666L','16313739661670634666L',100000000,10000000,'\xf3eba2756fbb30679acc52b6de2358276864ffa4e74d090e5a5b2620eb62b5fc6b8cfc4aa47eed7be47cba3e1efae8919ac0073a52b64f85f8e1bbe0a2d16106',null,null)
22:16:10 tx: commit
22:16:10 disconnect(Oliver@lisk_test)
  • Round tick:
22:16:10 tx(Tick): begin
22:16:10 disconnect(Oliver@lisk_test)
22:16:10 tx(Tick): update "mem_accounts" set "blockId" = '1309855340732829944', "producedblocks" = "producedblocks" + 1 where "address" = '5380829552614149409L';
22:16:10 tx(Tick): commit
22:16:10 disconnect(Oliver@lisk_test)
22:16:10 connect(Oliver@lisk_test)

Our parent (Crypti) had only limited support for DB transactions, no DB connection pooling or sharing, and no query chaining, so what you see here is the result of the first iteration of the work done so far. Looked at optimistically, the situation is only bad in that there is no shared database transaction between these discrete actions. This issue represents the next step towards adhering to best practices more completely.

@nazarhussain
Copy link
Contributor

@karmacoma #302 (comment) Yes, that can be achieved, but the tx object would need to be passed down through every level: module -> logic -> db. I am working on it to see how to refactor this more elegantly.

@karmacoma karmacoma added the child label Jan 4, 2018
@karmacoma karmacoma changed the title Rewrite database logic to follow pg-promise best practices Achieve atomic block save Jan 15, 2018
@karmacoma karmacoma removed the rewrite label Jan 15, 2018
@karmacoma karmacoma added this to the Version 1.0.0 milestone Feb 9, 2018
@karmacoma
Contributor Author

Closed by PRs #1196 and #1552
