From 37e180eced8fa63c456c3ae272da29213d39da8e Mon Sep 17 00:00:00 2001
From: Erik de Castro Lopo
Date: Thu, 24 Jun 2021 12:55:12 +1000
Subject: [PATCH] Add doc/ledger-consensus-api.md

---
 doc/ledger-consensus-api.md | 88 +++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)
 create mode 100644 doc/ledger-consensus-api.md

diff --git a/doc/ledger-consensus-api.md b/doc/ledger-consensus-api.md
new file mode 100644
index 000000000..56c08ac95
--- /dev/null
+++ b/doc/ledger-consensus-api.md
@@ -0,0 +1,88 @@
+# An Ideal API for Consensus and the Ledger
+
+This is written from the point of view of a consumer (ie `cardano-db-sync`) of data from the
+consensus and ledger layers. It describes the problems I, as the main dev on `db-sync`, see
+in what we have now and then gives top level details of what I think consensus and ledger
+should provide to a consumer like `db-sync`.
+
+### Problems with What we Have Now
+
+Consensus (along with the networking code) and ledger-specs are developed in two separate
+Git repositories and neither has a well thought out or evolved API. Instead they simply expose
+their internals. The different eras (Shelley, Allegra, Mary etc) all have their own data
+types which are unified using type families. Unfortunately, with type families, changes in the
+library can cause particularly obscure error messages in client code. However useful type
+families might be for developing the code for the different eras, having to deal with the
+different types, even when unified using type families, only makes things more difficult for
+clients like `db-sync`.
+
+In order to populate its database, `db-sync` currently requires:
+
+* A `LocalChainSync` client to get blocks from the blockchain.
+* A ledger state to get information that is not recorded on chain (rewards, stake distribution
+  etc).
+* A ledger state query to get block information that is not on the blockchain (block time stamp,
+  epoch number, slot within epoch etc).
+* Data obtained as an aggregation query on data already in the database (sum of tx inputs,
+  deposit value in a transaction etc).
+
+In the case of database aggregation queries, this is not a problem for things like populating the
+epoch table, but it is a significant performance hit when calculating things like the deposit amount
+for every transaction.
+
+It should also be noted that because `db-sync` has to insert data into PostgreSQL, it will likely
+be the first Cardano component to hit issues where its performance cannot keep up with a new
+block arriving every 20 seconds. Anything that can reduce the amount of computation `db-sync`
+needs to do improves its performance.
+
+To summarize, the problems with the current approach are:
+
+* There are four different mechanisms to get all the information needed by the database.
+* `db-sync` needs to maintain a copy of ledger state that is identical to the copy of the
+  ledger state in the `node`. With `db-sync`, that means applying a block to an existing ledger
+  state is done twice; once in the `node` and once in `db-sync`. The amount of code required to
+  maintain ledger state is significant and basically duplicates code in `consensus`.
+* Some data that goes into the database (eg the `deposit` field of the `tx` table) must be
+  calculated using a database query. This needs to be done for every transaction in every
+  block and is not a cheap operation.
+
+
+### What Data is Needed?
+
+The data needed by `db-sync` but not actually part of the blockchain (not necessarily an
+exhaustive list):
+
+* UTC time stamp for each block (currently calculated with a local state query).
+* Epoch number and slot number within an epoch (currently calculated with a local state query).
+* The rewards for each epoch (extracted from the ledger state).
+* The stake distribution for each epoch (extracted from the ledger state).
+* The deposit value for each transaction (requires a database query).
+
+
+### An Ideal API for a `db-sync`-like Consumer
+
+The ideal API for a `db-sync`-like consumer would not require the consumer to maintain its
+own ledger state. Instead, the API would provide the following:
+
+* Enhanced or annotated blocks (these are not blocks as they would appear on the blockchain),
+  with additional information like a UTC time stamp, current era, epoch number, slot within
+  an epoch etc. These annotated blocks are era independent.
+* Enhanced/annotated blocks would not contain the blockchain version of transactions, but an enhanced
+  transaction with things like the deposit value for each transaction. Like the annotated blocks,
+  annotated transactions would also be era independent.
+* Ledger events notifying of things that are difficult to obtain just by looking at the current
+  block. This would include things like:
+
+  ```
+  data LedgerEvent
+    = LedgerNewEra !Era
+    | LedgerNewEpoch !EpochNo
+    | LedgerRewards !EpochNo !Rewards
+    | LedgerStakeDist !StakeDist
+    | LedgerBlock !AnnotatedBlock
+  ```
+  Rewards and stake distribution could be calculated incrementally by the ledger and the partial
+  results could be passed to `db-sync` as they become available.
+
+There would then be something like the `LocalChainSync` protocol that passes `LedgerEvent`
+objects over the connection rather than the current blockchain version of the blocks.