From 37e180eced8fa63c456c3ae272da29213d39da8e Mon Sep 17 00:00:00 2001
From: Erik de Castro Lopo <erikd@mega-nerd.com>
Date: Thu, 24 Jun 2021 12:55:12 +1000
Subject: [PATCH] Add doc/ledger-consensus-api.md

---
 doc/ledger-consensus-api.md | 88 +++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)
 create mode 100644 doc/ledger-consensus-api.md

diff --git a/doc/ledger-consensus-api.md b/doc/ledger-consensus-api.md
new file mode 100644
index 000000000..56c08ac95
--- /dev/null
+++ b/doc/ledger-consensus-api.md
@@ -0,0 +1,88 @@
+# An Ideal API for Consensus and the Ledger
+
+This is written from the point of view of a consumer (i.e. `cardano-db-sync`) of data from the
+consensus and ledger layers. It describes the problems I, as the main dev on `db-sync`, see
+in what we have now and then gives a top-level description of what I think consensus and ledger
+should provide to a consumer like `db-sync`.
+
+### Problems with What we Have Now
+
+Consensus (along with the networking code) and ledger-specs are developed in two separate
+Git repositories and neither has a well thought out or evolved API. Instead they simply expose
+their internals. The different eras (Shelley, Allegra, Mary, etc.) all have their own data
+types which are unified using type families. Unfortunately, with type families, changes in the
+library can cause particularly obscure error messages in client code. However useful type
+families might be for developing the code for the different eras, having to deal with the
+different types, even when unified using type families, only makes things more difficult for
+clients like `db-sync`.
+
+In order to populate its database `db-sync` currently requires:
+
+* A `LocalChainSync` client to get blocks from the blockchain.
+* A ledger state to get information that is not recorded on chain (rewards, stake distribution
+  etc).
+* A ledger state query to get block information that is not on the blockchain (block time stamp,
+  epoch number, slot within epoch etc).
+* Data obtained as an aggregation query on data already in the database (sum of tx inputs,
+  deposit value in a transaction etc).
+
+Database aggregation queries are not a problem for things like populating the epoch table,
+but they are a significant performance hit when calculating things like the deposit amount
+for every transaction.
+
+It should also be noted that because `db-sync` has to insert data into PostgreSQL, it will likely
+be the first Cardano component to hit issues where its performance cannot keep up with a new
+block arriving roughly every 20 seconds. Anything that can reduce the amount of computation
+`db-sync` needs to do improves its performance.
+
+To summarize, the problems with the current approach are:
+
+* There are four different mechanisms to get all the information needed by the database.
+* `db-sync` needs to maintain a copy of ledger state that is identical to the copy of the
+  ledger state in the `node`. With `db-sync`, that means applying a block to an existing ledger
+  state is done twice; once in the `node` and once in `db-sync`. The amount of code required to
+  maintain ledger state is significant and basically duplicates code in `consensus`.
+* Some data that goes into the database (e.g. the `deposit` field of the `tx` table) must be
+  calculated using a database query. This needs to be done for every transaction in every
+  block and is not a cheap operation.
+
+
+### What Data is Needed?
+
+The data needed by `db-sync` but not actually part of the blockchain (not necessarily an
+exhaustive list):
+
+* UTC time stamp for each block (currently calculated with a local state query).
+* Epoch number and slot number within an epoch (currently calculated with a local state query).
+* The rewards for each epoch (extracted from the ledger state).
+* The stake distribution for each epoch (extracted from the ledger state).
+* The deposit value for each transaction (requires a database query).
+
+
+### An Ideal API for a `db-sync`-like Consumer
+
+The ideal API for a `db-sync`-like consumer would not require the consumer to maintain its
+own ledger state. Instead, the API would provide two things:
+
+* Enhanced or annotated blocks (these are not blocks as they would appear on the blockchain),
+  with additional information like a UTC time stamp, current era, epoch number, slot within
+  an epoch, etc. These annotated blocks are era independent.
+  Annotated blocks would not contain the blockchain version of transactions, but annotated
+  transactions carrying things like the deposit value for each transaction. Like the annotated
+  blocks, annotated transactions would also be era independent.
+* Ledger events notifying of things that are difficult to obtain just by looking at the current
+  block. This would include things like:
+
+  ```
+  data LedgerEvent
+    = LedgerNewEra !Era
+    | LedgerNewEpoch !EpochNo
+    | LedgerRewards !EpochNo !Rewards
+    | LedgerStakeDist !StakeDist
+    | LedgerBlock !AnnotatedBlock
+  ```
+  Rewards and stake distribution could be calculated incrementally by the ledger and the partial
+  results could be passed to `db-sync` as they become available.
+
+There would then be something like the `LocalChainSync` protocol that passes `LedgerEvent`
+objects over the connection rather than the current blockchain version of the blocks.