Better Lotus indices #10326

raulk · 2023-02-22T13:34:05Z

Context

Lotus’ chain and state storage revolves around a monolithic blockstore that contains raw chain and state data, referred to by CID. There exists a skiplist of size 10 for chain lookbacks (which was known to be flaky in the past, probably fixed now? would help to have utilisation stats to confirm).

However, there are no indices to facilitate common chain queries. For example, loading a tipset by height, finding the tipset/blocks where a message was included, finding messages involving specific actors within a particular time range, etc. are costly operations. Most of these queries involve walking chain history and/or performing a number of sequential queries through traversals.

Goal

Lotus optimizes common data access patterns to reduce latency, disk I/O and computational cost, for a better developer and node operator experience.

Change motivation

With FEVM about to be launch, we expect the programmatic usage of Lotus' API is going to increase. Wallets, dapps, libraries, and utilities will all start consuming Lotus' API through both the Eth JSON-RPC API, and its Filecoin-specific API.

Asks

Introduce indices and support data structures to enable these access patterns.

High-prio

Randomly seek to a chain height -- make O(1) instead of O(n).
Identify which tipset and blocks a message (identified by CID) landed in. -- make O(1) instead of O(n^m).
Load the receipt for a given message (identified by CID). -- make O(1) instead of O(n^m+1).

Medium-prio

Ability to get gas used of a tipset instantaneously.
Ability to cache the canonical order of messages in a tipset.
Ability to quickly know the message count in a tipset.

Considerations

Need to deal with reorgs.
Need to deal with splistore (whatever archiving policy is chosen may need to extend to indices).
Consider garbage collection for indices.
Figure out the right sync/async mix to ensure consistency yet not block chain sync pathways.
Make sure indices are consistent across load balanced Lotus (maybe nothing to do here).

arajasek · 2023-03-08T18:35:51Z

We (@raulk @jennijuju @vyzo @Stebalien @magik6k @Kubuxu ) had an extended discussion on this today. We emerged with the following chief themes:

Solving the reorg problem is going to be hard, any indices should be untouchable by reorgs (past finality only)
The primary index needed is a mapping from message CID to tipset in which it was included. Everything else is either quickly computable once we have that, or already fast enough today.
The index is only as useful as the amount of chain history (including state) that one has.

Given that, the concrete proposal is to:

Add an index for message CIDs to tipset keys for all tipsets one "syncs" (either imports in a snapshot, or syncs from peers) past finality
- @vyzo is gonna work on this, and will supply more details as needed
Perform refactors as needed so that all lookups (Receipts, tipset inclusion, gas costs, etc.) first perform a lookup in the new index
Continue to rely on Search backs if a match is not found in the index
- This will apply to all messages within the latest finality window
- This may not be fast enough to meet our needs, but can be separately optimized.

raulk · 2023-03-08T19:45:05Z

The primary index needed is a mapping from message CID to tipset in which it was included. Everything else is either quickly computable once we have that, or already fast enough today.
Add an index for message CIDs to tipset keys for all tipsets one "syncs" (either imports in a snapshot, or syncs from peers) past finality

Is this what we landed on? My recollection is that we agreed on having a message CID => receipt CID + exec height + exec tipset CID mapping, since that would remove the following work from eth_getTransactionReceipt:

Load the inclusion tipset.
Load the message AMT.
Find the message in the AMT.
Load the execution tipset (load by height!)
Load the receipts AMT.
Find the receipt CID by index.
FInally load the receipt by CID.

EDIT: after writing this I realise that message receipts are not addressable by CID, so instead of the receipt CID we'd need to store the message exec index, so we can omit 1-4.

arajasek · 2023-03-09T18:08:45Z

I thought that's what we landed on, and that's definitely what I would encourage us to do. We can add more indices later, but I want to make this work with the minimal indexing. I suspect that with the msgCid -> inclusionTipset mapping, this will be sufficiently fast (with some perf improvements to the EthAPI itself), because it's in essence:

Looking up the inclusion tipset
one StateReplay with the inclusion tipset specified, which is pretty fast (see below)
the remaining formulation of the ethTxReciept that we would have to do anyway (looking up ethAddresses and such)

If this isn't workable, we can discuss adding more indices, but I strongly think we should build up as needed.

aayush@aayush-desktop ~/p/lotus (master)> time curl -X POST \
                                                  -H "Content-Type: application/json" \
                                                  --data '{ "jsonrpc": "2.0", "method":"Filecoin.StateReplay", "params": [[
                                              {"/":"bafy2bzaced4a7piq2ibym2epryxklasqfla4ri7h5oe2hzl3gz2ygsneaeslo"}],
                                               {"/": "bafy2bzaceb6e3tbiikuofrmoygryebboi42f2wxakcj32b3xd637iieeiygdk"}], "id": 0 }' 'http://127.0.0.1:1234/rpc/v1'


________________________________________________________
Executed in  191.32 millis    fish           external
   usr time    0.46 millis  459.00 micros    0.00 millis
   sys time    3.84 millis  158.00 micros    3.68 millis

raulk · 2023-03-11T15:09:26Z

@arajasek I want to reconcile this discussion with the latest drive to avoid state computation on Eth paths, and reuse saved receipts instead. To make that approach work, we will need the originally proposed index (message CID => receipt CID + exec height + exec tipset CID) to avoid having to calculate the execution index of the message (which would mean loading all blocks, yada yada yada...)

jennijuju assigned vyzo Mar 8, 2023

vyzo mentioned this issue Mar 11, 2023

feat:chain: Message Index #10452

Merged

raulk added this to the Eth RPC perf + fixes milestone Mar 12, 2023

raulk mentioned this issue Mar 12, 2023

EthAPI: Optimize EthGetTransactionReceipt #10418

Closed

jennijuju added the P1 P1: Must be resolved label Mar 13, 2023

vyzo mentioned this issue Mar 22, 2023

Epic: Message Index #10536

Closed

8 tasks

jennijuju modified the milestones: Eth RPC perf + fixes, LM-v1.23.0 Mar 22, 2023

rjan90 modified the milestones: LM-v1.23.0, Lotus - v1.23.0 Mar 27, 2023

fridrik01 mentioned this issue Mar 29, 2023

fix: searchForIndexedMsg always returns an error #10586

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better Lotus indices #10326

Better Lotus indices #10326

raulk commented Feb 22, 2023

arajasek commented Mar 8, 2023

raulk commented Mar 8, 2023 •

edited

Loading

arajasek commented Mar 9, 2023

raulk commented Mar 11, 2023

Better Lotus indices #10326

Better Lotus indices #10326

Comments

raulk commented Feb 22, 2023

Context

Goal

Change motivation

Asks

Considerations

arajasek commented Mar 8, 2023

raulk commented Mar 8, 2023 • edited Loading

arajasek commented Mar 9, 2023

raulk commented Mar 11, 2023

raulk commented Mar 8, 2023 •

edited

Loading