
Technical design: Logs and events #728

Closed · Tracked by #936
raulk opened this issue Jun 13, 2022 · 13 comments · Fixed by filecoin-project/FIPs#483

Comments

@raulk (Member) commented Jun 13, 2022

Context

The Ethereum blockchain has the concept of logs, which are events emitted from smart contracts during execution. Logs contain arbitrary data, and are annotated with zero to four 32-byte topics depending on the opcode used (LOG0..LOG4). The fields from logs (topics, data, emitting address) are added to a 2048-bit bloom filter which is then incorporated into the block header.
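For intuition, here is a minimal sketch of the bloom insertion Ethereum uses (the M3:2048 function from the yellow paper): each log contributes its emitting address and each topic individually, setting three bits per item; the data field is not added. The keccak256 helper assumes the tiny-keccak crate.

use tiny_keccak::{Hasher, Keccak};

fn keccak256(data: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    let mut k = Keccak::v256();
    k.update(data);
    k.finalize(&mut out);
    out
}

// Set three bits of the 2048-bit (256-byte) filter for one item, each bit
// index taken from the low 11 bits of a pair of keccak256 output bytes.
fn bloom_add(bloom: &mut [u8; 256], item: &[u8]) {
    let hash = keccak256(item);
    for i in (0..6).step_by(2) {
        let bit = (((hash[i] as usize) << 8) | hash[i + 1] as usize) & 0x7ff;
        bloom[255 - bit / 8] |= 1 << (bit % 8);
    }
}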

The bloom filter is important because it is used by:

  1. light clients and wallets to quickly evaluate whether a block is of interest, depending on what they are looking for.
  2. full nodes to service log-related JSON-RPC queries (eth_getLogs, eth_getFilterLogs, eth_getFilterChanges); either in a streaming or polling fashion. Filter support implies tracking state at the node level.

AFAIK logs in Ethereum are not part of the world state, i.e. they are not stored in the state tree (we need to double-check this). They are just emitted during execution, and consensus is arrived at through the bloom filter, gas used, and other outputs.

Requirements

EVM compatibility in Filecoin will need to support Ethereum logs at both the protocol level and the JSON-RPC level.
We should avoid overfitting to Ethereum's needs -- this feature should be available to native actors too and should be generally usable and accessible.

Possible design direction

At this stage, we do not plan on introducing modifications to the chain data structures, so populating an aggregation of logs in block headers is a no-go. That leaves us with three options:

  • Support at the node level, by offering a syscall that traverses an extern and records logs somewhere in node land.
    • Con: this does not result in a chain commitment of emitted logs (it does factor into consensus indirectly through gas fees).
  • Support at the FVM level, by offering a syscall that buffers logs and appends to a bloom filter. When finalizing the machine, we'd return the bloom filter and logs to the node. The node can store bloom filters and do whatever with the logs themselves (stream them, cache/store them, or rely on re-execution).
    • Con: this also doesn't result in a chain commitment. However, we could track the bloom filter's content as a field in the system actor, which would be updated implicitly by the FVM on machine finalization.
  • Support at the system actor / built-in actor level through a singleton actor. Logs are emitted by calling a LogsActor. The LogsActor stores a height => bloom filter mapping and exposes a GetLogsBloom(height) method to return it (see the state sketch after this list). We'd need to add a cron job to prune LogsActor entries and limit them to the current finality window. Getting the logs themselves would require re-execution and introspection of call parameters through execution traces.
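To make option 3 concrete, here is a minimal state sketch for the hypothetical LogsActor. All names are invented, and real actor state would live in a HAMT rather than an in-memory map:

type ChainEpoch = i64;

// Hypothetical LogsActor state: epoch => bloom filter over that epoch's logs.
pub struct LogsActorState {
    pub blooms: std::collections::HashMap<ChainEpoch, Vec<u8>>,
}

impl LogsActorState {
    // GetLogsBloom(height): return the bloom recorded for an epoch, if any.
    pub fn get_logs_bloom(&self, height: ChainEpoch) -> Option<&Vec<u8>> {
        self.blooms.get(&height)
    }

    // Cron hook: drop entries older than the finality window.
    pub fn prune(&mut self, current: ChainEpoch, finality: ChainEpoch) {
        self.blooms.retain(|height, _| current - *height <= finality);
    }
}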

Light client operation

In Ethereum, light clients monitor block headers containing event bloom filters to determine whether they want to act on a block.
Since Filecoin does not include logs blooms in a chain structure, Filecoin light clients would operate by receiving the current bloom from the system actor, accompanied by a Merkle inclusion proof.

@Stebalien (Member) commented Jun 15, 2022

#784

edit: hm. Wrong issue.

@Stebalien (Member)

On bloom filters, we should revisit that decision from first principles.

From that, I'd say we should:

  1. Consider storing events (or at least the keys) in a HAMT (reset every epoch). Clients can download only the parts of the HAMT that they need.
  2. If we still need a bloom filter (likely easier for quick light-client checks), we should probably make the size dynamic depending on the number of events. This isn't something we can reasonably do if we put it into the block header itself, but it's something we can do if we put it in the state-tree (see the sizing sketch after this list).
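On point 2, standard Bloom filter math shows why dynamic sizing matters: for n items and target false-positive rate p, the optimal size is m = -n * ln p / (ln 2)^2 bits with k = (m / n) * ln 2 hash functions. A sketch (assumes n > 0):

fn bloom_params(n: usize, p: f64) -> (usize, u32) {
    let ln2 = std::f64::consts::LN_2;
    let m = (-(n as f64) * p.ln() / (ln2 * ln2)).ceil() as usize; // bits
    let k = ((m as f64 / n as f64) * ln2).round().max(1.0) as u32; // hashes
    (m, k)
}

For example, 100 events at p = 0.001 need about 1,438 bits, while 10,000 events at the same rate need roughly 144,000 bits; a fixed 2048-bit filter saturates quickly, which is why sizing in the state-tree beats a constant in the header.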

Where to put them...

I'd prefer to hang them off the block (treat them like receipts); we should talk with the core implementers to see how difficult this would be. For example, we could change BlockHeader.ParentMessageReceipts to actually be BlockHeader.ParentArtifacts (or something like that), including receipts, events, and anything else we need to stash in the block header. This should be quite doable (even simple) given how few components interact with the receipts.

If not, storing them in an actor isn't the end of the world. However:

  1. I'd just clear the list on every epoch.
  2. I wouldn't make the events available to other actors, that's not really what these are for.

@Stebalien (Member)

Resolution from the discussion today:

  • Add a new field to the block for execution "artifacts", including logs/events.
  • These artifacts will include a variable-sized bloom filter and an event "index" (HAMT).

Specifically, something like:

type BlockHeader struct {
	Miner address.Address // 0 unique per block/miner
	// ...
	ParentArtifacts cid.Cid
}

type ExecutionArtifacts struct { // name TBD
	// A variable-sized bloom filter to quickly tell what events may exist.
	EventBloomFilter []byte

	// An AMT of all events.
	Events cid.Cid

	// A HAMT indexing events, mapping index keys to indices in the Events AMT.
	EventIndex cid.Cid
}

Design rationale:

  • Avoid putting events into the state-tree itself.
    • This makes garbage collection easier. By linking to them in the block header, we can store them in a separate blockstore and easily garbage collect them generationally.
    • Nodes that don't care about events can simply discard them (after computing the HAMT root).
  • Store the actual events so that light-clients, and potentially other chains, can learn about them.
  • Put them in a separate object (not directly in the header) so we can add more fields easily, dynamically size the bloom filter, etc.

Drawbacks:

  • Creating the HAMT will require more hashing than simply inserting events into a bloom filter.

Open Questions:

  • Ethereum also stores the logs in the transaction receipt. We'll probably need to do this as well.
  • We need to figure out the details of what we actually want to index.
  • We need to handle the fact that event values are potentially arbitrary sizes (and probably need to set a sane maximum).

@Stebalien (Member)

@raulk we should probably discuss the open questions in standup before continuing here.

@Stebalien (Member)

Next step: Write up a series of use-cases to better understand the problem.

@raulk (Member, Author) commented Jun 24, 2022

Use cases include:

  • Serving Ethereum JSON-RPC log-related queries.
  • Lightweight observation of chain events (evaluating my interests against the bloom filter, or similar).
  • Light clients (how light clients would trustlessly query for events, and what the patterns of access would be: per message, per "topic" in a block, per emitting address, matching on data, etc., and how the inclusion proofs would look for those).

@raulk (Member, Author) commented Jun 24, 2022

We will need to associate the logs with the concrete messages that emitted them. Ethereum does this by embedding the logs in the receipt (including a bloom filter, which I don't know whether it's scoped to the logs in that message or the cumulative bloom filter up until then; I'd imagine the former). One idea is to have a top-level vector structure collecting all logs from the tipset, with receipts containing bitfields addressing the emitted logs via their index into the vector (sketched below). However, this makes producing inclusion proofs harder (I think), and it makes the message receipts less useful by themselves.
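A rough sketch of that shape, with all names invented:

// One tipset-level vector collecting every log emitted in the tipset.
struct TipsetLogs {
    logs: Vec<LogEntry>, // in execution order
}

struct LogEntry {
    emitter: u64,    // ActorID of the emitting actor
    topics: Vec<u8>, // concatenated topics
    data: Vec<u8>,
}

// Receipts then address "their" logs by index into the tipset vector.
struct Receipt {
    exit_code: u64,
    return_data: Vec<u8>,
    gas_used: i64,
    // Bit i set => logs[i] was emitted by this message (e.g. an RLE+ bitfield).
    log_bits: Vec<u8>,
}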

@anorth (Member) commented Jun 27, 2022

@Stebalien what are the "index keys" that serve as the keys of the HAMT?

I agree that logs/events need to be referenced from the message receipts in order to be most useful to light clients, UIs etc. If we put such structure in the message receipts, then do we need the events and index in the block at all? They're committed via the receipts root CID.

@Stebalien (Member)

what are the "index keys" that serve as the keys of the HAMT?

TBD. We want to make it possible for a light client to get a succinct (and cheap) proof that some event did or did not happen in any given block.

Likely:

  • actor id + event key
  • actor id
  • actor code + event key
  • actor code

But I'm a bit concerned that the HAMT could grow large.

I agree that logs/events need to be referenced from the message receipts in order to be most useful to light clients, UIs etc. If we put such structure in the message receipts, then do we need the events and index in the block at all? They're committed via the receipts root CID.

Unfortunately, light clients would have to download all messages and receipts (including top-level return values) for that to work. We'd like light clients to be able to download just:

  • The chain.
  • The root "artifacts" block.

Then, if their event is in the bloom filter:

  • Enough of the HAMT to figure out where the event happened.
  • The relevant event/message (the whole flow is sketched below).
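Putting that together in heavily hedged pseudocode, mirroring the earlier ExecutionArtifacts struct in Rust-flavored form. Every helper here (fetch, bloom_may_contain, fetch_hamt_path, fetch_amt_entry) is hypothetical and stands in for a network fetch plus proof verification:

fn find_event(head: &BlockHeader, key: &[u8]) -> Option<Event> {
    // The client already holds the (validated) chain of headers, so it
    // fetches only the root "artifacts" object linked from the header.
    let artifacts: ExecutionArtifacts = fetch(head.parent_artifacts)?;

    // Cheap negative check: if the bloom says no, we are done.
    if !bloom_may_contain(&artifacts.event_bloom_filter, key) {
        return None;
    }

    // Walk only the HAMT nodes on the path to `key`...
    let index: u64 = fetch_hamt_path(&artifacts.event_index, key)?;

    // ...and fetch just the matching event from the events AMT.
    fetch_amt_entry(&artifacts.events, index)
}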

@Stebalien (Member) commented Aug 25, 2022

Concrete proposal:

  1. Introduce a new log syscall that takes a set of log topics and a block ID.
  2. Do NOT index anything (yet). Indexing will be handled in a followup FIP.
fn log(count: u32, topics: *const u8, value: BlockId)

Where:

  • count is the number of topics (1-4 for now).
  • topics is a byte slice with length 32*count. Each topic is an arbitrary 32-byte key: topics[i*32..(i+1)*32] (see the parsing sketch after this list).
  • value is the block ID of a value.
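On the kernel side, parsing the (count, topics) pair might look like this sketch (assumes the bytes were already validated and copied in from the guest):

fn parse_topics(count: u32, buf: &[u8]) -> Result<Vec<[u8; 32]>, &'static str> {
    if !(1..=4).contains(&count) {
        return Err("topic count must be between 1 and 4");
    }
    if buf.len() != 32 * count as usize {
        return Err("topics buffer must be exactly 32 bytes per topic");
    }
    // Slice the flat buffer into fixed 32-byte topics.
    Ok(buf
        .chunks_exact(32)
        .map(|c| <[u8; 32]>::try_from(c).expect("chunk is exactly 32 bytes"))
        .collect())
}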

Define an event object of the type:

struct Event {
    actor: ActorID,
    topics: Vec<u8>,
    value: cid.Cid,
}

When an event is logged:

  1. Make a CID of the value block where:
    1. The CID is "inline" if the length of the value is <= 32 bytes.
    2. Otherwise, we hash with blake2b.
  2. Record an event object with the caller's ActorID, the specified topics, and the value CID.

When creating a message receipt, pack all events into an AMT in-order and include the AMT root in the receipt.
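As a sketch of the CID rule in step 1, assuming the cid and multihash crates (with the identity and blake2b hashers enabled):

use cid::Cid;
use multihash::{Code, MultihashDigest};

const RAW: u64 = 0x55; // multicodec code for raw bytes

fn value_cid(value: &[u8]) -> Cid {
    let mh = if value.len() <= 32 {
        Code::Identity.digest(value) // "inline" CID: the digest is the value itself
    } else {
        Code::Blake2b256.digest(value) // hash larger values with blake2b-256
    };
    Cid::new_v1(RAW, mh)
}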

Decisions

  • Are we OK with fixed 32-byte topics? That comes from the EVM, but it's a useful constraint.
  • We're storing the value in either an inline CID or an external block because it might be large (and we want the AMT to have predictably sized nodes). Is that OK?
  • Are we OK with storing the topics in a single concatenated vector? It simplifies things, but users may not like it.

@raulk (Member, Author) commented Aug 25, 2022

Notes from sync design meeting + concrete proposals

Descoping indices

We are moving the indexes out of the scope of this solution. Right now we want to focus on the simplest, extensible solution that: (a) is not overengineered for what we need now, (b) does not back us into a design corner now without sufficient information, (c) is easily extensible in the future.

Storing raw events

For now, we will be storing the raw events only, allowing clients to experiment and generate indexes entirely client-side. The schema of an event is as follows:

(see @Stebalien's comment above)

During execution, the Call Manager adds emitted events to the blockstore and populates an AMT tracking the Cids of those event objects.
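A rough shape of that step, written against an fvm_ipld_amt-style API (Amt::new / set / flush); signatures are approximate and store_event is a hypothetical CBOR-put helper:

fn commit_events<BS: Blockstore>(store: &BS, events: &[Event]) -> anyhow::Result<Cid> {
    let mut amt: Amt<Cid, _> = Amt::new(store);
    for (i, event) in events.iter().enumerate() {
        // Write the event object to the blockstore, then track its CID
        // at the emission-order index.
        let event_cid = store_event(store, event)?;
        amt.set(i as u64, event_cid)?;
    }
    // Flushing writes the AMT nodes to the blockstore and returns the root
    // CID, which is what ends up committed in the receipt's `events` field.
    Ok(amt.flush()?)
}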

Commitment on chain

We extend the Receipt chain data structure with a new field:

pub struct Receipt {
    // existing fields
    exit_code: ExitCode,
    return_data: RawBytes,
    gas_used: i64,
    // new field
    events: Cid,
}

When the message is finalized, we return the Receipt with the events field populated.

Patterns of access

While the protocol does not mandate this, clients may wish to cache events in a local database for efficient access. With the structure above, it's possible to access events for a given message or all events for a tipset by returning events from all receipts.
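For example, collecting all events in a tipset might look like this sketch (same hypothetical helpers as above; load_event is an invented CBOR-get):

fn tipset_events<BS: Blockstore>(store: &BS, receipts: &[Receipt]) -> anyhow::Result<Vec<Event>> {
    let mut all = Vec::new();
    for receipt in receipts {
        // Each receipt commits to its own events AMT by root CID.
        let amt: Amt<Cid, _> = Amt::load(&receipt.events, store)?;
        amt.for_each(|_idx, event_cid| {
            all.push(load_event(store, event_cid)?);
            Ok(())
        })?;
    }
    Ok(all)
}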

Ethereum JSON-RPC compatibility

At this stage, we do not track logs blooms, and we definitely do not track Ethereum-formatted blooms (fixed-size, keccak256-based hashing). The Ethereum JSON-RPC API will need to recreate the bloom filters on demand (or implementations could choose to do something different if they wish to optimise for faster bloom queries).
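A sketch of that on-demand recreation, reusing a bloom_add like the one sketched in the issue description. The EthLog shape here is illustrative; the RPC layer must first map FVM events into Ethereum log form:

struct EthLog {
    address: [u8; 20],
    topics: Vec<[u8; 32]>,
}

fn recreate_logs_bloom(logs: &[EthLog]) -> [u8; 256] {
    let mut bloom = [0u8; 256];
    for log in logs {
        // Ethereum blooms cover the emitting address and each topic, not data.
        bloom_add(&mut bloom, &log.address);
        for topic in &log.topics {
            bloom_add(&mut bloom, topic);
        }
    }
    bloom
}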

@raulk (Member, Author) commented Nov 4, 2022

Draft FIP at filecoin-project/FIPs#483.

@raulk (Member, Author) commented Nov 9, 2022

We can consider the technical design phase to have finished, culminating with the FIP draft at filecoin-project/FIPs#483. Closing this issue.

raulk closed this as completed Nov 9, 2022