added section on snapshotter system design
anomit committed Nov 30, 2023
1 parent 82d12b8 commit 8aace8f
Showing 3 changed files with 99 additions and 0 deletions.
99 changes: 99 additions & 0 deletions docs/Protocol/Specifications/Snapshotter/components.md

# System Design

![Snapshotter Components](/images/MajorComponents.png)

## System Event Detector

The system event detector tracks events emitted by the protocol state contract running on the anchor chain. It forwards them to a callback queue with a routing key chosen according to the event's signature and type, among other information.
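
The forwarding step amounts to deriving a routing key from the event type. The sketch below is illustrative only. The `ProtocolEvent` shape, the `FORWARDED_EVENTS` set (apart from `EpochReleased`) and the routing-key format are hypothetical and do not reflect the detector's actual routing scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtocolEvent:
    """A decoded event emitted by the protocol state contract (simplified)."""
    name: str       # event name, e.g. "EpochReleased"
    epoch_id: int


# Hypothetical set of event names the detector forwards; "SnapshotFinalized"
# is an illustrative placeholder, not a confirmed protocol event.
FORWARDED_EVENTS = {"EpochReleased", "SnapshotFinalized"}


def routing_key_for(event: ProtocolEvent, namespace: str, instance_id: str) -> str:
    """Derive a dot-separated routing key from the event type (hypothetical format)."""
    if event.name not in FORWARDED_EVENTS:
        raise ValueError(f"event {event.name!r} is not forwarded")
    return f"powerloom-event-detector.{namespace}.{instance_id}.{event.name}"
```

When messages are published to a topic exchange with keys like this, each consumer queue can bind only to the event types it handles.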


## Process Hub Core

The Process Hub Core, defined in [`process_hub_core.py`](https://github.com/PowerLoom/pooler/blob/634610801a7fcbd8d863f2e72a04aa8204d27d03/snapshotter/process_hub_core.py), serves as the primary process manager in the snapshotter.
* Operated by the CLI tool [`processhub_cmd.py`](https://github.com/PowerLoom/pooler/blob/634610801a7fcbd8d863f2e72a04aa8204d27d03/snapshotter/processhub_cmd.py), it is responsible for starting and managing the `SystemEventDetector` and `ProcessorDistributor` processes.
* Additionally, it spawns the base snapshot and aggregator workers required for processing tasks from the `powerloom-backend-callback` queue. The number of workers and their configuration path can be adjusted in [`config/settings.json`](https://github.com/PowerLoom/snapshotter-configs/blob/fcf9b852bac9694258d7afcd8beeaa4cf961c65f/settings.example.json).
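
Here is a rough illustration of how per-type worker counts in a settings file become a set of processes. The sketch expands a dictionary of counts into unique worker names. The keys `num_snapshot_workers` and `num_aggregation_workers` and the naming scheme are hypothetical placeholders, not the actual schema of `config/settings.json`. The real process hub also starts and supervises each process, which this sketch omits.

```python
from typing import Dict, List


def plan_workers(callback_worker_config: Dict[str, int]) -> List[str]:
    """Expand per-type worker counts into unique worker names.

    `callback_worker_config` maps a hypothetical key such as
    "num_snapshot_workers" to the number of processes to launch.
    """
    names: List[str] = []
    for key, count in sorted(callback_worker_config.items()):
        if count < 0:
            raise ValueError(f"{key} must be non-negative, got {count}")
        # "num_snapshot_workers" -> "snapshot_worker"
        worker_type = key.removeprefix("num_").removesuffix("s")
        names.extend(f"{worker_type}-{i}" for i in range(count))
    return names
```

A process manager could then launch one `multiprocessing.Process` per returned name. Keeping names unique makes it straightforward to track and respawn individual workers.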

## Processor Distributor

The Processor Distributor, defined in [`processor_distributor.py`](https://github.com/PowerLoom/pooler/blob/634610801a7fcbd8d863f2e72a04aa8204d27d03/snapshotter/processor_distributor.py), is launched using the `processhub_cmd.py` CLI.

* It loads the preloader, base snapshotting, and aggregator configuration from the settings file.
* It reads the events forwarded by the event detector to the `f'powerloom-event-detector:{settings.namespace}:{settings.instance_id}'` RabbitMQ queue. This queue is bound to a topic exchange configured in `settings.rabbitmq.setup.event_detector.exchange` ([code-ref: RabbitMQ exchanges and queue setup in pooler](https://github.com/PowerLoom/pooler/blob/634610801a7fcbd8d863f2e72a04aa8204d27d03/snapshotter/init_rabbitmq.py)).
* It creates and distributes processing messages based on three sources: the preloader configuration in `config/preloader.json`, and the project configurations in [`config/projects.json`](https://github.com/PowerLoom/snapshotter-configs/blob/fcf9b852bac9694258d7afcd8beeaa4cf961c65f/projects.example.json) and [`config/aggregator.json`](https://github.com/PowerLoom/snapshotter-configs/blob/fcf9b852bac9694258d7afcd8beeaa4cf961c65f/aggregator.example.json).
* For [`EpochReleased` events](#epoch-generation), it forwards messages to the base snapshot builders of the data source contracts configured in `config/projects.json`. Each message carries the epoch information contained in the event.

```python reference
https://github.com/PowerLoom/pooler/blob/634610801a7fcbd8d863f2e72a04aa8204d27d03/snapshotter/processor_distributor.py#L1077-L1115
```
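
The fan-out for an `EpochReleased` event can be sketched as follows. The distributor emits one message per configured project type, each carrying the epoch's block range and that project's data source contracts. The `EpochReleased`, `ProjectConfig` and `SnapshotTask` shapes are hypothetical simplifications of the event and of `config/projects.json`, not the distributor's actual message schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EpochReleased:
    epoch_id: int
    begin: int  # first block of the epoch on the data source chain
    end: int    # last block of the epoch


@dataclass(frozen=True)
class ProjectConfig:
    project_type: str
    projects: Tuple[str, ...]  # data source contract addresses


@dataclass(frozen=True)
class SnapshotTask:
    project_type: str
    contracts: Tuple[str, ...]
    epoch_id: int
    begin: int
    end: int


def distribute(event: EpochReleased, configs: List[ProjectConfig]) -> List[SnapshotTask]:
    """Create one base snapshot task per project type for the released epoch."""
    return [
        SnapshotTask(
            project_type=cfg.project_type,
            contracts=cfg.projects,
            epoch_id=event.epoch_id,
            begin=event.begin,
            end=event.end,
        )
        for cfg in configs
        if cfg.projects  # skip project types with no data sources configured
    ]
```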


## Delegation Workers for preloaders

![Delegation worker dependent preloading architecture](/images/delegate_preloading.png)

Preloaders often fetch and cache large volumes of data, for example all the transaction receipts for a block on the data source blockchain. In such cases a single worker cannot fetch the data fast enough. That delay holds up base snapshot generation and the subsequent aggregate snapshot generation that must complete before consensus is reached.

Hence such workers are defined as `delegate_tasks` in [`config/preloader.json`](https://github.com/PowerLoom/snapshotter-configs/blob/fcf9b852bac9694258d7afcd8beeaa4cf961c65f/preloader.json). The [process hub core](#process-hub-core) launches the number of workers set under the key `callback_worker_config.num_delegate_workers` in the primary settings file, [`config/settings.json`](https://github.com/PowerLoom/snapshotter-configs/blob/fcf9b852bac9694258d7afcd8beeaa4cf961c65f/settings.example.json).

```json reference
https://github.com/PowerLoom/snapshotter-configs/blob/fcf9b852bac9694258d7afcd8beeaa4cf961c65f/preloader.json#L19-L25
```

```json reference
https://github.com/PowerLoom/snapshotter-configs/blob/fcf9b852bac9694258d7afcd8beeaa4cf961c65f/settings.example.json#L86-L90
```

Delegation workers use a simple request-response queue architecture on RabbitMQ.

```python reference
https://github.com/PowerLoom/pooler/blob/634610801a7fcbd8d863f2e72a04aa8204d27d03/snapshotter/init_rabbitmq.py#L243-L254
```
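
The request-response pattern can be modelled in memory. In the sketch below, `asyncio.Queue` objects stand in for the RabbitMQ request and response queues. Each unit of work is tagged with a unique request ID, and responses are matched back by that ID. The `delegate_worker` and `delegate` functions and the placeholder result string are hypothetical and are not pooler's code.

```python
import asyncio
import uuid
from typing import Dict, List


async def delegate_worker(requests: asyncio.Queue, responses: asyncio.Queue) -> None:
    """Consume requests forever and push a response tagged with the same ID."""
    while True:
        request_id, block = await requests.get()
        # Placeholder for the real work, e.g. fetching receipts for `block`.
        await responses.put((request_id, f"receipts-for-{block}"))
        requests.task_done()


async def delegate(blocks: List[int], num_workers: int = 3) -> Dict[int, str]:
    """Send one request per block to a worker pool and gather the results."""
    requests: asyncio.Queue = asyncio.Queue()
    responses: asyncio.Queue = asyncio.Queue()
    workers = [
        asyncio.create_task(delegate_worker(requests, responses))
        for _ in range(num_workers)
    ]
    # Tag each unit of work with a unique request ID.
    pending = {str(uuid.uuid4()): block for block in blocks}
    for request_id, block in pending.items():
        await requests.put((request_id, block))

    results: Dict[int, str] = {}
    # Match responses back to their requests by ID until none are pending.
    while pending:
        request_id, result = await responses.get()
        block = pending.pop(request_id, None)
        if block is not None:
            results[block] = result

    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

With a broker in place of the in-memory queues, the same request-ID tagging lets any number of worker processes serve a single delegator.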

One of the preloaders bundled with this snapshotter peer fetches all the transaction receipts within a given epoch's block range. Because of the volume of data involved, it delegates this work to a pool of delegation workers:

* The Preloader: [snapshotter/utils/preloaders/tx_receipts/preloader.py](https://github.com/PowerLoom/pooler/blob/634610801a7fcbd8d863f2e72a04aa8204d27d03/snapshotter/utils/preloaders/tx_receipts/preloader.py).
* The Delegation Workers: [snapshotter/utils/preloaders/tx_receipts/delegated_worker/tx_receipts.py](https://github.com/PowerLoom/pooler/blob/634610801a7fcbd8d863f2e72a04aa8204d27d03/snapshotter/utils/preloaders/tx_receipts/delegated_worker/tx_receipts.py)
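
Delegation works here because receipt fetching splits into independent units. The following is a minimal sketch, assuming the preloader batches transaction hashes from the epoch's blocks before sending them out. The `chunk` helper and the batch size are hypothetical, and the bundled preloader may split the work differently.

```python
from typing import List, Sequence


def chunk(items: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split `items` into consecutive batches of at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]
```

Each batch would then become one request tagged with its own request ID. Smaller batches spread the load more evenly across workers at the cost of more messages.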

Because this functionality is shared by all preloaders that use delegation workers, the logic lives in the generic class `DelegatorPreloaderAsyncWorker`, which all such preloaders inherit from. The following snippet shows where the workload is sent to the delegation workers:

```python reference
https://github.com/PowerLoom/pooler/blob/634610801a7fcbd8d863f2e72a04aa8204d27d03/snapshotter/utils/generic_delegator_preloader.py#L191-L210
```

Upon sending out the workloads tagged by unique request IDs, the delegator sets up a temporary exclusive queue to which only the delegation workers meant for the task type push their responses.

```python reference
https://github.com/PowerLoom/pooler/blob/634610801a7fcbd8d863f2e72a04aa8204d27d03/snapshotter/utils/generic_delegator_preloader.py#L159-L186
```
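
Collecting responses from such a queue needs a stopping rule. The sketch below waits until every expected request ID has answered or a timeout expires, whichever comes first. It ignores responses with unknown or already-seen IDs. The `collect_responses` function and its timeout behaviour are assumptions for illustration. Here an `asyncio.Queue` stands in for the exclusive RabbitMQ queue.

```python
import asyncio
from typing import Dict, Set, Tuple


async def collect_responses(
    responses: asyncio.Queue,
    expected_ids: Set[str],
    timeout: float,
) -> Tuple[Dict[str, object], Set[str]]:
    """Gather responses for `expected_ids` from a dedicated response queue.

    Returns the collected results and the IDs still missing at timeout.
    """
    pending = set(expected_ids)
    results: Dict[str, object] = {}

    async def drain() -> None:
        while pending:
            request_id, payload = await responses.get()
            # Ignore stray or duplicate responses.
            if request_id in pending:
                pending.discard(request_id)
                results[request_id] = payload

    try:
        await asyncio.wait_for(drain(), timeout)
    except asyncio.TimeoutError:
        pass  # the caller decides whether to retry the missing requests
    return results, pending
```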

The code that pushes the corresponding response from the delegation workers is in the generic class `DelegateAsyncWorker`, which all such workers should inherit from:

```python reference
https://github.com/PowerLoom/pooler/blob/634610801a7fcbd8d863f2e72a04aa8204d27d03/snapshotter/utils/delegate_worker.py#L74-L84
```

## Callback Workers

The callback workers build the base snapshots and aggregate snapshots. As explained above, the [process hub core](#process-hub-core) launches them according to the configurations in `config/projects.json` and `config/aggregator.json`.

They listen for new messages on the RabbitMQ topic exchange described in the following configuration. The second snippet shows how the topic queues are initialized.

```json reference
https://github.com/PowerLoom/snapshotter-configs/blob/fcf9b852bac9694258d7afcd8beeaa4cf961c65f/settings.example.json#L33-L55
```

```python reference
https://github.com/PowerLoom/pooler/blob/634610801a7fcbd8d863f2e72a04aa8204d27d03/snapshotter/init_rabbitmq.py#L182-L213
```
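
Callback workers rely on standard AMQP topic-exchange semantics. A queue bound with key `x.*.y` receives messages whose routing key has exactly one word between `x` and `y`, while `#` matches zero or more words. The function below is a small pure-Python model of that matching rule, useful for reasoning about which worker queue a routing key reaches. It is not part of pooler, since RabbitMQ performs this matching itself on the broker.

```python
from functools import lru_cache


def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Return True if `routing_key` matches an AMQP topic `binding_key`.

    '*' matches exactly one dot-separated word; '#' matches zero or more words.
    """
    pattern = tuple(binding_key.split("."))
    words = tuple(routing_key.split("."))

    @lru_cache(maxsize=None)
    def match(i: int, j: int) -> bool:
        if i == len(pattern):
            return j == len(words)
        token = pattern[i]
        if token == "#":
            # '#' either matches nothing, or consumes one word and stays active.
            return match(i + 1, j) or (j < len(words) and match(i, j + 1))
        if j == len(words):
            return False
        if token == "*" or token == words[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)
```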

When a worker receives a message from the processor distributor after preloading is complete, it does most of the heavy lifting, including some sanity checks. It then calls the `compute()` callback on the project's configured snapshot worker class, which transforms the dependent data points cached by the preloaders into the base snapshot.

:::info
[Snapshot generation specification](/docs/Protocol/Specifications/Snapshotter/snapshot_build.md)
:::
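
The contract between a callback worker and a project's snapshot class can be pictured as an abstract `compute()` hook. The sketch below is hypothetical. The `SnapshotWorkerBase` name, the `compute` signature, the cached-receipt input and `run_callback` are illustrative assumptions. Pooler's actual interface is described in the specification linked above.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SnapshotWorkerBase(ABC):
    """Hypothetical interface a project's snapshot class implements."""

    @abstractmethod
    def compute(self, epoch_begin: int, epoch_end: int,
                cached_receipts: Dict[int, List[dict]]) -> dict:
        """Transform preloaded data for one epoch into a base snapshot."""


class TxCountSnapshot(SnapshotWorkerBase):
    """Example project: count cached transactions per block in the epoch."""

    def compute(self, epoch_begin: int, epoch_end: int,
                cached_receipts: Dict[int, List[dict]]) -> dict:
        return {
            "begin": epoch_begin,
            "end": epoch_end,
            "tx_count": {
                block: len(cached_receipts.get(block, []))
                for block in range(epoch_begin, epoch_end + 1)
            },
        }


def run_callback(worker: SnapshotWorkerBase, begin: int, end: int,
                 cache: Dict[int, List[dict]]) -> dict:
    """Run a sanity check, then hand off to the project's compute()."""
    if begin > end:
        raise ValueError("epoch begin must not exceed end")
    return worker.compute(begin, end, cache)
```

Keeping the generic checks in the worker and the project-specific transformation in `compute()` means a new use case only has to implement one method.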

## RPC Helper

Extracting data from the blockchain state and generating the snapshot can be a complex task. The `RpcHelper`, defined in [`utils/rpc.py`](https://github.com/PowerLoom/pooler/blob/634610801a7fcbd8d863f2e72a04aa8204d27d03/snapshotter/utils/rpc.py), provides helper functions that simplify this process. It handles the retry and caching logic so that developers can focus on building their use cases.
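
The retry and caching behaviour can be illustrated with a small wrapper. This is not `RpcHelper`'s API. The `CachedRetryingClient` class, the `fetch` callable, the backoff parameters and the in-memory cache keyed by method and params are assumptions chosen for the sketch. Caching like this is only safe for data that cannot change, such as finalized blocks.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Tuple


class CachedRetryingClient:
    """Wrap an async RPC call with exponential-backoff retries and a result cache."""

    def __init__(self, fetch: Callable[[str, Tuple[Any, ...]], Awaitable[Any]],
                 retries: int = 3, base_delay: float = 0.1) -> None:
        self._fetch = fetch
        self._retries = retries
        self._base_delay = base_delay
        self._cache: Dict[Tuple[str, Tuple[Any, ...]], Any] = {}

    async def call(self, method: str, *params: Any) -> Any:
        key = (method, params)  # params must be hashable to be cached
        if key in self._cache:
            return self._cache[key]  # served without contacting the node
        last_exc: Exception = RuntimeError("no attempts made")
        for attempt in range(self._retries):
            try:
                result = await self._fetch(method, params)
                self._cache[key] = result
                return result
            except Exception as exc:  # a real client would catch narrower errors
                last_exc = exc
                if attempt < self._retries - 1:
                    # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                    await asyncio.sleep(self._base_delay * 2 ** attempt)
        raise last_exc
```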


## Core API

This is one of the most important components: it provides access to the finalized protocol state on the smart contract running on the anchor chain. Find it in [`core_api.py`](https://github.com/PowerLoom/pooler/blob/634610801a7fcbd8d863f2e72a04aa8204d27d03/snapshotter/core_api.py).
Binary file added static/images/MajorComponents.png
Binary file added static/images/delegate_preloading.png