exp/services/ledgerexporter: Guide to installing and running ledger exporter #5355

Merged
merged 2 commits into from
Jun 27, 2024
167 changes: 98 additions & 69 deletions exp/services/ledgerexporter/README.md
@@ -1,101 +1,130 @@
# Ledger Exporter (Work in Progress)
## Ledger Exporter: Installation and Usage Guide

The Ledger Exporter is a tool designed to export ledger data from a Stellar network and upload it to a specified destination. It supports both bounded and unbounded modes, allowing users to export a specific range of ledgers or continuously export new ledgers as they arrive on the network.
This guide provides step-by-step instructions on installing and using the Ledger Exporter, a tool that exports Stellar network ledger data to a Google Cloud Storage (GCS) bucket for efficient analysis and storage.

Ledger Exporter currently uses captive-core as the ledger backend and GCS as the destination data store.
* [Prerequisites](#prerequisites)
* [Setup](#setup)
* [Set Up GCP Credentials](#set-up-gcp-credentials)
* [Create a GCS Bucket for Storage](#create-a-gcs-bucket-for-storage)
* [Running the Ledger Exporter](#running-the-ledger-exporter)
* [Pull the Docker Image](#1-pull-the-docker-image)
* [Configure the Exporter](#2-configure-the-exporter-configtoml)
* [Run the Exporter](#3-run-the-exporter)
* [Command Line Interface (CLI)](#command-line-interface-cli)
1. [append: Continuously Export New Data](#1-append-continuously-export-new-data)
2. [scan-and-fill: Fill Data Gaps](#2-scan-and-fill-fill-data-gaps)

# Exported Data Format
The tool allows for the export of multiple ledgers in a single exported file. The exported data is in XDR format and is compressed using zstd before being uploaded.
## Prerequisites

```go
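// LedgerCloseMetaBatch groups the ledgers exported in a single file.
type LedgerCloseMetaBatch struct {
	StartSequence    uint32            // sequence number of the first ledger in the batch
	EndSequence      uint32            // sequence number of the last ledger in the batch
	LedgerCloseMetas []LedgerCloseMeta // one entry per ledger in the batch
}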
```
* **Google Cloud Platform (GCP) Account:** You will need a GCP account to create a GCS bucket for storing the exported data.
* **Docker:** Allows you to run the Ledger Exporter in a self-contained environment. The official Docker installation guide: [https://docs.docker.com/engine/install/](https://docs.docker.com/engine/install/)

## Setup

### Set Up GCP Credentials

Create application default credentials for your Google Cloud Platform (GCP) project by following these steps:
1. Download the [SDK](https://cloud.google.com/sdk/docs/install).
2. Install and initialize the [gcloud CLI](https://cloud.google.com/sdk/docs/initializing).
3. Create [application authentication credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc#google-idp) and store them in a secure location on your system, such as `$HOME/.config/gcloud/application_default_credentials.json`.

For detailed instructions, refer to the [Providing Credentials for Application Default Credentials (ADC) guide.](https://cloud.google.com/docs/authentication/provide-credentials-adc)

### Create a GCS Bucket for Storage

## Getting Started
1. Go to the GCP Console's Storage section ([https://console.cloud.google.com/storage](https://console.cloud.google.com/storage)) and create a new bucket.
2. Choose a descriptive name for the bucket, such as `stellar-ledger-data`. Refer to [Google Cloud Storage Bucket Naming Guideline](https://cloud.google.com/storage/docs/buckets#naming) for more information.
3. **Note down the bucket name** as you'll need it later in the configuration process.

### Installation (coming soon)

### Command Line Options
## Running the Ledger Exporter

### 1. Pull the Docker Image

Open a terminal window and download the Stellar Ledger Exporter Docker image using the following command:

#### Scan and Fill Mode:
Exports a specific range of ledgers, defined by --start and --end. Will only export to remote datastore if data is absent.
```bash
ledgerexporter scan-and-fill --start <start_ledger> --end <end_ledger> --config-file <config_file_path>
docker pull stellar/ledger-exporter
```

#### Append Mode:
Starts searching the data store at `--start` for the next absent ledger sequence number. If an absence is detected, the export range is narrowed to `--start <absent_ledger_sequence>`.
This feature requires ledgers to be present on the remote data store for some (possibly empty) prefix of the requested range and then absent for the (possibly empty) remainder.
### 2. Configure the Exporter (config.toml)
The Ledger Exporter relies on a configuration file (config.toml) to connect to your specific environment. This file defines details like:
- Your Google Cloud Storage (GCS) bucket where exported ledger data will be stored.
- Stellar network settings, such as the network you're using (testnet or pubnet).
- Datastore schema to control data organization.

In this mode, the `--end` ledger can be provided to stop the process once the export has reached that ledger; if it is absent or 0, the exporter continuously exports new ledgers emitted by the network.
A sample configuration file [config.example.toml](config.example.toml) is provided. Copy and rename it to config.toml for customization. Edit the copied file (config.toml) to replace placeholders with your specific details.
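
The copy-and-edit step can be scripted. The sketch below is only an illustration: `make_config` is a hypothetical helper, not part of the exporter, and it assumes the sample file still contains the placeholder path `your-bucket-name/<optional_subpath1>/<optional_subpath2>/`. It writes the bucket name without any subpath.

```bash
# make_config is a hypothetical helper (not shipped with the exporter).
# It copies the sample config and replaces the placeholder bucket path.
# Usage: make_config <bucket> [source_file] [destination_file]
make_config() {
  local bucket="$1"
  local src="${2:-config.example.toml}"
  local dst="${3:-config.toml}"

  # Assumes the placeholder text matches the sample file exactly.
  sed "s|your-bucket-name/<optional_subpath1>/<optional_subpath2>/|${bucket}/|" \
    "$src" > "$dst"
}
```

For example, `make_config stellar-ledger-data` writes `config.toml` in the current directory. Review the result by hand, since the network settings still need to match your environment.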

### 3. Run the Exporter

The following command demonstrates how to run the Ledger Exporter:

It’s guaranteed that ledgers exported during `append` mode from `start` and up to the last logged ledger file `Uploaded {ledger file name}` were contiguous, meaning all ledgers within that range were exported to the data lake with no gaps or missing ledgers in between.
```bash
ledgerexporter append --start <start_ledger> --config-file <config_file_path>
docker run --platform linux/amd64 \
-v "$HOME/.config/gcloud/application_default_credentials.json":/.config/gcp/credentials.json:ro \
-e GOOGLE_APPLICATION_CREDENTIALS=/.config/gcp/credentials.json \
-v "${PWD}/config.toml":/config.toml \
stellar/ledger-exporter <command> [options]
```

### Configuration (toml):
The `stellar_core_config` supports two ways for configuring captive core:
- use prebuilt captive core config toml, archive urls, and passphrase based on `stellar_core_config.network = testnet|pubnet`.
- manually set the captive core config by supplying these core parameters, which override any defaults even when `stellar_core_config.network` is also present:
`stellar_core_config.captive_core_toml_path`
`stellar_core_config.history_archive_urls`
`stellar_core_config.network_passphrase`
**Explanation:**

Ensure you have stellar-core installed and set `stellar_core_config.stellar_core_binary_path` to its path on your operating system.
* `--platform linux/amd64`: Specifies the platform architecture (adjust if needed for your system).
* `-v`: Mounts volumes to map your local GCP credentials and config.toml file to the container:
* `$HOME/.config/gcloud/application_default_credentials.json`: Your local GCP credentials file.
* `${PWD}/config.toml`: Your local configuration file.
* `-e GOOGLE_APPLICATION_CREDENTIALS=/.config/gcp/credentials.json`: Sets the environment variable for credentials within the container.
* `stellar/ledger-exporter`: The Docker image name.
* `<command>`: The Stellar Ledger Exporter command: [append](#1-append-continuously-export-new-data) or [scan-and-fill](#2-scan-and-fill-fill-data-gaps).
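
Because both `-v` options mount single files, it can help to confirm that those files exist before starting the container; with the `-v` syntax, Docker typically creates an empty directory at a host path that is missing. The sketch below is a minimal check under that assumption. `preflight` is a hypothetical helper, not part of the exporter.

```bash
# preflight is a hypothetical helper (not shipped with the exporter).
# It reports whether each given path is a readable regular file and
# returns non-zero if any of them is not.
# Usage: preflight <file>...
preflight() {
  local f status=0
  for f in "$@"; do
    if [ -f "$f" ] && [ -r "$f" ]; then
      echo "ok: $f"
    else
      echo "missing: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

A possible use is to run `preflight "$HOME/.config/gcloud/application_default_credentials.json" "$PWD/config.toml"` and only invoke `docker run` when it succeeds.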

Enable a web service that binds to a localhost port and publishes metrics by including `admin_port = {port}`.
## Command Line Interface (CLI)

An example config, demonstrating preconfigured captive core settings and gcs data store config.
```toml
admin_port = 6061
The Ledger Exporter offers two modes of operation for exporting ledger data:

[datastore_config]
type = "GCS"
### 1. append: Continuously Export New Data

[datastore_config.params]
destination_bucket_path = "your-bucket-name/<optional_subpath1>/<optional_subpath2>/"

[datastore_config.schema]
ledgers_per_file = 64
files_per_partition = 10
Starts searching the data store at `--start` for the next absent ledger sequence number. If an absence is detected, the export range is narrowed to `--start <absent_ledger_sequence>`.
This feature requires ledgers to be present on the remote data store for some (possibly empty) prefix of the requested range and then absent for the (possibly empty) remainder.

[stellar_core_config]
network = "testnet"
stellar_core_binary_path = "/my/path/to/stellar-core"
captive_core_toml_path = "my-captive-core.cfg"
history_archive_urls = ["http://testarchiveurl1", "http://testarchiveurl2"]
network_passphrase = "test"
```
In this mode, the `--end` ledger can be provided to stop the process once the export has reached that ledger; if it is absent or 0, the exporter continuously exports new ledgers emitted by the network.

### Exported Files
It’s guaranteed that ledgers exported during `append` mode from `start` and up to the last logged ledger file `Uploaded {ledger file name}` were contiguous, meaning all ledgers within that range were exported to the data lake with no gaps or missing ledgers in between.
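
As a rough illustration of the search described for `append`, the sketch below walks forward from a start ledger until it reaches a ledger that has no file. `first_absent_ledger` is a hypothetical helper, not part of the exporter. It uses a local directory as a stand-in for the data store and assumes one ledger per file with no partitions, so ledger `N` is stored as `N.xdr.zstd`. The exporter's actual logic may differ.

```bash
# first_absent_ledger is a hypothetical helper (not the exporter's code).
# It prints the first ledger in [start, end] that has no file in <dir>,
# or end+1 when every ledger in the range is present.
# Usage: first_absent_ledger <start> <end> <dir>
first_absent_ledger() {
  local seq="$1" end="$2" dir="$3"
  # Advance while the ledger file exists and the range is not exhausted.
  while [ "$seq" -le "$end" ] && [ -e "$dir/$seq.xdr.zstd" ]; do
    seq=$((seq + 1))
  done
  echo "$seq"
}
```

The printed value corresponds to the narrowed `--start <absent_ledger_sequence>`: everything before it is already present, which is why the mode expects a present prefix followed by an absent remainder.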

#### File Organization:
- Ledgers are grouped into files, with the number of ledgers per file set by `ledgers_per_file`.
- Files are further organized into partitions, with the number of files per partition set by `files_per_partition`.

### Filename Structure:
- Filenames indicate the ledger range they contain, e.g., `0-63.xdr.zstd` holds ledgers 0 to 63.
- Partition directories group files, e.g., `/0-639/` holds files for ledgers 0 to 639.
**Usage:**

#### Example:
with `ledgers_per_file = 64` and `files_per_partition = 10`:
- Partition names: `/0-639`, `/640-1279`, ...
- Filenames: `/0-639/0-63.xdr.zstd`, `/0-639/64-127.xdr.zstd`, ...
```bash
docker run --platform linux/amd64 -d \
-v "$HOME/.config/gcloud/application_default_credentials.json":/.config/gcp/credentials.json:ro \
-e GOOGLE_APPLICATION_CREDENTIALS=/.config/gcp/credentials.json \
-v "${PWD}/config.toml":/config.toml \
stellar/ledger-exporter \
append --start <start_ledger> [--end <end_ledger>] [--config-file <config_file>]
```

Arguments:
- `--start <start_ledger>` (required): The starting ledger sequence number for the export process.
- `--end <end_ledger>` (optional): The ending ledger sequence number. If omitted or set to 0, the exporter will continuously export new ledgers as they appear on the network.
- `--config-file <config_file_path>` (optional): The path to your configuration file, containing details like GCS bucket information. If not provided, the exporter will look for config.toml in the directory where you run the command.

### 2. scan-and-fill: Fill Data Gaps

#### Special Cases:
Scans the datastore (GCS bucket) for the specified ledger range and exports any missing ledgers to the datastore. This mode avoids unnecessary exports if the data is already present. The range is specified using the --start and --end options.

- If `ledgers_per_file` is set to 1, filenames will only contain the ledger number.
- If `files_per_partition` is set to 1, filenames will not contain the partition.
**Usage:**

#### Note:
- Avoid changing `ledgers_per_file` and `files_per_partition` after configuration for consistency.
```bash
docker run --platform linux/amd64 -d \
-v "$HOME/.config/gcloud/application_default_credentials.json":/.config/gcp/credentials.json:ro \
-e GOOGLE_APPLICATION_CREDENTIALS=/.config/gcp/credentials.json \
-v "${PWD}/config.toml":/config.toml \
stellar/ledger-exporter \
scan-and-fill --start <start_ledger> --end <end_ledger> [--config-file <config_file>]
```

#### Retrieving Data:
- To locate a specific ledger sequence, calculate the partition name and ledger file name using `files_per_partition` and `ledgers_per_file`.
- The `GetObjectKeyFromSequenceNumber` function automates this calculation.
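
To make the partition and filename arithmetic concrete, here is a small sketch that follows the layout described in the removed "Exported Files" text. `object_key` is a hypothetical helper written for illustration; it is not the exporter's `GetObjectKeyFromSequenceNumber`, whose exact output may differ.

```bash
# object_key is a hypothetical helper (not the exporter's real function).
# It prints the partition and file name that would hold a ledger.
# Usage: object_key <ledger_seq> <ledgers_per_file> <files_per_partition>
object_key() {
  local seq="$1" per_file="$2" per_part="$3"
  local part_size=$((per_file * per_part))

  # First and last ledger of the file that contains $seq.
  local file_start=$((seq / per_file * per_file))
  local file_end=$((file_start + per_file - 1))

  local key=""
  # A partition directory is only used when it spans more than one file.
  if [ "$per_part" -gt 1 ]; then
    local part_start=$((seq / part_size * part_size))
    local part_end=$((part_start + part_size - 1))
    key="${part_start}-${part_end}/"
  fi

  # With one ledger per file, the name is just the ledger number.
  if [ "$per_file" -eq 1 ]; then
    key="${key}${file_start}"
  else
    key="${key}${file_start}-${file_end}"
  fi
  echo "${key}.xdr.zstd"
}
```

With `ledgers_per_file = 64` and `files_per_partition = 10`, `object_key 100 64 10` prints `0-639/64-127.xdr.zstd`, which matches the example partition and file names listed earlier.
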
Arguments:
- `--start <start_ledger>` (required): The starting ledger sequence number in the range to export.
- `--end <end_ledger>` (required): The ending ledger sequence number in the range.
- `--config-file <config_file_path>` (optional): The path to your configuration file, containing details like GCS bucket information. If not provided, the exporter will look for config.toml in the directory where you run the command.
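
The gap-filling idea behind `scan-and-fill` can be sketched as a scan that lists every ledger in a range with no file. `missing_ledgers` is a hypothetical helper, not part of the exporter. It treats a local directory as a stand-in for the GCS bucket and assumes one ledger per file with no partitions (`N.xdr.zstd`). The exporter's own scan may work differently.

```bash
# missing_ledgers is a hypothetical helper (not the exporter's code).
# It prints each ledger in [start, end] that has no file in <dir>,
# one sequence number per line.
# Usage: missing_ledgers <start> <end> <dir>
missing_ledgers() {
  local seq="$1" end="$2" dir="$3"
  while [ "$seq" -le "$end" ]; do
    # Only absent ledgers are reported; present ones are skipped.
    [ -e "$dir/$seq.xdr.zstd" ] || echo "$seq"
    seq=$((seq + 1))
  done
}
```

Unlike the `append` search, which stops at the first absence, this scan covers the whole range, so it can report gaps that sit between ledgers that are already present.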

42 changes: 42 additions & 0 deletions exp/services/ledgerexporter/config.example.toml
@@ -0,0 +1,42 @@

# Sample TOML Configuration

# Admin port configuration
# Specifies the port number for hosting the web service locally to publish metrics.
admin_port = 6061

# Datastore Configuration
[datastore_config]
# Specifies the type of datastore. Currently, only Google Cloud Storage (GCS) is supported.
type = "GCS"

[datastore_config.params]
# The Google Cloud Storage bucket path for storing data, with optional subpaths for organization.
destination_bucket_path = "your-bucket-name/<optional_subpath1>/<optional_subpath2>/"

[datastore_config.schema]
# Configuration for data organization
ledgers_per_file = 64 # Number of ledgers stored in each file.
files_per_partition = 10 # Number of files per partition/directory.

# Stellar-core Configuration
[stellar_core_config]
# Use default captive-core config based on network
# Options are "testnet" for the test network or "pubnet" for the public network.
network = "testnet"

# Alternatively, you can manually configure captive-core parameters (overrides defaults if 'network' is set).

# Path to the captive-core configuration file.
#captive_core_toml_path = "my-captive-core.cfg"

# URLs for Stellar history archives, with multiple URLs allowed.
#history_archive_urls = ["http://testarchiveurl1", "http://testarchiveurl2"]

# Network passphrase for the Stellar network.
#network_passphrase = "Test SDF Network ; September 2015"

# Path to stellar-core binary
# Not required when running in a Docker container as it has the stellar-core installed and path is set.
# When running outside of Docker, it will look for stellar-core in the OS path if it exists.
#stellar_core_binary_path = "/my/path/to/stellar-core"
14 changes: 0 additions & 14 deletions exp/services/ledgerexporter/config.toml

This file was deleted.
