This document explains how to set up and run a validator using validator.py. Validators are crucial components of τemplar, responsible for evaluating miners' contributions by assessing their uploaded gradients.
This guide will help you set up and run a validator for τemplar. Validators play a critical role in maintaining the integrity of the network by evaluating miners' contributions and updating weights accordingly.
- NVIDIA GPU with CUDA support
- Minimum H100 recommended
- Ubuntu (or Ubuntu-based Linux distribution)
- Docker and Docker Compose
- Git
- Cloudflare R2 Bucket Configuration:
- Bucket Setup:
- Create a Bucket: Name it the same as your account ID and set the region to ENAM.
- Generate Tokens:
- Read Token: Admin Read permissions.
- Write Token: Admin Read & Write permissions.
- Store Credentials: You'll need these for the .env file.
- Install Docker and Docker Compose:
  Follow the same steps as in the Miner Setup section.
- Enable Docker GPU Support:
  Follow the official NVIDIA Container Toolkit installation guide:
    # 1. Configure the production repository
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
        sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
        sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

    # 2. Update package listings
    sudo apt-get update

    # 3. Install the NVIDIA Container Toolkit
    sudo apt-get install -y nvidia-container-toolkit

    # 4. Configure Docker runtime
    sudo nvidia-ctk runtime configure --runtime=docker

    # 5. Restart Docker daemon
    sudo systemctl restart docker

    # 6. Test GPU support
    docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
If you see the nvidia-smi output, GPU support is working correctly. For detailed instructions and other Linux distributions, refer to the official NVIDIA Container Toolkit installation guide.
- Clone the Repository:
    git clone https://github.com/tplr-ai/templar.git
    cd templar
- Navigate to the Docker Directory:
    cd docker
- Create and Populate the .env File:
  Create a .env file in the docker directory by copying .env.example:
    cp .env.example .env
  Populate the .env file with your configuration. Variables to set:
    # Add your Weights & Biases API key
    WANDB_API_KEY=<your_wandb_api_key>

    # Cloudflare R2 Credentials - Add your R2 credentials below
    R2_GRADIENTS_ACCOUNT_ID=<your_r2_account_id>
    R2_GRADIENTS_BUCKET_NAME=<your_r2_bucket_name>
    R2_GRADIENTS_READ_ACCESS_KEY_ID=<your_r2_read_access_key_id>
    R2_GRADIENTS_READ_SECRET_ACCESS_KEY=<your_r2_read_secret_access_key>
    R2_GRADIENTS_WRITE_ACCESS_KEY_ID=<your_r2_write_access_key_id>
    R2_GRADIENTS_WRITE_SECRET_ACCESS_KEY=<your_r2_write_secret_access_key>
    R2_DATASET_ACCOUNT_ID=80f15715bb0b882c9e967c13e677ed7d
    R2_DATASET_BUCKET_NAME=80f15715bb0b882c9e967c13e677ed7d
    R2_DATASET_READ_ACCESS_KEY_ID=88548d962edc9a1f4416cbb3453d914a
    R2_DATASET_READ_SECRET_ACCESS_KEY=4934ae848465113a75babf7d0a88efd9112aa49296c900744268e91f1d31998f

    # Wallet Configuration
    WALLET_NAME=<your_wallet_name>
    WALLET_HOTKEY=<your_wallet_hotkey>

    # Network Configuration
    NETWORK=finney
    NETUID=3

    # GPU Configuration
    CUDA_DEVICE=cuda:0

    # Node Type
    NODE_TYPE=validator

    # Additional Settings
    DEBUG=false
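  Note: Set NODE_TYPE to validator.
- Update docker-compose.yml:
  Ensure that the docker-compose.yml file is correctly configured for your setup.
- Run Docker Compose:
  Start the validator using Docker Compose. The compose file path below is relative to the repository root, so run the command from there (cd .. first if you are still in the docker directory):
    docker compose -f docker/compose.yml up -d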
If you prefer to run the validator without Docker, follow the instructions in the Running Without Docker section.
After completing the installation steps, your validator should be running. Check it with:
    docker ps
You should see a container named templar-validator-<WALLET_HOTKEY>.
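The check above can also be scripted. The following is a minimal sketch, assuming the container name follows the templar-validator-<WALLET_HOTKEY> pattern stated in this guide; the function names are hypothetical helpers, not part of τemplar.

```python
import subprocess


def validator_container_name(hotkey: str) -> str:
    """Build the container name pattern given in this guide."""
    return f"templar-validator-{hotkey}"


def is_listed(docker_names_output: str, hotkey: str) -> bool:
    """Return True if the expected name appears as a whole line in the output."""
    names = {line.strip() for line in docker_names_output.splitlines() if line.strip()}
    return validator_container_name(hotkey) in names


def running_container_names() -> str:
    """Ask Docker for the names of running containers, one per line."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling is_listed(running_container_names(), "<your hotkey>") on the host returns True when the validator container is up. Names are compared as whole lines, so a hotkey that is a prefix of another hotkey does not produce a false match.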
- Install System Dependencies:
  Same as in the miner setup.
- Install NVIDIA CUDA Drivers:
  Install the appropriate NVIDIA CUDA drivers.
- Clone the Repository:
    git clone https://github.com/tplr-ai/templar.git
    cd templar
- Set Up Python Environment:
    export WANDB_API_KEY=your_wandb_api_key
    export NODE_TYPE=your_node_type
    export WALLET_NAME=your_wallet_name
    export WALLET_HOTKEY=your_wallet_hotkey
    export CUDA_DEVICE=your_cuda_device
    export NETWORK=your_network
    export NETUID=your_netuid
    export DEBUG=your_debug_setting

    # Gradients R2 credentials
    export R2_GRADIENTS_ACCOUNT_ID=your_r2_account_id
    export R2_GRADIENTS_BUCKET_NAME=your_r2_bucket_name
    export R2_GRADIENTS_READ_ACCESS_KEY_ID=your_r2_read_access_key_id
    export R2_GRADIENTS_READ_SECRET_ACCESS_KEY=your_r2_read_secret_access_key
    export R2_GRADIENTS_WRITE_ACCESS_KEY_ID=your_r2_write_access_key_id
    export R2_GRADIENTS_WRITE_SECRET_ACCESS_KEY=your_r2_write_secret_access_key

    # Dataset R2 credentials
    export R2_DATASET_ACCOUNT_ID=80f15715bb0b882c9e967c13e677ed7d
    export R2_DATASET_BUCKET_NAME=80f15715bb0b882c9e967c13e677ed7d
    export R2_DATASET_READ_ACCESS_KEY_ID=88548d962edc9a1f4416cbb3453d914a
    export R2_DATASET_READ_SECRET_ACCESS_KEY=4934ae848465113a75babf7d0a88efd9112aa49296c900744268e91f1d31998f

    export GITHUB_USER=your_github_username
- Create and Register Validator Wallet:
    # Create coldkey if not already created
    btcli wallet new_coldkey --wallet.name default --n-words 12

    # Create and register validator hotkey
    btcli wallet new_hotkey --wallet.name default --wallet.hotkey validator --n-words 12
    btcli subnet pow_register --wallet.name default --wallet.hotkey validator --netuid <netuid> --subtensor.network <network>
- Log into Weights & Biases (WandB):
    wandb login your_wandb_api_key
- Set Environment Variables:
  Export necessary environment variables as in the miner setup.
- Run the Validator:
    python neurons/validator.py \
      --actual_batch_size 6 \
      --wallet.name default \
      --wallet.hotkey validator \
      --device cuda \
      --use_wandb \
      --netuid <netuid> \
      --subtensor.network <network> \
      --sync_state
Set the following in the docker/.env file when using Docker Compose:
WANDB_API_KEY=your_wandb_api_key
# Cloudflare R2 Credentials
R2_ACCOUNT_ID=your_r2_account_id
R2_READ_ACCESS_KEY_ID=your_r2_read_access_key_id
R2_READ_SECRET_ACCESS_KEY=your_r2_read_secret_access_key
R2_WRITE_ACCESS_KEY_ID=your_r2_write_access_key_id
R2_WRITE_SECRET_ACCESS_KEY=your_r2_write_secret_access_key
# Wallet Configuration
WALLET_NAME=default
WALLET_HOTKEY=your_validator_hotkey_name
# Network Configuration
NETWORK=finney
NETUID=3
# GPU Configuration
CUDA_DEVICE=cuda:0
# Node Type
NODE_TYPE=validator
# Additional Settings
DEBUG=false
Note: The R2 permissions remain unchanged.
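Before starting the containers, it can help to confirm that no variable is missing or still holds a placeholder. The sketch below is one possible check, not part of τemplar: the REQUIRED tuple is an assumption that covers only variables named in this guide, and the placeholder rules (values starting with your_ or wrapped in angle brackets) mirror the examples shown here.

```python
# Variables named in this guide; extend with the R2 keys your .env.example uses.
REQUIRED = ("WANDB_API_KEY", "WALLET_NAME", "WALLET_HOTKEY", "NETWORK", "NETUID", "NODE_TYPE")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def find_problems(values: dict[str, str], required=REQUIRED) -> list[str]:
    """List required keys that are missing, empty, or left as placeholders."""
    problems = []
    for key in required:
        value = values.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty")
        elif value.startswith("your_") or (value.startswith("<") and value.endswith(">")):
            problems.append(f"{key} still holds a placeholder")
    # This guide requires NODE_TYPE=validator for a validator node.
    if values.get("NODE_TYPE") not in (None, "", "validator"):
        problems.append("NODE_TYPE should be validator")
    return problems
```

To use it, read the file with pathlib.Path("docker/.env").read_text(), pass the text to parse_env, and print whatever find_problems returns; an empty list means the checked variables look filled in.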
- GPU Requirements:
- Minimum: NVIDIA H100 with 80GB VRAM
- Storage: 200GB+ recommended for model and evaluation data
- RAM: 32GB+ recommended
- Network: High-bandwidth, stable connection for state synchronization
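The storage and RAM recommendations above can be checked on a Linux host with the standard library alone. This is a rough sketch under stated assumptions: the thresholds are the 200GB and 32GB figures from this guide, GB is taken as 10^9 bytes, and the function names are hypothetical.

```python
import os
import shutil

MIN_DISK_GB = 200  # storage recommendation from this guide
MIN_RAM_GB = 32    # RAM recommendation from this guide


def shortfalls(free_disk_gb: float, ram_gb: float) -> list[str]:
    """Describe which recommendations the given resources fall short of."""
    issues = []
    if free_disk_gb < MIN_DISK_GB:
        issues.append(f"free disk {free_disk_gb:.0f} GB is below {MIN_DISK_GB} GB")
    if ram_gb < MIN_RAM_GB:
        issues.append(f"RAM {ram_gb:.0f} GB is below {MIN_RAM_GB} GB")
    return issues


def local_resources(path: str = "/") -> tuple[float, float]:
    """Return (free disk in GB at path, physical RAM in GB). Linux only."""
    free_disk_gb = shutil.disk_usage(path).free / 1e9
    ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    return free_disk_gb, ram_gb
```

Running shortfalls(*local_resources()) returns an empty list when both recommendations are met. GPU model and VRAM are not covered here.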
- Mainnet (Finney):
  - Network: finney
  - Netuid: 3
- Testnet:
  - Network: test
  - Netuid: 223
- Local:
  - Network: local
  - Netuid: 1
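Because each network has a fixed netuid, a small lookup keeps the two values from drifting apart when launching the validator. The sketch below assumes only the network/netuid pairs listed above and the --netuid and --subtensor.network flags used in the run command of this guide; network_args is a hypothetical helper.

```python
# Network name -> netuid, as listed in this guide.
NETWORKS = {"finney": 3, "test": 223, "local": 1}


def network_args(network: str) -> list[str]:
    """Return the CLI arguments that select the given network and its netuid."""
    if network not in NETWORKS:
        raise ValueError(f"unknown network {network!r}; expected one of {sorted(NETWORKS)}")
    return ["--netuid", str(NETWORKS[network]), "--subtensor.network", network]
```

For example, network_args("test") yields the argument list for the testnet, which can be appended to the python neurons/validator.py invocation.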
- Docker Logs:
    docker logs -f templar-validator-${WALLET_HOTKEY}
- Weights & Biases:
  - Ensure --use_wandb is enabled
  - Monitor evaluation metrics and network statistics
Key metrics to monitor:
- GPU utilization
- Memory usage
- Network bandwidth
- Evaluation throughput
- Weight setting frequency
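GPU utilization and memory usage from the list above can be sampled with nvidia-smi's CSV query mode. The following sketch assumes nvidia-smi is on the PATH; the function names and the returned dictionary keys are illustrative choices, not a τemplar interface.

```python
import subprocess

QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_line(line: str) -> dict[str, float]:
    """Parse one CSV line such as '45, 40000, 80000' (percent, MiB, MiB)."""
    util, used, total = (float(field.strip()) for field in line.split(","))
    return {
        "util_pct": util,
        "mem_used_mib": used,
        "mem_total_mib": total,
        "mem_pct": 100.0 * used / total,
    }


def sample_gpus() -> list[dict[str, float]]:
    """Return one metrics dict per GPU reported by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]
```

Calling sample_gpus() in a loop with a sleep between samples gives a simple utilization trace; evaluation throughput and weight-setting frequency are better read from the validator's own logs or Weights & Biases.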
- State Synchronization Failures: Check network settings and ensure the validator is properly registered and connected.
- Out of Memory Errors: Reduce --actual_batch_size.
- Network Connectivity Issues: Verify firewall settings and network configurations.
- The validator synchronizes its model with the latest global state.
- It gathers and applies gradients from miners to maintain consistency.
- Collect Miner Gradients: Gathers compressed gradients submitted by miners.
- Evaluate Contributions: Assesses the impact of each miner's gradient on model performance.
- Compute Scores: Calculates scores based on loss improvement.
- Update Weights: Adjusts miners' weights on the blockchain accordingly.
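To make "scores based on loss improvement" concrete, here is a toy illustration in plain Python. It is not the τemplar implementation: the quadratic loss, the single gradient step, and the learning rate are all assumptions chosen so the arithmetic is easy to follow.

```python
from typing import Callable, Sequence


def quadratic_loss(params: Sequence[float]) -> float:
    """Toy loss: sum of squared parameters (minimum at all zeros)."""
    return sum(p * p for p in params)


def loss_improvement(
    params: Sequence[float],
    gradient: Sequence[float],
    loss_fn: Callable[[Sequence[float]], float] = quadratic_loss,
    lr: float = 0.1,
) -> float:
    """Score a gradient as loss before the step minus loss after the step."""
    before = loss_fn(params)
    stepped = [p - lr * g for p, g in zip(params, gradient)]
    after = loss_fn(stepped)
    return before - after


params = [1.0, -2.0]
helpful = loss_improvement(params, [2.0, -4.0])   # true gradient of the toy loss
harmful = loss_improvement(params, [-2.0, 4.0])   # points the wrong way
```

In this toy setup the helpful gradient gets a positive score and the harmful one a negative score, which is the property a loss-improvement score is meant to capture.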
- Scoring Mechanism: Based on the performance improvement contributed by miners.
- Update Frequency: Weights are periodically updated on the blockchain.
- Impact: Influences reward distribution and miner reputation in the network.
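One simple way to turn per-miner scores into weights is to clip negative scores to zero and normalize the rest so they sum to one. This is a hypothetical normalization shown for illustration; the scheme the validator actually uses may differ, and the integer keys merely stand in for miner identifiers.

```python
def scores_to_weights(scores: dict[int, float]) -> dict[int, float]:
    """Clip negative scores to zero, then normalize so the weights sum to 1."""
    clipped = {uid: max(score, 0.0) for uid, score in scores.items()}
    total = sum(clipped.values())
    if total == 0:
        # No miner improved the loss: assign zero weight to everyone.
        return {uid: 0.0 for uid in clipped}
    return {uid: score / total for uid, score in clipped.items()}


weights = scores_to_weights({1: 3.0, 2: 1.0, 3: -2.0})
```

Here miner 1 receives three quarters of the weight, miner 2 one quarter, and miner 3, whose score was negative, receives none.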