# Compute Operator

Complete guide to running a Compute node on the NCN Network.

***

## Overview

Compute nodes execute AI models for inference requests. They:

* Execute models in secure sandboxes
* Sign computation results
* Receive payment for completed tasks

**Earnings**: \~80% of each inference fee

***

## Requirements

### Hardware

| Resource | Minimum    | Recommended      |
| -------- | ---------- | ---------------- |
| CPU      | 4 cores    | 8+ cores         |
| RAM      | 8 GB       | 32 GB            |
| GPU      | Optional   | NVIDIA RTX 3080+ |
| Storage  | 100 GB SSD | 500 GB NVMe      |
| Network  | 100 Mbps   | 1 Gbps           |

### Software

* Linux (Ubuntu 22.04; the sandbox requires a 5.13+ kernel for Landlock)
* Rust 1.70+
* Python 3.8+
* PyTorch / Transformers

### GPU Setup (Optional but Recommended)

```bash
# Install NVIDIA drivers
sudo apt install nvidia-driver-535

# Install CUDA toolkit
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt update
sudo apt install cuda

# Verify
nvidia-smi
```

***

## Setup

### 1. Install Dependencies

```bash
# Update system
sudo apt update && sudo apt upgrade -y

# Install build tools
sudo apt install -y \
  build-essential \
  pkg-config \
  libssl-dev \
  git \
  python3 \
  python3-pip \
  python3-venv

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
```

### 2. Build Compute Node

```bash
# Clone repository
git clone https://github.com/neurochainai/ncn-network-v2-rs.git
cd ncn-network-v2-rs

# Build release
cargo build --release -p compute_node

# Copy binary
sudo cp target/release/compute_node /usr/local/bin/
```

### 3. Set Up Python Environment

```bash
# Create virtual environment
sudo mkdir -p /opt/ncn
python3 -m venv /opt/ncn/venv

# Install dependencies
/opt/ncn/venv/bin/pip install --upgrade pip
/opt/ncn/venv/bin/pip install torch transformers numpy scipy
```
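Before pointing the node at this environment, it is worth confirming the packages actually import from the venv interpreter. A small sketch (`dependency_report` is a throwaway helper, not part of the NCN tooling); run it with `/opt/ncn/venv/bin/python3` so the venv, not the system Python, is checked:

```python
# Report installed versions of the inference dependencies.
from importlib.metadata import version, PackageNotFoundError

def dependency_report(packages):
    report = {}
    for pkg in packages:
        try:
            report[pkg] = version(pkg)
        except PackageNotFoundError:
            report[pkg] = "MISSING"
    return report

print(dependency_report(["torch", "transformers", "numpy", "scipy"]))
```

Any `MISSING` entry means the corresponding `pip install` step above did not complete.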

### 4. Generate Wallet

```bash
# Generate a new wallet with Foundry's cast (or use an existing one)
cast wallet new

# Output:
# Address: 0x...
# Private Key: 0x...
```

### 5. Configure Compute Node

Create `/etc/ncn/compute.env`:

```bash
# Gateway Connection
GATEWAY_ADDR=http://gateway.ncn-network.io:50051

# Wallet
COMPUTE_NODE_PRIVATE_KEY=0x...
NODE_WALLET_ADDRESS=0x...

# Model Configuration
MODEL_PATH=/opt/ncn/models

# Python Configuration
PYTHON_PATH=/opt/ncn/venv/bin/python3
PYTHON_VENV_PATH=/opt/ncn/venv

# Sandbox Settings
SANDBOX_MODE=strict
EXECUTION_TIMEOUT_SECS=300
MAX_MEMORY_MB=8192

# Logging
RUST_LOG=info
```
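A malformed line or an empty key here means the node starts with missing settings, so a pre-flight parse can save a restart loop. A minimal sketch against the `KEY=VALUE` format above; the `REQUIRED` tuple mirrors this example config and is our assumption, not the node's actual schema:

```python
# Pre-flight parser for /etc/ncn/compute.env ('#' comments, KEY=VALUE lines).
def parse_env(text):
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

# Keys the example config defines; empty values count as missing.
REQUIRED = ("GATEWAY_ADDR", "COMPUTE_NODE_PRIVATE_KEY", "MODEL_PATH", "PYTHON_PATH")

def missing_keys(env):
    return [k for k in REQUIRED if not env.get(k)]

sample = "# Gateway\nGATEWAY_ADDR=http://gateway.ncn-network.io:50051\nMODEL_PATH=/opt/ncn/models\n"
env = parse_env(sample)
print(missing_keys(env))  # keys still unset in the sample
```

On the node, feed it `open("/etc/ncn/compute.env").read()` and refuse to start the service while the missing list is non-empty.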

### 6. Create Systemd Service

Create `/etc/systemd/system/ncn-compute.service`:

```ini
[Unit]
Description=NCN Compute Node
After=network.target

[Service]
Type=simple
# Root is required for the sandbox (systemd does not allow inline comments after a value)
User=root
EnvironmentFile=/etc/ncn/compute.env
ExecStart=/usr/local/bin/compute_node --gateway-addr ${GATEWAY_ADDR} --model-path ${MODEL_PATH}
Restart=always
RestartSec=10

# Sandbox requires elevated privileges
AmbientCapabilities=CAP_SYS_ADMIN CAP_NET_ADMIN

[Install]
WantedBy=multi-user.target
```

### 7. Start Compute Node

```bash
# Set permissions
sudo chmod 600 /etc/ncn/compute.env

# Enable and start
sudo systemctl daemon-reload
sudo systemctl enable ncn-compute
sudo systemctl start ncn-compute

# Check status
sudo systemctl status ncn-compute
```

***

## Download Models

### Manual Download

```bash
# Create models directory (owned by the current user so the download below can write there)
sudo mkdir -p /opt/ncn/models
sudo chown "$(whoami)" /opt/ncn/models

# Download from Hugging Face
/opt/ncn/venv/bin/python3 << 'EOF'
from transformers import AutoModel

# Download Bark models (example)
model = AutoModel.from_pretrained("suno/bark")
model.save_pretrained("/opt/ncn/models/bark")
EOF
```

### Using sync-models (Automatic)

```bash
# Enable model sync in /etc/ncn/compute.env
SYNC_MODELS=true

# Or start the compute node with the sync flag directly
cargo run --bin compute_node -- \
  --gateway-addr http://gateway:50051 \
  --sync-models
```

### Model Directory Structure

```
/opt/ncn/models/
├── bark_semantic/
│   ├── config.json
│   ├── model.pt
│   └── tokenizer.json
├── bark_coarse/
│   └── ...
└── bark_fine/
    └── ...
```
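Partially downloaded models are a common source of "Model not found" errors, so a quick completeness scan helps. A sketch, demonstrated on a throwaway tree; the required-file list is inferred from the example layout above, so adjust it for the real model formats:

```python
from pathlib import Path
import tempfile

# Files every model directory is expected to contain (assumption from the tree above).
REQUIRED_FILES = ("config.json",)

def incomplete_models(root):
    # Return the names of model directories missing any required file.
    root = Path(root)
    return sorted(
        d.name
        for d in root.iterdir()
        if d.is_dir() and any(not (d / f).is_file() for f in REQUIRED_FILES)
    )

# Demo on a temporary tree shaped like /opt/ncn/models
root = Path(tempfile.mkdtemp())
(root / "bark_semantic").mkdir()
(root / "bark_semantic" / "config.json").write_text("{}")
(root / "bark_coarse").mkdir()  # incomplete: no config.json
print(incomplete_models(root))
```

Point it at `/opt/ncn/models` on the node and re-download anything it reports.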

***

## Operations

### Monitor Compute Node

```bash
# View logs
sudo journalctl -u ncn-compute -f

# Check if registered with gateway (service output goes to the journal)
sudo journalctl -u ncn-compute | grep -i "registered"
```

### View Task Execution

```bash
# Watch for tasks
sudo journalctl -u ncn-compute -f | grep -E "(task|execution)"
```

### Check Resource Usage

```bash
# CPU/Memory
htop

# GPU usage
nvidia-smi -l 1

# Disk usage
df -h /opt/ncn/models
```
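For scripted checks (cron jobs, alerts) the same numbers `free -h` shows can be read from `/proc/meminfo`. A minimal parser, demonstrated on a sample string; on a node you would pass it `open("/proc/meminfo").read()`:

```python
# Parse /proc/meminfo into {field: kB} for scripted memory checks.
def parse_meminfo(text):
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key] = int(fields[0])  # values are reported in kB
    return info

sample = "MemTotal:       32768000 kB\nMemAvailable:    4096000 kB\n"
info = parse_meminfo(sample)
print(info["MemAvailable"] // 1024, "MB available")
```

Alerting when `MemAvailable` drops below the node's `MAX_MEMORY_MB` headroom is a cheap early warning before tasks start failing.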

***

## Sandbox Configuration

### Sandbox Modes

| Mode         | Description    | Use Case         |
| ------------ | -------------- | ---------------- |
| `strict`     | Full isolation | Production       |
| `permissive` | Relaxed rules  | Testing          |
| `disabled`   | No sandbox     | Development only |

### Strict Mode Features

* **seccomp**: Syscall filtering
* **Namespaces**: PID, network, mount isolation
* **Landlock**: Filesystem access control
* **Resource limits**: CPU, memory, time
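The resource-limit layer can be pictured as POSIX rlimits applied to the model process before it runs. This is an illustration under our own assumptions, not the node's actual implementation, and it ignores the seccomp, namespace, and Landlock layers entirely:

```python
import resource

def rlimits_from_config(max_memory_mb, timeout_secs):
    # Translate the env-file settings into (soft, hard) rlimit pairs.
    return {
        resource.RLIMIT_AS: (max_memory_mb * 1024 * 1024,) * 2,  # address space, bytes
        resource.RLIMIT_CPU: (timeout_secs, timeout_secs),       # CPU seconds
    }

limits = rlimits_from_config(max_memory_mb=8192, timeout_secs=300)
# A child process would apply these before exec, e.g. via subprocess preexec_fn:
#   for res, lim in limits.items():
#       resource.setrlimit(res, lim)
print(limits[resource.RLIMIT_CPU])
```

Hitting either limit kills only the sandboxed child, which is exactly the failure isolation strict mode is meant to provide.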

### Troubleshooting Sandbox

```bash
# Run with permissive mode for debugging
SANDBOX_MODE=permissive cargo run --bin compute_node

# Check kernel features
grep -E "(SECCOMP|LANDLOCK|NAMESPACES)" "/boot/config-$(uname -r)"
```

***

## Earnings

### Fee Distribution

For each completed inference:

* **80%** goes to Compute Node (you)
* **10%** goes to Gateway
* **5%** goes to Validators
* **5%** goes to Treasury
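Sketched as integer arithmetic (basis points keep the shares summing exactly to the fee; the send-rounding-dust-to-treasury rule is our convention for the example, not necessarily the protocol's):

```python
# Fee split from the list above, expressed in basis points (1/100 of a percent).
SPLIT_BPS = {"compute": 8000, "gateway": 1000, "validators": 500, "treasury": 500}

def distribute(fee_wei):
    shares = {k: fee_wei * bps // 10_000 for k, bps in SPLIT_BPS.items()}
    shares["treasury"] += fee_wei - sum(shares.values())  # rounding dust
    return shares

print(distribute(10**17))  # a 0.10-token fee, in wei
```

Integer wei math avoids the rounding drift that floating-point percentages accumulate across many small payouts.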

### Track Earnings

```bash
# Check wallet balance
cast balance $NODE_WALLET_ADDRESS --rpc-url https://testnet-rpc-1.forknet.io

# Count completed tasks (from the journal)
sudo journalctl -u ncn-compute | grep -c "completed"
```

***

## Performance Optimization

### GPU Optimization

```bash
# Set CUDA visible devices
export CUDA_VISIBLE_DEVICES=0

# Use TensorFloat-32 (a PyTorch setting rather than an environment variable):
#   torch.backends.cuda.matmul.allow_tf32 = True
#   torch.backends.cudnn.allow_tf32 = True
```

### CPU Optimization

```bash
# Set thread affinity
taskset -c 0-7 compute_node ...

# Increase file descriptors
ulimit -n 65535
```

### Memory Management

```bash
# Monitor memory usage
watch -n 1 "free -h && nvidia-smi"

# Raise the node's limit in /etc/ncn/compute.env
MAX_MEMORY_MB=16384
```

***

## Maintenance

### Update Compute Node

```bash
# Stop service
sudo systemctl stop ncn-compute

# Pull updates
cd ncn-network-v2-rs
git pull

# Rebuild
cargo build --release -p compute_node

# Update binary
sudo cp target/release/compute_node /usr/local/bin/

# Restart
sudo systemctl start ncn-compute
```

### Update Python Dependencies

```bash
/opt/ncn/venv/bin/pip install --upgrade torch transformers
```

### Update Models

```bash
# Manually or with sync
cargo run --bin compute_node -- --sync-models --gateway-addr ...
```

***

## Troubleshooting

### "Sandbox execution failed"

```bash
# Check kernel support
uname -r  # Must be 5.13+ for Landlock

# Try permissive mode (set in /etc/ncn/compute.env, then restart)
SANDBOX_MODE=permissive

# Check logs for specific error
journalctl -u ncn-compute -n 50
```
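The kernel check can be scripted; note that a 5.13+ version alone does not prove `CONFIG_LANDLOCK` was compiled in, so the kernel-config grep in the Sandbox Configuration section is still worth running. A small sketch:

```python
# Compare the running kernel against the 5.13 Landlock minimum noted above.
import platform
import re

LANDLOCK_MIN = (5, 13)

def kernel_version(release):
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unparseable kernel release: {release!r}")
    return (int(m.group(1)), int(m.group(2)))

def supports_landlock(release):
    return kernel_version(release) >= LANDLOCK_MIN

print(supports_landlock(platform.release()))
```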

### "Model not found"

* Verify model path exists
* Check file permissions
* Ensure model is downloaded

### "Gateway connection failed"

* Check gateway address
* Verify network connectivity
* Check firewall rules

### "Out of memory"

* Increase MAX\_MEMORY\_MB
* Reduce model batch size
* Add more RAM/swap

***

## Security Best Practices

1. **Keep Sandbox Enabled**
   * Always use `strict` mode in production
2. **Secure Private Key**
   * Store in secure location
   * Use different key than personal wallet
3. **Monitor for Anomalies**
   * Watch for unusual resource usage
   * Alert on failed executions
4. **Regular Updates**
   * Keep system packages updated
   * Update models regularly

***

## Next Steps

* [Monitoring](https://docs.neurochain.ai/nc/neurochainai-guides/operators/monitoring) - Set up monitoring
* [Troubleshooting](https://docs.neurochain.ai/nc/neurochainai-guides/troubleshooting/troubleshooting) - Common issues
* [Security](https://docs.neurochain.ai/nc/neurochainai-guides/security/security) - Security documentation
