Compute Node

The Compute Node executes AI models in secure sandboxed environments. It connects to Gateway Nodes, receives inference tasks, and returns signed results.


Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                            Compute Node                                  │
│                                                                         │
│   Gateway Connection                                                    │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │  Bidirectional gRPC Stream                                       │   │
│   │  ◀── Receive InferenceRequest                                    │   │
│   │  ──▶ Send NodeInfo (heartbeat)                                   │   │
│   │  ──▶ Send InferenceResponse                                      │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│                           │                                             │
│                           ▼                                             │
│   Execution Manager                                                     │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐       │   │
│   │  │ Model Loader  │  │ Executor      │  │ Result Signer │       │   │
│   │  │ & Cache       │  │ Selector      │  │               │       │   │
│   │  └───────────────┘  └───────────────┘  └───────────────┘       │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│                           │                                             │
│                           ▼                                             │
│   Sandbox Layer                                                         │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │  ┌─────────────────────────────────────────────────────────┐    │   │
│   │  │                    OS Sandbox                            │    │   │
│   │  │  • seccomp (syscall filtering)                          │    │   │
│   │  │  • Linux namespaces (PID, network, mount, IPC)          │    │   │
│   │  │  • Landlock (filesystem isolation)                       │    │   │
│   │  │  • Resource limits (CPU, memory, time)                   │    │   │
│   │  └─────────────────────────────────────────────────────────┘    │   │
│   │  ┌─────────────────────────────────────────────────────────┐    │   │
│   │  │                  Python Executor                         │    │   │
│   │  │  • Isolated Python process                              │    │   │
│   │  │  • File-based I/O                                       │    │   │
│   │  │  • Timeout enforcement                                   │    │   │
│   │  └─────────────────────────────────────────────────────────┘    │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│                           │                                             │
│                           ▼                                             │
│   Model Storage                                                         │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │  ./models/                                                       │   │
│   │  ├── bark_semantic_model.pt                                      │   │
│   │  ├── bark_coarse_model.pt                                        │   │
│   │  └── bark_fine_model.pt                                          │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Features

  • Sandboxed Execution: Multi-layer security isolation for model execution

  • Python Executor: Runs PyTorch and Transformers models

  • Cryptographic Signing: Signs all computation results

  • Model Caching: Efficient model loading with TTL-based cache

  • Model Sync: Auto-download models from subnets

  • Resource Limits: CPU, memory, and time bounds


Quick Start

Build
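Assuming a standard Rust/Cargo workspace (the binary name `compute-node` is an assumption; check the project's Cargo.toml):

```shell
# Build an optimized release binary (assumes a Rust/Cargo project layout)
cargo build --release
# The binary lands in ./target/release/ under the name set in Cargo.toml
```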

Run
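A minimal invocation, assuming the binary name from the build step above and a Gateway listening locally (the address and port are illustrative):

```shell
# Start the node and point it at a running Gateway (--gateway-addr is required)
./target/release/compute-node --gateway-addr http://127.0.0.1:50051
```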

Expected output: log lines showing a successful Gateway connection, followed by periodic NodeInfo heartbeats.


Configuration

Command Line Arguments

Argument          Required  Default   Description
--gateway-addr    Yes       -         Gateway gRPC address
--model-path      No        ./models  Model files directory
--registry-addr   No        -         P2P Registry address (for sync)
--subnet-id       No        1         Subnet to join
--sync-models     No        false     Enable model synchronization
--models-dir      No        ./models  Download directory

Environment Variables

Variable                   Required  Default           Description
COMPUTE_NODE_PRIVATE_KEY   Yes*      -                 Node wallet private key
NODE_WALLET_ADDRESS        No        Derived           Node wallet address
SANDBOX_MODE               No        strict            Sandbox mode (strict/permissive/disabled)
EXECUTION_TIMEOUT_SECS     No        300               Max execution time (seconds)
MAX_MEMORY_MB              No        4096              Max memory per execution (MB)
ENABLE_EXECUTION_FALLBACK  No        false             Enable fallback execution
REQUIRE_PAYMENT            No        false             Require payment validation
PYTHON_EXECUTOR_PATH       No        python_executors  Path to Python executor scripts

*Auto-generated if not provided (development only)
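A development-oriented environment using only variables from the table above (the key value is a placeholder; never commit a real key):

```shell
# Placeholder key: in development the node can auto-generate one if unset
export COMPUTE_NODE_PRIVATE_KEY="0x<your-private-key>"
export SANDBOX_MODE=strict          # strict | permissive | disabled
export EXECUTION_TIMEOUT_SECS=300   # max execution time (seconds)
export MAX_MEMORY_MB=4096           # max memory per execution (MB)
```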

See Configuration Reference for complete details.


Sandbox Security

The compute node uses multiple layers of security isolation:

Security Layers

Layer  Technology         Protection
1      seccomp            Filters dangerous system calls
2      PID namespace      Isolates process visibility
3      Network namespace  No external network access
4      Mount namespace    Isolated filesystem view
5      Landlock           Fine-grained filesystem access control
6      Resource limits    Bounds CPU, memory, time

Sandbox Modes

Mode        Isolation                       Use Case
strict      All layers enforced             Production
permissive  Partial (⚠️ reduced isolation)  Testing
disabled    None (⚠️ no isolation)          Development only

Blocked Operations

Operation               Protection
Network access          Network namespace isolation
Read /etc/passwd        Landlock filesystem rules
Fork bomb               RLIMIT_NPROC limit
Memory exhaustion       RLIMIT_AS limit
CPU exhaustion          RLIMIT_CPU limit
Access other processes  PID namespace isolation
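The last four rows rely on standard POSIX resource limits. The same mechanism can be observed outside the node with the shell's built-in ulimit (this demonstrates rlimits in general, not the node's own sandbox):

```shell
# Cap virtual memory at ~1 GiB in a subshell, then over-allocate:
# the allocation fails with MemoryError instead of exhausting host RAM.
( ulimit -v 1048576; python3 -c "bytearray(2 * 1024**3)" ) 2>&1 | tail -n 1
```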

See Sandbox Execution for a deep dive.


Model Execution

Supported Model Formats

Format       Extension     Framework     Auto-detect
TorchScript  .pt, .pth     PyTorch       ✓
ONNX         .onnx         ONNX Runtime  ✓
Safetensors  .safetensors  Multiple      ✓

Executor Scripts

Located in compute_node/python_executors/:

Script                      Purpose
generic_model_executor.py   General-purpose executor
semantic_model_executor.py  Bark semantic stage
coarse_model_executor.py    Bark coarse stage

Execution Flow

  1. Receive an InferenceRequest over the Gateway stream

  2. Load the model (from cache when available)

  3. Select the matching executor script

  4. Run the executor inside the sandbox

  5. Sign the result

  6. Return the InferenceResponse to the Gateway

See Model Management and Python Executors for details.


Signing

All computation results are cryptographically signed:

Signature Components

Field                  Description
response_hash          SHA256 hash of output_data
response_signature     secp256k1 signature
completion_validation  Complete attestation

Signed Message Format

The node signs the SHA256 response_hash with its secp256k1 wallet key; the exact byte layout of the signed message is defined by the node's signing code.

Verification
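As a hedged sketch of the first verification step, a verifier recomputes response_hash over the raw output_data bytes and then checks response_signature against the node's secp256k1 public key. The hash step with standard tools (the payload is illustrative, and this assumes the hash covers output_data exactly as returned, with no extra framing):

```shell
# Recompute response_hash: SHA-256 over the raw output bytes
printf '%s' 'illustrative-output-bytes' | sha256sum | awk '{print $1}'
```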


Model Synchronization

Compute nodes can auto-download models from subnets:

Enable Sync
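Sync is driven by the command-line flags from the Configuration section; the binary name, registry address, and flag syntax below are illustrative:

```shell
./target/release/compute-node \
  --gateway-addr http://127.0.0.1:50051 \
  --registry-addr http://127.0.0.1:50052 \
  --subnet-id 1 \
  --sync-models \
  --models-dir ./models
```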

Sync Process

  1. Query P2P Registry for subnet metadata

  2. Compare local models with subnet requirements

  3. Download missing models

  4. Verify model hashes

  5. Extract executor scripts
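Step 4 can be reproduced by hand. Assuming SHA-256 digests (the checksum file below is generated locally for illustration, not published by the subnet):

```shell
cd models
sha256sum bark_semantic_model.pt bark_coarse_model.pt bark_fine_model.pt > checksums.txt
sha256sum -c checksums.txt   # reports "<file>: OK" for each matching file
```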


Metrics & Monitoring

Execution Statistics

The compute node tracks:

  • Total executions

  • Successful / failed count

  • Average execution time

  • Model-specific statistics

Logging
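The node's log configuration isn't documented here; assuming it follows the common Rust convention of a RUST_LOG filter (verify against your build):

```shell
# Increase verbosity for debugging (RUST_LOG is an assumption, not confirmed)
RUST_LOG=debug ./target/release/compute-node --gateway-addr http://127.0.0.1:50051
```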


Troubleshooting

Common Issues

"Failed to connect to Gateway"

  • Verify Gateway is running

  • Check --gateway-addr is correct

"Model not found"

  • Ensure model file exists in --model-path

  • Enable --sync-models for auto-download

"Sandbox execution failed"

  • Increase EXECUTION_TIMEOUT_SECS

  • Check model complexity

  • Use SANDBOX_MODE=permissive for debugging

"Permission denied in sandbox"

  • Ensure all required files are in allowed paths

  • Check Landlock rules in sandbox config

See Compute Troubleshooting for more help.


Python Environment Setup

Create Virtual Environment
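A standard virtualenv setup inside the executor directory (the path comes from the Executor Scripts section):

```shell
cd compute_node/python_executors
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
```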

Requirements
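An assumed dependency set inferred from the supported model formats above (PyTorch, Transformers, ONNX Runtime, safetensors); prefer the project's own requirements file if one is provided:

```shell
pip install torch transformers onnxruntime safetensors
```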

