# Compute Node

The Compute Node executes AI models in secure sandboxed environments. It connects to Gateway Nodes, receives inference tasks, and returns signed results.

***

## Overview

```
┌─────────────────────────────────────────────────────────────────────────┐
│                            Compute Node                                  │
│                                                                         │
│   Gateway Connection                                                    │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │  Bidirectional gRPC Stream                                       │   │
│   │  ◀── Receive InferenceRequest                                    │   │
│   │  ──▶ Send NodeInfo (heartbeat)                                   │   │
│   │  ──▶ Send InferenceResponse                                      │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│                           │                                             │
│                           ▼                                             │
│   Execution Manager                                                     │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐       │   │
│   │  │ Model Loader  │  │ Executor      │  │ Result Signer │       │   │
│   │  │ & Cache       │  │ Selector      │  │               │       │   │
│   │  └───────────────┘  └───────────────┘  └───────────────┘       │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│                           │                                             │
│                           ▼                                             │
│   Sandbox Layer                                                         │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │  ┌─────────────────────────────────────────────────────────┐    │   │
│   │  │                    OS Sandbox                            │    │   │
│   │  │  • seccomp (syscall filtering)                          │    │   │
│   │  │  • Linux namespaces (PID, network, mount, IPC)          │    │   │
│   │  │  • Landlock (filesystem isolation)                       │    │   │
│   │  │  • Resource limits (CPU, memory, time)                   │    │   │
│   │  └─────────────────────────────────────────────────────────┘    │   │
│   │  ┌─────────────────────────────────────────────────────────┐    │   │
│   │  │                  Python Executor                         │    │   │
│   │  │  • Isolated Python process                              │    │   │
│   │  │  • File-based I/O                                       │    │   │
│   │  │  • Timeout enforcement                                   │    │   │
│   │  └─────────────────────────────────────────────────────────┘    │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│                           │                                             │
│                           ▼                                             │
│   Model Storage                                                         │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │  ./models/                                                       │   │
│   │  ├── bark_semantic_model.pt                                      │   │
│   │  ├── bark_coarse_model.pt                                        │   │
│   │  └── bark_fine_model.pt                                          │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```

***

## Features

* **Sandboxed Execution**: Multi-layer security isolation for model execution
* **Python Executor**: Runs PyTorch and Transformers models
* **Cryptographic Signing**: Signs all computation results
* **Model Caching**: Efficient model loading with TTL-based cache
* **Model Sync**: Auto-download models from subnets
* **Resource Limits**: CPU, memory, and time bounds

***

## Quick Start

### Build

```bash
cd ncn-network-v2-rs
cargo build --release -p compute_node
```

### Run

```bash
cargo run --release --bin compute_node -- \
  --gateway-addr http://127.0.0.1:50051 \
  --model-path ./models
```

**Expected Output**:

```
Starting Compute Node...
Loading wallet from environment...
✓ Wallet initialized: 0x742d35Cc...
Connecting to Gateway at http://127.0.0.1:50051...
✓ Connected to Gateway
✓ Registered with supported models: ["bark_semantic", "bark_coarse"]
✓ Compute Node ready for tasks
```

***

## Configuration

### Command Line Arguments

| Argument          | Required | Default    | Description                     |
| ----------------- | -------- | ---------- | ------------------------------- |
| `--gateway-addr`  | Yes      | -          | Gateway gRPC address            |
| `--model-path`    | No       | `./models` | Model files directory           |
| `--registry-addr` | No       | -          | P2P Registry address (for sync) |
| `--subnet-id`     | No       | `1`        | Subnet to join                  |
| `--sync-models`   | No       | `false`    | Enable model synchronization    |
| `--models-dir`    | No       | `./models` | Download directory              |

### Environment Variables

| Variable                    | Required | Default            | Description                               |
| --------------------------- | -------- | ------------------ | ----------------------------------------- |
| `COMPUTE_NODE_PRIVATE_KEY`  | Yes\*    | -                  | Node wallet private key                   |
| `NODE_WALLET_ADDRESS`       | No       | Derived            | Node wallet address                       |
| `SANDBOX_MODE`              | No       | `strict`           | Sandbox mode (strict/permissive/disabled) |
| `EXECUTION_TIMEOUT_SECS`    | No       | `300`              | Max execution time                        |
| `MAX_MEMORY_MB`             | No       | `4096`             | Max memory per execution                  |
| `ENABLE_EXECUTION_FALLBACK` | No       | `false`            | Enable fallback execution                 |
| `REQUIRE_PAYMENT`           | No       | `false`            | Require payment validation                |
| `PYTHON_EXECUTOR_PATH`      | No       | `python_executors` | Path to Python executors                  |

\*Auto-generated if not provided (development only)

See [Configuration Reference](https://github.com/NeuroChainAi/docs-guides/blob/main/components/compute-node/configuration.md) for complete details.

***

## Sandbox Security

The compute node uses multiple layers of security isolation:

### Security Layers

| Layer | Technology            | Protection                             |
| ----- | --------------------- | -------------------------------------- |
| 1     | **seccomp**           | Filters dangerous system calls         |
| 2     | **PID namespace**     | Isolates process visibility            |
| 3     | **Network namespace** | No external network access             |
| 4     | **Mount namespace**   | Isolated filesystem view               |
| 5     | **Landlock**          | Fine-grained filesystem access control |
| 6     | **Resource limits**   | Bounds CPU, memory, time               |

### Sandbox Modes

| Mode         | seccomp | Namespaces | Landlock | Limits | Use Case         |
| ------------ | ------- | ---------- | -------- | ------ | ---------------- |
| `strict`     | ✅       | ✅          | ✅        | ✅      | Production       |
| `permissive` | ✅       | ⚠️         | ⚠️       | ✅      | Testing          |
| `disabled`   | ❌       | ❌          | ❌        | ⚠️     | Development only |

### Blocked Operations

| Operation              | Protection                  |
| ---------------------- | --------------------------- |
| Network access         | Network namespace isolation |
| Read `/etc/passwd`     | Landlock filesystem rules   |
| Fork bomb              | `RLIMIT_NPROC` limit        |
| Memory exhaustion      | `RLIMIT_AS` limit           |
| CPU exhaustion         | `RLIMIT_CPU` limit          |
| Access other processes | PID namespace isolation     |

See [Sandbox Execution](https://github.com/NeuroChainAi/docs-guides/blob/main/components/compute-node/sandbox-execution.md) for a deep dive.

***

## Model Execution

### Supported Model Formats

| Format      | Extension      | Framework    | Auto-detect |
| ----------- | -------------- | ------------ | ----------- |
| TorchScript | `.pt`, `.pth`  | PyTorch      | ✅           |
| ONNX        | `.onnx`        | ONNX Runtime | ✅           |
| Safetensors | `.safetensors` | Multiple     | ✅           |

### Executor Scripts

Located in `compute_node/python_executors/`:

| Script                       | Purpose                  |
| ---------------------------- | ------------------------ |
| `generic_model_executor.py`  | General-purpose executor |
| `semantic_model_executor.py` | Bark semantic stage      |
| `coarse_model_executor.py`   | Bark coarse stage        |

### Execution Flow

```
1. Receive InferenceRequest
        │
        ▼
2. Determine model type
        │
        ▼
3. Select appropriate executor script
        │
        ▼
4. Prepare sandbox environment
   ├── Create temp directory
   ├── Copy executor script
   └── Apply security restrictions
        │
        ▼
5. Execute in sandbox
   ├── Write input to file
   ├── Run Python process
   ├── Enforce timeout
   └── Read output from file
        │
        ▼
6. Sign result
   ├── Hash output (SHA256)
   └── Sign with node wallet (secp256k1)
        │
        ▼
7. Return InferenceResponse
   ├── output_data
   ├── response_hash
   ├── response_signature
   └── completion_validation
```

See [Model Management](https://github.com/NeuroChainAi/docs-guides/blob/main/components/compute-node/model-management.md) and [Python Executors](https://github.com/NeuroChainAi/docs-guides/blob/main/components/compute-node/python-executors.md) for details.

***

## Signing

All computation results are cryptographically signed:

### Signature Components

| Field                   | Description                 |
| ----------------------- | --------------------------- |
| `response_hash`         | SHA256 hash of output\_data |
| `response_signature`    | secp256k1 signature         |
| `completion_validation` | Complete attestation        |

### Signed Message Format

```
SHA256(request_id || result_data || timestamp)
```

### Verification

```rust
// On gateway or validator
let message = format!("{}{}{}", request_id, result_data, timestamp);
let message_hash = sha256(&message);
let recovered_address = signature.recover(message_hash)?;
assert_eq!(recovered_address, compute_node_address);
```

***

## Model Synchronization

Compute nodes can auto-download models from subnets:

### Enable Sync

```bash
cargo run --bin compute_node -- \
  --gateway-addr http://127.0.0.1:50051 \
  --registry-addr http://127.0.0.1:50050 \
  --subnet-id 1 \
  --sync-models \
  --models-dir ./models
```

### Sync Process

1. Query P2P Registry for subnet metadata
2. Compare local models with subnet requirements
3. Download missing models
4. Verify model hashes
5. Extract executor scripts

***

## Metrics & Monitoring

### Execution Statistics

The compute node tracks:

* Total executions
* Successful / failed count
* Average execution time
* Model-specific statistics

### Logging

```bash
# Enable debug logging
RUST_LOG=debug cargo run --bin compute_node -- ...

# Filter to specific modules
RUST_LOG=compute_node::sandbox=trace cargo run --bin compute_node -- ...
```

***

## Troubleshooting

### Common Issues

**"Failed to connect to Gateway"**

```
Error: Failed to connect to Gateway at http://127.0.0.1:50051
```

* Verify Gateway is running
* Check `--gateway-addr` is correct

**"Model not found"**

```
Error: Model bark_semantic_model.pt not found at ./models/bark_semantic_model.pt
```

* Ensure model file exists in `--model-path`
* Enable `--sync-models` for auto-download

**"Sandbox execution failed"**

```
Error: Sandbox execution failed: Timeout after 300s
```

* Increase `EXECUTION_TIMEOUT_SECS`
* Check model complexity
* Use `SANDBOX_MODE=permissive` for debugging

**"Permission denied in sandbox"**

```
Error: Landlock: Access denied to /some/path
```

* Ensure all required files are in allowed paths
* Check Landlock rules in sandbox config

See [Compute Troubleshooting](https://github.com/NeuroChainAi/docs-guides/blob/main/troubleshooting/compute-issues.md) for more help.

***

## Python Environment Setup

### Create Virtual Environment

```bash
cd compute_node/python_executors

# Create environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Verify
python verify_environment.py
```

### Requirements

```
torch>=2.1.0
torchaudio>=2.1.0
transformers>=4.35.0
numpy>=1.24.3
scipy>=1.11.3
soundfile>=0.12.1
librosa>=0.10.1
```

***

## Related Documentation

* [Architecture](https://github.com/NeuroChainAi/docs-guides/blob/main/components/compute-node/architecture.md) - Internal architecture details
* [Sandbox Execution](https://github.com/NeuroChainAi/docs-guides/blob/main/components/compute-node/sandbox-execution.md) - Security isolation deep dive
* [Model Management](https://github.com/NeuroChainAi/docs-guides/blob/main/components/compute-node/model-management.md) - Model loading and caching
* [Python Executors](https://github.com/NeuroChainAi/docs-guides/blob/main/components/compute-node/python-executors.md) - Executor scripts
* [Configuration](https://github.com/NeuroChainAi/docs-guides/blob/main/components/compute-node/configuration.md) - Configuration options
