# Production Deployment

Best practices and considerations for deploying NCN Network in production.

***

## Production Checklist

### Security

* [ ] TLS certificates configured for all endpoints
* [ ] Private keys stored in secure vault (HSM recommended)
* [ ] Firewall rules configured
* [ ] Rate limiting enabled
* [ ] Security audit completed

### Infrastructure

* [ ] High availability setup (multiple replicas)
* [ ] Load balancer configured
* [ ] Auto-scaling enabled
* [ ] Persistent storage for data
* [ ] Backup strategy implemented

### Monitoring

* [ ] Prometheus metrics enabled
* [ ] Grafana dashboards configured
* [ ] Alerting rules set up
* [ ] Log aggregation configured
* [ ] Health checks implemented

### Operations

* [ ] Runbooks documented
* [ ] Incident response plan defined
* [ ] Update/rollback procedures tested
* [ ] On-call rotation established

***

## TLS Configuration

### Generate Certificates

```bash
# Using Let's Encrypt with certbot
sudo apt install certbot
# --standalone starts a temporary web server on port 80, so that port must be free and reachable
sudo certbot certonly --standalone -d api.ncn-network.io

# Certificates will be at:
# /etc/letsencrypt/live/api.ncn-network.io/fullchain.pem
# /etc/letsencrypt/live/api.ncn-network.io/privkey.pem
```

### Configure Gateway TLS

```bash
# gateway.env
# Assumes the certbot output above has been copied or symlinked into /etc/ncn/certs
TLS_ENABLED=true
TLS_CERT_PATH=/etc/ncn/certs/fullchain.pem
TLS_KEY_PATH=/etc/ncn/certs/privkey.pem
```

### Certificate Renewal

```bash
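# Add to root's crontab. certbot only renews certificates that are close to expiry,
# so run it daily; the deploy hook fires only when a certificate was actually renewed.
0 3 * * * certbot renew --quiet --deploy-hook "systemctl reload ncn-gateway"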
```
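Renewal can fail silently, so it may be worth checking independently how long the installed certificate remains valid. The sketch below wraps `openssl x509 -checkend`; the function name `cert_valid_for` is illustrative and not part of NCN tooling.

```bash
# cert_valid_for CERT DAYS
# Succeeds (exit 0) if the PEM certificate at CERT is still valid DAYS days from now.
cert_valid_for() {
  cert=$1
  days=$2
  # -checkend N exits 0 when the certificate does NOT expire within N seconds.
  openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null
}
```

A daily job could then run, for example, `cert_valid_for /etc/ncn/certs/fullchain.pem 14 || echo "certificate expires within 14 days"` and route the message to the alerting channel of your choice.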

***

## High Availability

### Gateway HA

```
                    Load Balancer
                    ┌───────────────────┐
                    │   HAProxy/NGINX   │
                    │   (health checks) │
                    └─────────┬─────────┘
                              │
              ┌───────────────┼───────────────┐
              │               │               │
              ▼               ▼               ▼
        ┌──────────┐   ┌──────────┐   ┌──────────┐
        │Gateway 1 │   │Gateway 2 │   │Gateway 3 │
        │ Server A │   │ Server B │   │ Server C │
        └──────────┘   └──────────┘   └──────────┘
```

### HAProxy Configuration

```haproxy
# /etc/haproxy/haproxy.cfg
frontend ncn-http
    mode http
    bind *:80
    bind *:443 ssl crt /etc/haproxy/certs/ncn.pem
    # Redirect plain HTTP to HTTPS
    http-request redirect scheme https unless { ssl_fc }
    default_backend ncn-gateway

backend ncn-gateway
    mode http
    balance roundrobin
    option httpchk GET /health
    server gateway1 10.0.1.1:8080 check
    server gateway2 10.0.1.2:8080 check
    server gateway3 10.0.1.3:8080 check

frontend ncn-grpc
    bind *:50051
    mode tcp
    default_backend ncn-gateway-grpc

backend ncn-gateway-grpc
    mode tcp
    balance roundrobin
    server gateway1 10.0.1.1:50051 check
    server gateway2 10.0.1.2:50051 check
    server gateway3 10.0.1.3:50051 check
```
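The `crt` file referenced in the frontend is conventionally a single PEM containing the certificate chain followed by the private key, whereas certbot writes them as separate files. A minimal sketch of building that bundle follows; the helper name `build_haproxy_pem` is hypothetical, and the paths in the usage note are taken from the sections above.

```bash
# build_haproxy_pem FULLCHAIN PRIVKEY OUT
# Concatenates the certificate chain and private key into one PEM file for HAProxy.
build_haproxy_pem() {
  fullchain=$1
  privkey=$2
  out=$3
  # Tighten the umask so the bundle is created readable by its owner only.
  old_umask=$(umask)
  umask 077
  # Write to a temporary file and rename it, so HAProxy never sees a partial bundle.
  if cat "$fullchain" "$privkey" > "$out.tmp"; then
    mv "$out.tmp" "$out"
    rc=$?
  else
    rm -f "$out.tmp"
    rc=1
  fi
  umask "$old_umask"
  return "$rc"
}
```

For example: `build_haproxy_pem /etc/letsencrypt/live/api.ncn-network.io/fullchain.pem /etc/letsencrypt/live/api.ncn-network.io/privkey.pem /etc/haproxy/certs/ncn.pem`, followed by a reload of HAProxy. The bundle needs rebuilding after each certificate renewal.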

***

## Security Hardening

### Firewall Rules

```bash
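# Set default policies first
sudo ufw default deny incoming
sudo ufw default allow outgoing

# Keep SSH reachable before enabling the firewall (adjust if SSH is not on port 22)
sudo ufw limit 22/tcp

# Public-facing (Gateway only)
sudo ufw allow 443/tcp      # HTTPS
sudo ufw allow 50051/tcp    # gRPC (if public)

# Internal only: already covered by the default deny; explicit rules document intent.
# If peers on other hosts need these ports, allow them from the internal subnet only,
# e.g. sudo ufw allow from <internal-subnet> to any port 50050 proto tcp
sudo ufw deny 50050/tcp     # Registry gRPC (internal)
sudo ufw deny 8828/tcp      # Registry P2P (internal)

sudo ufw enable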
```

### Rate Limiting (NGINX)

```nginx
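# /etc/nginx/conf.d/rate-limit.conf
# 100 requests per minute per client IP, tracked in a 10 MB shared zone
limit_req_zone $binary_remote_addr zone=api:10m rate=100r/m;
limit_req_status 429;   # NGINX returns 503 by default

server {
    location /api/ {
        limit_req zone=api burst=20 nodelay;
        # Assumes an upstream named ncn-gateway is defined elsewhere
        proxy_pass http://ncn-gateway;
    }
}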
```

### Key Management

**Development:**

* Environment variables (acceptable)

**Staging:**

* Secrets manager (AWS Secrets Manager, HashiCorp Vault)

**Production:**

* Hardware Security Module (HSM)
* Key Management Service (AWS KMS, GCP KMS)

```bash
# Using AWS Secrets Manager
GATEWAY_WALLET_PRIVATE_KEY=$(aws secretsmanager get-secret-value \
  --secret-id ncn/gateway-key \
  --query SecretString \
  --output text)
```
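A failed or empty secret lookup would otherwise leave the variable blank and surface only later at startup. The sketch below fails fast instead; `load_secret` is a hypothetical helper, not an NCN or AWS command, and it takes the fetch command as arguments so it is not tied to one secrets backend.

```bash
# load_secret NAME COMMAND [ARGS...]
# Runs COMMAND, and exports its output as NAME only if the command succeeded
# and printed a non-empty value. The secret itself is never echoed.
load_secret() {
  name=$1
  shift
  value=$("$@") || { echo "failed to fetch secret for $name" >&2; return 1; }
  [ -n "$value" ] || { echo "secret for $name is empty" >&2; return 1; }
  export "$name=$value"
}
```

With the AWS call shown above this could be used as `load_secret GATEWAY_WALLET_PRIVATE_KEY aws secretsmanager get-secret-value --secret-id ncn/gateway-key --query SecretString --output text || exit 1`. It must run in the same shell that launches the gateway for the export to be visible.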

***

## Monitoring Setup

### Prometheus Configuration

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'ncn-gateway'
    static_configs:
      - targets: ['gateway1:8080', 'gateway2:8080', 'gateway3:8080']
    metrics_path: /metrics

  - job_name: 'ncn-registry'
    static_configs:
      - targets: ['registry1:50050', 'registry2:50050', 'registry3:50050']
```

### Key Metrics to Monitor

| Metric              | Alert Threshold |
| ------------------- | --------------- |
| Request latency p99 | > 5s            |
| Error rate          | > 5%            |
| CPU usage           | > 80%           |
| Memory usage        | > 85%           |
| Disk usage          | > 90%           |
| Active connections  | > 1000          |

### Alerting Rules

```yaml
# alerts.yml
groups:
- name: ncn-alerts
  rules:
  - alert: HighErrorRate
    expr: sum(rate(ncn_requests_failed_total[5m])) / sum(rate(ncn_requests_total[5m])) > 0.05
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High error rate on NCN Gateway"

  - alert: HighLatency
    expr: histogram_quantile(0.99, sum by (le) (rate(ncn_request_duration_seconds_bucket[5m]))) > 5
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High latency on NCN Gateway"
```

***

## Backup Strategy

### What to Back Up

| Data                | Frequency             | Retention |
| ------------------- | --------------------- | --------- |
| Configuration files | Daily                 | 30 days   |
| Registry DHT data   | Every 6 hours         | 7 days    |
| Logs                | Daily                 | 90 days   |
| Wallet keys         | Once (secure offsite) | Forever   |

### Backup Script

```bash
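#!/bin/bash
# /opt/ncn/scripts/backup.sh
# Abort on the first error so a failed step never looks like a finished backup
set -euo pipefail

DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_ROOT=/backup/ncn
BACKUP_DIR="$BACKUP_ROOT/$DATE"

mkdir -p "$BACKUP_DIR"

# Backup configuration
cp -r /opt/ncn/config "$BACKUP_DIR/"

# Backup data
tar -czf "$BACKUP_DIR/data.tar.gz" /opt/ncn/data/

# Upload to S3
aws s3 sync "$BACKUP_DIR" "s3://ncn-backups/$DATE/"

# Remove local backups older than 30 days.
# -mindepth/-maxdepth restrict the match to the dated top-level directories,
# so the backup root itself is never deleted.
find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +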
```
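A backup is only useful if it can be read back. As a minimal sketch, the check below could run after the `tar` step or before a restore; `verify_backup` is a hypothetical helper name.

```bash
# verify_backup ARCHIVE
# Succeeds only if ARCHIVE exists, is non-empty, and can be listed by tar.
verify_backup() {
  archive=$1
  # A missing or zero-byte archive is never a valid backup.
  [ -s "$archive" ] || { echo "missing or empty: $archive" >&2; return 1; }
  # Listing the contents forces gzip and tar to read the whole archive.
  tar -tzf "$archive" >/dev/null 2>&1 || { echo "unreadable: $archive" >&2; return 1; }
}
```

For example, `verify_backup "$BACKUP_DIR/data.tar.gz" || exit 1` placed before the S3 upload would stop an unreadable archive from being uploaded. This confirms the archive is readable, not that its contents are complete, so periodic test restores are still advisable.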

***

## Disaster Recovery

### Recovery Time Objective (RTO)

| Component | RTO        |
| --------- | ---------- |
| Gateway   | 5 minutes  |
| Registry  | 15 minutes |
| Compute   | 30 minutes |

### Recovery Procedures

**Gateway Failure:**

1. Health check detects the failure
2. Load balancer removes the failed gateway from the pool
3. Alert is sent to the ops team
4. Gateway restarts automatically, or an operator intervenes
5. Verify health, then add the gateway back to the pool

**Registry Failure:**

1. P2P network continues with remaining nodes
2. Failed node restarts automatically
3. DHT data syncs from peers
4. Verify consensus capability

**Full Cluster Recovery:**

```bash
# 1. Start infrastructure
systemctl start ncn-registry
sleep 30

# 2. Start gateways
systemctl start ncn-gateway
sleep 10

# 3. Start compute nodes
systemctl start ncn-compute

# 4. Verify health (-f makes curl exit non-zero on an HTTP error status)
curl -fsS http://localhost:8080/health
```
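The fixed `sleep` delays in the recovery sequence are a guess at how long each service takes to start. A hedged alternative is to poll until a probe command succeeds or a timeout passes. The `wait_until` helper below is illustrative, not part of NCN, and uses plain variables to stay POSIX `sh` compatible.

```bash
# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL seconds until it succeeds.
# Returns 1 once more than TIMEOUT seconds of waiting have accumulated.
wait_until() {
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    elapsed=$((elapsed + interval))
    if [ "$elapsed" -gt "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
  done
  return 0
}
```

For example, `wait_until 60 2 curl -fsS http://localhost:8080/health || exit 1` after starting the gateway continues as soon as the health endpoint responds, and aborts the recovery if it never does.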

***

## Performance Tuning

### System Limits

```bash
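# /etc/security/limits.conf
# Applies to PAM login sessions only. For systemd-managed services,
# set LimitNOFILE= and LimitNPROC= in the unit file instead.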
ncn soft nofile 65535
ncn hard nofile 65535
ncn soft nproc 32768
ncn hard nproc 32768
```

### Kernel Parameters

```bash
# /etc/sysctl.conf
# Apply without rebooting: sudo sysctl -p
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1
```

### Application Tuning

```bash
# Gateway tuning
TOKIO_WORKER_THREADS=4
HTTP_KEEP_ALIVE_TIMEOUT=60
GRPC_MAX_CONNECTIONS=1000
```

***

## Update Procedures

### Rolling Update

```bash
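# Update one gateway at a time; stop at the first host that fails its health check.
# Assumes the new build was already copied to /tmp/gateway_node on each host.
for host in gateway1 gateway2 gateway3; do
  ssh "$host" "systemctl stop ncn-gateway"
  # Keep the previous binary so the rollback procedure has something to restore
  ssh "$host" "cp /opt/ncn/bin/gateway_node /opt/ncn/bin/gateway_node.backup"
  ssh "$host" "cp /tmp/gateway_node /opt/ncn/bin/"
  ssh "$host" "systemctl start ncn-gateway"
  sleep 30
  # Verify health before continuing
  curl -f "http://$host:8080/health" || exit 1
done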
```

### Rollback Procedure

```bash
# Keep previous version (run this before installing an update)
cp /opt/ncn/bin/gateway_node /opt/ncn/bin/gateway_node.backup

# Rollback if needed
systemctl stop ncn-gateway
cp /opt/ncn/bin/gateway_node.backup /opt/ncn/bin/gateway_node
systemctl start ncn-gateway
```
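The backup and restore steps can be wrapped in two small functions so that the previous binary is always saved before it is overwritten. This is a sketch under the assumption that a single `.backup` copy is enough; `install_with_backup` and `rollback` are hypothetical names.

```bash
# install_with_backup NEW TARGET
# Saves the current TARGET as TARGET.backup, then replaces TARGET with NEW.
install_with_backup() {
  new=$1
  target=$2
  if [ -e "$target" ]; then
    cp -p "$target" "$target.backup" || return 1
  fi
  # Copy next to the target and rename: a rename within one directory is atomic.
  cp "$new" "$target.new" && chmod 755 "$target.new" && mv "$target.new" "$target"
}

# rollback TARGET
# Restores TARGET from TARGET.backup, keeping the backup in place.
rollback() {
  target=$1
  [ -e "$target.backup" ] || { echo "no backup for $target" >&2; return 1; }
  cp -p "$target.backup" "$target.new" && mv "$target.new" "$target"
}
```

For example, `install_with_backup /tmp/gateway_node /opt/ncn/bin/gateway_node` during an update, and `rollback /opt/ncn/bin/gateway_node` between the `systemctl stop` and `systemctl start` calls if the new version misbehaves.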

***

## Compliance Considerations

### Data Handling

* Minimize data retention
* Encrypt data at rest
* Log access to sensitive data
* GDPR compliance for EU users

### Audit Logging

```bash
# Enable audit logging
AUDIT_LOG_ENABLED=true
AUDIT_LOG_PATH=/var/log/ncn/audit.log
```

***

## Next Steps

* [Monitoring](https://docs.neurochain.ai/nc/neurochainai-guides/operators/monitoring) - Detailed monitoring setup
* [Troubleshooting](https://docs.neurochain.ai/nc/neurochainai-guides/troubleshooting/troubleshooting) - Common issues
* [Security](https://docs.neurochain.ai/nc/neurochainai-guides/security/security) - Security documentation
