Monitoring

Monitor node health and performance

Monitoring ensures reliable operation and early problem detection. It is critical for validators, which require high uptime.

Stack: Prometheus (metrics collection) + Grafana (visualization) + Alertmanager (notifications).

Monitoring Stack

| Component | Purpose | Complexity |
| --- | --- | --- |
| Prometheus | Metrics collection/storage | Medium |
| Grafana | Visualization dashboards | Medium |
| Alertmanager | Alert routing/notifications | Medium |
| Node Exporter | System metrics (CPU/RAM/disk) | Low |
| Selendra | Blockchain-specific metrics | Low |

Prometheus Setup

Installation

# Create user and directories
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus

# Download and install (v2.45.0)
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar -xvf prometheus-2.45.0.linux-amd64.tar.gz
cd prometheus-2.45.0.linux-amd64

sudo cp prometheus promtool /usr/local/bin/
sudo cp -r consoles console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /usr/local/bin/prom* /etc/prometheus/
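
The tarball downloaded above can be checked against the project's published checksums before the binaries are trusted. A minimal sketch, assuming GNU coreutils and that the release publishes a sha256sums.txt asset alongside the tarball (confirm this on the release page); the helper name verify_release is hypothetical:

```shell
# verify_release DIR: check the tarballs present in DIR against DIR/sha256sums.txt.
# Assumes GNU sha256sum. --ignore-missing skips checksum entries for files
# that were not downloaded; --quiet suppresses the per-file "OK" lines.
verify_release() {
    ( cd "$1" && sha256sum --check --ignore-missing --quiet sha256sums.txt )
}

# Usage (the sha256sums.txt URL is an assumption based on the release layout):
#   cd /tmp
#   wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/sha256sums.txt
#   verify_release /tmp && echo "checksum OK"
```

The function fails if a downloaded file does not match its entry, or if no listed file is present at all.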

Configuration

Create /etc/prometheus/prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "selendra-node"
    static_configs:
      - targets: ["localhost:9615"]
        labels:
          instance: "validator-1"

  - job_name: "node-exporter"
    static_configs:
      - targets: ["localhost:9100"]
        labels:
          instance: "validator-1"

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]

Systemd Service

Create /etc/systemd/system/prometheus.service:

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file /etc/prometheus/prometheus.yml \
  --storage.tsdb.path /var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --storage.tsdb.retention.time=30d
Restart=on-failure

[Install]
WantedBy=multi-user.target

Reload systemd and start Prometheus:

sudo systemctl daemon-reload
sudo systemctl enable --now prometheus

Access UI at http://localhost:9090.


Node Exporter

Provides system metrics:

cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.0/node_exporter-1.6.0.linux-amd64.tar.gz
tar -xvf node_exporter-1.6.0.linux-amd64.tar.gz
sudo cp node_exporter-1.6.0.linux-amd64/node_exporter /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/node_exporter

Create /etc/systemd/system/node-exporter.service:

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

Reload systemd and start Node Exporter:

sudo systemctl daemon-reload
sudo systemctl enable --now node-exporter

Grafana Setup

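# Install (Debian/Ubuntu)
sudo apt-get install -y apt-transport-https software-properties-common wget
# Store Grafana's signing key in a dedicated keyring (apt-key is deprecated)
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update && sudo apt-get install -y grafana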

# Start
sudo systemctl enable --now grafana-server

Access at http://localhost:3000. Default login: admin/admin. Change the password on first login.

Add Prometheus: Configuration → Data Sources → Add Prometheus → URL: http://localhost:9090

Import Dashboard: Create → Import → Paste JSON or ID → Select Prometheus data source.
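
To confirm Grafana is up before adding data sources, its /api/health endpoint can be polled. A small sketch, assuming Grafana listens on the default port 3000; the helper name grafana_db_ok is hypothetical, and plain grep is used so that jq is not required:

```shell
# grafana_db_ok: read the JSON body of GET /api/health on stdin and succeed
# only if it reports "database": "ok".
grafana_db_ok() {
    grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# Usage:
#   curl -fsS http://localhost:3000/api/health | grafana_db_ok && echo "Grafana healthy"
```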


Key Metrics

Blockchain

| Metric | Query | Purpose |
| --- | --- | --- |
| Block height | substrate_block_height{status="best"} | Sync status (should match network head) |
| Finalized height | substrate_block_height{status="finalized"} | Lags best by ~2 blocks |
| Peer count | substrate_sub_libp2p_peers_count | Should be 25+ (low indicates issues) |
| Syncing status | substrate_sub_libp2p_is_major_syncing | 1=syncing, 0=synced |
| Block production | substrate_proposer_block_constructed_count | Validator block creation counter |
| Finality | substrate_finality_alephbft_round | Finality participation tracking |
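
The same numbers can be read straight from the node's metrics endpoint without going through Prometheus, which is handy for a quick check over SSH. A minimal sketch that computes the gap between best and finalized height from the text exposition format; the helper name finality_lag is hypothetical, and any extra labels on the series (such as a chain label) are assumed rather than confirmed:

```shell
# finality_lag: read Prometheus text-format metrics on stdin and print
# best height minus finalized height. Fails if either series is missing.
finality_lag() {
    awk '
        index($0, "substrate_block_height{") == 1 && index($0, "status=\"best\"") {
            best = $NF; have_best = 1
        }
        index($0, "substrate_block_height{") == 1 && index($0, "status=\"finalized\"") {
            fin = $NF; have_fin = 1
        }
        END {
            if (!have_best || !have_fin) exit 1
            printf "%d\n", best - fin
        }
    '
}

# Usage:
#   curl -s http://localhost:9615/metrics | finality_lag
```

A gap that keeps growing suggests finality has stalled even though blocks are still being imported.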

System

| Metric | Query | Alert Threshold |
| --- | --- | --- |
| CPU usage | 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) | >80% sustained |
| Memory available | (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 | <20% available |
| Disk usage | (1 - (node_filesystem_free_bytes / node_filesystem_size_bytes)) * 100 | >80% used |
| Disk I/O | rate(node_disk_io_time_seconds_total[5m]) | Sustained high I/O |
| Network traffic | rate(node_network_receive_bytes_total[5m]) | Bandwidth saturation |
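
As a sanity check on the memory query, the same ratio can be computed locally. On Linux, Node Exporter takes its memory figures from /proc/meminfo, so the result should be close to what the query returns. A small sketch; the helper name mem_available_pct is hypothetical:

```shell
# mem_available_pct: read /proc/meminfo-format text on stdin and print
# MemAvailable as a percentage of MemTotal, to one decimal place.
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total <= 0) exit 1
            printf "%.1f\n", avail / total * 100
        }
    '
}

# Usage on a Linux host:
#   mem_available_pct < /proc/meminfo
```

For example, 4000000 kB available out of 16000000 kB total gives 25.0, which is above the 20% threshold in the table.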

Alert Configuration

Alertmanager

cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar -xvf alertmanager-0.26.0.linux-amd64.tar.gz

sudo useradd --no-create-home --shell /bin/false alertmanager
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
sudo cp alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
sudo chown -R alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager

Create /etc/alertmanager/alertmanager.yml:

global:
  resolve_timeout: 5m

route:
  group_by: ["alertname"]
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: "telegram"

receivers:
  - name: "telegram"
    telegram_configs:
      - bot_token: "YOUR_BOT_TOKEN"
        chat_id: YOUR_CHAT_ID
        message: "{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}"

Alternatives: email, Slack, PagerDuty, webhook.
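
The steps above install Alertmanager and write its config but do not start it. One way to run it is a systemd unit modelled on the Prometheus one. This is a sketch, not an official unit: the paths are taken from the install commands above, --config.file and --storage.path are Alertmanager's standard flags, and the helper name alertmanager_unit is hypothetical.

```shell
# alertmanager_unit: print a systemd unit for Alertmanager to stdout.
# Paths match the binary location and directories created in the install steps.
alertmanager_unit() {
    cat <<'EOF'
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/var/lib/alertmanager/
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Usage:
#   alertmanager_unit | sudo tee /etc/systemd/system/alertmanager.service >/dev/null
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now alertmanager
```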

Alert Rules

Create /etc/prometheus/alert_rules.yml:

groups:
  - name: selendra_critical
    interval: 30s
    rules:
      - alert: NodeDown
        expr: up{job="selendra-node"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Selendra node offline"

      - alert: LowPeerCount
        expr: substrate_sub_libp2p_peers_count < 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Only {{ $value }} peers connected"

      - alert: HighCPU
        expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage {{ $value }}%"

      - alert: HighMemory
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage {{ $value }}%"

      - alert: DiskSpaceLow
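        # The mountpoint label must name an actual mount; use "/" if the data directory lives on the root filesystem
        expr: (node_filesystem_avail_bytes{mountpoint="/var/lib/selendra"} / node_filesystem_size_bytes{mountpoint="/var/lib/selendra"}) * 100 < 20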
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $value }}% disk remaining"

      - alert: ValidatorNotProducing
        expr: rate(substrate_proposer_block_constructed_count[1h]) == 0
        for: 2h
        labels:
          severity: critical
        annotations:
          summary: "No blocks produced in 2 hours"
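
A syntax error in the rules file stops Prometheus from loading it, so it is worth validating before a restart. The promtool binary installed earlier provides check config and check rules for this. A minimal wrapper, assuming the file paths used on this page; the helper name validate_prometheus and the PROMTOOL override variable are hypothetical:

```shell
# validate_prometheus [CONFIG] [RULES]: run promtool over the main config and
# the rules file. Set PROMTOOL to use a binary outside of PATH.
validate_prometheus() {
    tool="${PROMTOOL:-promtool}"
    "$tool" check config "${1:-/etc/prometheus/prometheus.yml}" &&
    "$tool" check rules "${2:-/etc/prometheus/alert_rules.yml}"
}

# Usage:
#   validate_prometheus && sudo systemctl restart prometheus
```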

Telegram Notifications

Setup

  1. Message @BotFather on Telegram → /newbot → Save token
  2. Message your bot to start conversation
  3. Get chat ID:

curl "https://api.telegram.org/bot<YOUR_BOT_TOKEN>/getUpdates"

Find "chat":{"id": CHAT_ID} in the response.

  4. Add the token and chat ID to alertmanager.yml (shown above)
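
Picking the chat ID out of the getUpdates response by eye is error-prone, so it can be extracted with grep. A rough sketch that avoids a jq dependency; the helper name telegram_chat_id and the BOT_TOKEN variable are hypothetical, and the pattern assumes the "chat" object lists "id" first, as in the snippet above:

```shell
# telegram_chat_id: read a getUpdates JSON response on stdin and print the
# first chat ID found. Group chat IDs are negative, so a leading "-" is allowed.
telegram_chat_id() {
    grep -oE '"chat":[[:space:]]*\{[[:space:]]*"id":[[:space:]]*-?[0-9]+' |
        head -n 1 |
        grep -oE -- '-?[0-9]+$'
}

# Usage:
#   curl -s "https://api.telegram.org/bot${BOT_TOKEN}/getUpdates" | telegram_chat_id
```

If nothing is printed, send the bot another message and retry: getUpdates only returns recent updates.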

Dashboard Panels

Validator Dashboard

Key panels:

  • Block height (best vs finalized) - line graph
  • Peer count - gauge (green: 25+, yellow: 10-24, red: <10)
  • Block production counter
  • Finality participation graph
  • CPU/RAM/disk usage
  • Network traffic

Full Node Dashboard

Key panels:

  • Sync status indicator
  • Block height gap (local vs network)
  • RPC request rate
  • Database size growth
  • Query latency

Mobile Monitoring

Grafana Mobile App: iOS/Android app for remote dashboard access. Requires HTTPS and public IP/VPN.

Uptime Services:

  • UptimeRobot: Free HTTP/ping monitoring, email/SMS alerts
  • Pingdom: Professional monitoring, multiple locations
  • Healthchecks.io: Cron job monitoring for metrics push

Security

Restrict access:

# Limit the node's metrics endpoint (port 9615) to the monitoring subnet
sudo ufw allow from 10.0.0.0/24 to any port 9615

# Protect Grafana
# Change default password, enable HTTPS, restrict user permissions

Authentication: Prometheus does not enable authentication by default. Put it behind a reverse proxy that handles authentication, or restrict access with strict firewall rules.
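
Before relying on firewall rules, it can help to see which monitoring ports are actually bound to every interface. A rough sketch, assuming Linux with iproute2's ss and the default ports used on this page (3000, 9090, 9093, 9100, 9615); the helper name wildcard_listeners is hypothetical:

```shell
# wildcard_listeners: read `ss -ltnH` output on stdin and print the monitoring
# ports that listen on all interfaces (0.0.0.0, * or [::]).
wildcard_listeners() {
    awk '
        {
            # Column 4 is "Local Address:Port"; the port is the part after the last colon.
            n = split($4, parts, ":")
            port = parts[n]
            addr = substr($4, 1, length($4) - length(port) - 1)
            if ((addr == "0.0.0.0" || addr == "*" || addr == "[::]") &&
                port ~ /^(3000|9090|9093|9100|9615)$/)
                print port
        }
    ' | sort -n -u
}

# Usage:
#   ss -ltnH | wildcard_listeners
```

Any port printed is reachable from other hosts unless a firewall blocks it, so each one needs either a firewall rule or a bind address limited to localhost.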


Advanced Topics

Multi-Node Monitoring

Add multiple targets to Prometheus config with labels for node roles (validator, sentry, archive):

scrape_configs:
  - job_name: "selendra"
    static_configs:
      - targets: ["validator1:9615"]
        labels:
          role: "validator"
      - targets: ["sentry1:9615"]
        labels:
          role: "sentry"

Log Aggregation

Loki + Promtail: Centralize logs from all nodes. Query logs alongside metrics in Grafana for unified observability.

Performance Tuning

Reduce query load:

  • Limit dashboard time ranges (6h or 24h default)
  • Use recording rules for expensive queries
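  • Increase the scrape interval for low-priority targets (for example, 60s instead of 15s)
  • Use shorter retention (for example, 15d) when high-cardinality metrics dominate storage; Prometheus retention is a global setting (--storage.tsdb.retention.time), not per metric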

Recording rule example:

groups:
  - name: recording_rules
    interval: 60s
    rules:
      - record: instance:cpu_usage:rate5m
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Community

Telegram: t.me/selendranetwork | X: @selendranetwork | GitHub: github.com/selendra/selendra

Dashboard templates and alert configurations available. Community support for monitoring setup.
