Monitoring

Monitor node health and performance

Monitoring ensures reliable operation and early problem detection. It is critical for validators, which require high uptime.

Stack: Prometheus (metrics collection) + Grafana (visualization) + Alertmanager (notifications).

Monitoring Stack

Component	Purpose	Complexity
Prometheus	Metrics collection/storage	Medium
Grafana	Visualization dashboards	Medium
Alertmanager	Alert routing/notifications	Medium
Node Exporter	System metrics (CPU/RAM/disk)	Low
Selendra	Blockchain-specific metrics	Low

Prometheus Setup

Installation

# Create user and directories
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus

# Download and install (v2.45.0)
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar -xvf prometheus-2.45.0.linux-amd64.tar.gz
cd prometheus-2.45.0.linux-amd64

sudo cp prometheus promtool /usr/local/bin/
sudo cp -r consoles console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /usr/local/bin/prom* /etc/prometheus/
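The download step above can be hardened by checking the tarball's digest before extracting it. A minimal sketch, assuming GNU coreutils `sha256sum` is installed and that the expected digest is copied from the `sha256sums.txt` file published with the release; `verify_sha256` is a hypothetical helper name, not part of the Prometheus tooling.

```shell
# verify_sha256 FILE EXPECTED_HEX: succeed only if FILE's SHA-256 digest matches.
# Hypothetical helper; assumes GNU coreutils sha256sum is on PATH.
verify_sha256() {
  actual=$(sha256sum "$1" | awk '{ print $1 }')
  if [ "$actual" = "$2" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: $1 (got $actual)" >&2
    return 1
  fi
}

# Example (commented out; paste the digest from the release's sha256sums.txt):
# verify_sha256 prometheus-2.45.0.linux-amd64.tar.gz "<digest from sha256sums.txt>"
```

The same helper works for the Node Exporter and Alertmanager tarballs, since each release ships its own digest list.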

Configuration

Create /etc/prometheus/prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "selendra-node"
    static_configs:
      - targets: ["localhost:9615"]
        labels:
          instance: "validator-1"

  - job_name: "node-exporter"
    static_configs:
      - targets: ["localhost:9100"]
        labels:
          instance: "validator-1"

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
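Before starting Prometheus, it can help to confirm that every endpoint named in the config actually answers. A rough sketch, assuming a grep that supports `-o` (GNU and BSD grep both do); `list_config_targets` is a hypothetical helper that only pulls quoted host:port strings off `targets:` lines and is not a real YAML parser.

```shell
# list_config_targets FILE: print every quoted host:port found on a "targets:" line.
# Hypothetical helper; a plain text match, not a YAML parser.
list_config_targets() {
  grep -E '^[[:space:]]*(-[[:space:]]*)?targets:' "$1" \
    | grep -oE '"[^"]+"' \
    | tr -d '"'
}

# Example usage (commented out because it touches the live system):
# promtool check config /etc/prometheus/prometheus.yml
# for t in $(list_config_targets /etc/prometheus/prometheus.yml); do
#   curl -sf -o /dev/null "http://$t/metrics" && echo "up: $t" || echo "DOWN: $t"
# done
```

`promtool check config` validates the syntax; the loop only checks reachability, so the two are complementary.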

Systemd Service

Create /etc/systemd/system/prometheus.service:

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --storage.tsdb.retention.time=30d
Restart=on-failure

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable --now prometheus

Access the Prometheus UI at http://localhost:9090.

Node Exporter

Node Exporter exposes system metrics (CPU, RAM, disk) on port 9100. Install it:

cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.0/node_exporter-1.6.0.linux-amd64.tar.gz
tar -xvf node_exporter-1.6.0.linux-amd64.tar.gz
sudo cp node_exporter-1.6.0.linux-amd64/node_exporter /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/node_exporter

Create /etc/systemd/system/node-exporter.service:

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable --now node-exporter

Grafana Setup

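# Install (apt-key is deprecated; store the signing key in a dedicated keyring)
sudo apt-get install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update && sudo apt-get install -y grafana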

# Start
sudo systemctl enable --now grafana-server

Access Grafana at http://localhost:3000. Default credentials: admin/admin. Grafana prompts for a new password on first login; set one.

Add Prometheus: Configuration → Data Sources → Add Prometheus → URL: http://localhost:9090

Import Dashboard: Create → Import → Paste JSON or ID → Select Prometheus data source.
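As an alternative to adding the data source through the UI, Grafana can load it from a provisioning file at startup. A hedged sketch: `write_prometheus_datasource` is a hypothetical helper, and `/etc/grafana/provisioning/datasources` is the usual location for the Debian package but should be confirmed on your install.

```shell
# write_prometheus_datasource DIR URL: write a Grafana data source provisioning file.
# Hypothetical helper; the file name prometheus.yml is an arbitrary choice.
write_prometheus_datasource() {
  mkdir -p "$1" || return 1
  cat > "$1/prometheus.yml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $2
    isDefault: true
EOF
}

# Demo against a scratch directory so nothing on the system is modified.
demo_dir=$(mktemp -d)
write_prometheus_datasource "$demo_dir" "http://localhost:9090"
cat "$demo_dir/prometheus.yml"
```

For real use, run it as root against the provisioning directory and restart grafana-server so the file is picked up.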

Key Metrics

Blockchain

Metric	Query	Purpose
Block height	`substrate_block_height{status="best"}`	Sync status (match network head)
Finalized height	`substrate_block_height{status="finalized"}`	Lags best by ~2 blocks
Peer count	`substrate_sub_libp2p_peers_count`	Should be 25+ (low indicates issues)
Syncing status	`substrate_sub_libp2p_is_major_syncing`	1=syncing, 0=synced
Block production	`substrate_proposer_block_constructed_count`	Validator block creation counter
Finality	`substrate_finality_alephbft_round`	Finality participation tracking
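The blockchain metrics above can also be read straight from the node's metrics endpoint, without going through Prometheus. A minimal sketch, assuming the endpoint returns the standard Prometheus text format without timestamps; `metric_value` and `finality_lag` are hypothetical helper names, and the `chain` label in the usage comment is only an illustration.

```shell
# metric_value NAME LABEL: read Prometheus text format on stdin and print the value
# of the first sample named NAME whose label set contains LABEL ("" matches any).
# Assumes samples carry no trailing timestamp, so the value is the last field.
metric_value() {
  awk -v name="$1" -v label="$2" '
    /^#/ { next }                       # skip HELP and TYPE comment lines
    {
      n = $0
      sub(/[{ ].*/, "", n)              # metric name ends at "{" or a space
      if (n == name && (label == "" || index($0, label) > 0)) { print $NF; exit }
    }'
}

# finality_lag: read the metrics text on stdin, print best minus finalized height.
finality_lag() (
  metrics=$(cat)
  best=$(printf '%s\n' "$metrics" | metric_value substrate_block_height 'status="best"')
  fin=$(printf '%s\n' "$metrics" | metric_value substrate_block_height 'status="finalized"')
  [ -n "$best" ] && [ -n "$fin" ] || { echo "block height metrics not found" >&2; exit 1; }
  awk -v b="$best" -v f="$fin" 'BEGIN { printf "%d\n", b - f }'
)

# Example usage against a running node (commented out):
# curl -s http://localhost:9615/metrics | finality_lag
# curl -s http://localhost:9615/metrics | metric_value substrate_sub_libp2p_peers_count ''
```

A lag that keeps growing while the best height advances suggests finality has stalled rather than sync.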

System

Metric	Query	Alert Threshold
CPU usage	`100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)`	>80% sustained
Memory available	`(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100`	<20% available
Disk usage	`(1 - (node_filesystem_free_bytes / node_filesystem_size_bytes)) * 100`	>80% used
Disk I/O	`rate(node_disk_io_time_seconds_total[5m])`	Sustained high I/O
Network traffic	`rate(node_network_receive_bytes_total[5m])`	Bandwidth saturation
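To sanity-check the memory query against the host itself, the same ratio can be computed locally. Node Exporter derives `node_memory_MemAvailable_bytes` and `node_memory_MemTotal_bytes` from `/proc/meminfo`, so the sketch below should agree closely with the PromQL result; `mem_available_pct` is a hypothetical helper name.

```shell
# mem_available_pct: read /proc/meminfo-style text on stdin and print
# MemAvailable as a percentage of MemTotal, to one decimal place.
mem_available_pct() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END { if (total > 0) printf "%.1f\n", avail / total * 100 }'
}

# Example usage on a Linux host (commented out):
# mem_available_pct < /proc/meminfo
```

A value below 20 corresponds to the alert threshold listed in the table.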

Alert Configuration

Alertmanager

cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar -xvf alertmanager-0.26.0.linux-amd64.tar.gz

sudo useradd --no-create-home --shell /bin/false alertmanager
sudo mkdir /etc/alertmanager /var/lib/alertmanager
sudo cp alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
sudo chown -R alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager

Create /etc/alertmanager/alertmanager.yml:

global:
  resolve_timeout: 5m

route:
  group_by: ["alertname"]
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: "telegram"

receivers:
  - name: "telegram"
    telegram_configs:
      - bot_token: "YOUR_BOT_TOKEN"
        chat_id: YOUR_CHAT_ID
        message: "{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}"

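Alternatives to Telegram: email, Slack, PagerDuty, or a generic webhook.

To run Alertmanager, create a systemd service modeled on prometheus.service above, with User=alertmanager and ExecStart=/usr/local/bin/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --storage.path=/var/lib/alertmanager/.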

Alert Rules

Create /etc/prometheus/alert_rules.yml:

groups:
  - name: selendra_critical
    interval: 30s
    rules:
      - alert: NodeDown
        expr: up{job="selendra-node"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Selendra node offline"

      - alert: LowPeerCount
        expr: substrate_sub_libp2p_peers_count < 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Only {{ $value }} peers connected"

      - alert: HighCPU
        expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage {{ $value }}%"

      - alert: HighMemory
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage {{ $value }}%"

      - alert: DiskSpaceLow
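        # Matches only if /var/lib/selendra is its own mount point; otherwise use mountpoint="/"
        expr: (node_filesystem_avail_bytes{mountpoint="/var/lib/selendra"} / node_filesystem_size_bytes{mountpoint="/var/lib/selendra"}) * 100 < 20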
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $value }}% disk remaining"

      - alert: ValidatorNotProducing
        expr: rate(substrate_proposer_block_constructed_count[1h]) == 0
        for: 2h
        labels:
          severity: critical
        annotations:
          summary: "No blocks produced in 2 hours"
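When tuning thresholds, a one-line-per-alert overview of the rules file is easier to review than the full YAML. A rough sketch, assuming the layout used above (`- alert:`, then `for:`, then `severity:` for each rule); `list_alerts` is a hypothetical helper and does not validate anything.

```shell
# list_alerts FILE: print "name<TAB>for<TAB>severity" for each alert rule.
# Hypothetical helper; relies on for: and severity: following each "- alert:" line.
list_alerts() {
  awk '
    $1 == "-" && $2 == "alert:" { name = $3; dur = "-" }   # new rule, reset duration
    $1 == "for:"                { dur = $2 }
    $1 == "severity:"           { printf "%s\t%s\t%s\n", name, dur, $2 }' "$1"
}

# Example usage (commented out; promtool does the actual validation):
# promtool check rules /etc/prometheus/alert_rules.yml
# list_alerts /etc/prometheus/alert_rules.yml
```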

Telegram Notifications

Setup

Message @BotFather on Telegram → /newbot → save the token
Message your new bot once to start a conversation (getUpdates returns nothing until you do)
Get the chat ID:

curl "https://api.telegram.org/bot<YOUR_BOT_TOKEN>/getUpdates"

In the JSON response, find "chat":{"id": CHAT_ID}.

Add the token and chat ID to alertmanager.yml (shown above)
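The chat ID lookup above can be scripted instead of read by eye. A minimal sketch, assuming a grep that supports `-o` and the usual getUpdates response shape with a `"chat":{"id":...}` object; `extract_chat_ids` is a hypothetical helper that uses a text match, so a JSON tool such as jq would be more robust where available.

```shell
# extract_chat_ids: read a getUpdates JSON response on stdin and print each
# distinct chat ID (group chat IDs are negative, so a leading "-" is kept).
extract_chat_ids() {
  grep -oE '"chat":[[:space:]]*\{[[:space:]]*"id":[[:space:]]*-?[0-9]+' \
    | grep -oE -e '-?[0-9]+$' \
    | sort -u
}

# Example usage (commented out; keep the bot token out of shell history and logs):
# curl -s "https://api.telegram.org/bot<YOUR_BOT_TOKEN>/getUpdates" | extract_chat_ids
```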

Dashboard Panels

Validator Dashboard

Key panels:

Block height (best vs finalized) - line graph
Peer count - gauge (green: 25+, yellow: 10-24, red: <10)
Block production counter
Finality participation graph
CPU/RAM/disk usage
Network traffic
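The peer count color bands from the gauge panel can be reused in a terminal or cron check. A small sketch using the same thresholds (25+, 10-24, below 10); `peer_status` is a hypothetical helper name.

```shell
# peer_status COUNT: map a peer count to the gauge colors used on the dashboard.
peer_status() {
  if [ "$1" -ge 25 ]; then
    echo green
  elif [ "$1" -ge 10 ]; then
    echo yellow
  else
    echo red
  fi
}

# Example usage against a running node (commented out):
# peers=$(curl -s http://localhost:9615/metrics \
#   | awk '/^substrate_sub_libp2p_peers_count/ { print $NF; exit }')
# peer_status "$peers"
```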

Full Node Dashboard

Key panels:

Sync status indicator
Block height gap (local vs network)
RPC request rate
Database size growth
Query latency

Mobile Monitoring

Grafana Mobile App: iOS/Android app for remote dashboard access. Remote access requires HTTPS and either a public IP or a VPN.

Uptime Services:

UptimeRobot: Free HTTP/ping monitoring, email/SMS alerts
Pingdom: Professional monitoring, multiple locations
Healthchecks.io: Cron job monitoring for metrics push

Security

Restrict access:

# Allow only the monitoring subnet (example: 10.0.0.0/24) to reach the node's metrics endpoint
sudo ufw allow from 10.0.0.0/24 to any port 9615

# Protect Grafana:
# change the default password, enable HTTPS, and restrict user permissions

Authentication: Prometheus does not enable authentication by default. Put it behind a reverse proxy that enforces authentication, or restrict access with strict firewall rules.

Advanced Topics

Multi-Node Monitoring

Add multiple targets to Prometheus config with labels for node roles (validator, sentry, archive):

scrape_configs:
  - job_name: "selendra"
    static_configs:
      - targets: ["validator1:9615"]
        labels:
          role: "validator"
      - targets: ["sentry1:9615"]
        labels:
          role: "sentry"
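With more than a handful of nodes, the `static_configs` stanza can be generated rather than typed. A minimal sketch; `gen_static_configs` is a hypothetical helper, and the host names and roles passed to it are placeholders.

```shell
# gen_static_configs PORT HOST:ROLE...: print a static_configs list with one
# target per node and a role label taken from the text after the colon.
gen_static_configs() {
  port=$1
  shift
  echo "    static_configs:"
  for pair in "$@"; do
    host=${pair%%:*}   # text before the first colon
    role=${pair#*:}    # text after the first colon
    printf '      - targets: ["%s:%s"]\n' "$host" "$port"
    printf '        labels:\n'
    printf '          role: "%s"\n' "$role"
  done
}

# Placeholder hosts; paste the output under a job_name entry in prometheus.yml.
gen_static_configs 9615 validator1:validator sentry1:sentry archive1:archive
```

The role label then lets dashboards and alerts filter by node type, for example `up{role="validator"}`.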

Log Aggregation

Loki + Promtail: Centralize logs from all nodes. Query logs alongside metrics in Grafana for unified observability.

Performance Tuning

Reduce query load:

Limit dashboard time ranges (6h or 24h default)
Use recording rules for expensive queries
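Increase the scrape interval for low-priority metrics (e.g. 60s instead of 15s)
Shorten retention (e.g. 15d) when high-cardinality metrics dominate storage; retention is a server-wide setting (--storage.tsdb.retention.time)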
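The effect of scrape interval and retention on disk usage can be estimated with the rule of thumb from the Prometheus storage documentation: disk space ≈ retention seconds × samples ingested per second × bytes per sample, with roughly 1-2 bytes per sample. A sketch under those assumptions; `tsdb_bytes` is a hypothetical helper, and the 5000 active series figure is made up for illustration.

```shell
# tsdb_bytes SERIES SCRAPE_INTERVAL_S RETENTION_DAYS [BYTES_PER_SAMPLE]
# Estimate TSDB disk usage in bytes; BYTES_PER_SAMPLE defaults to 2 (upper rule of thumb).
tsdb_bytes() {
  awk -v series="$1" -v interval="$2" -v days="$3" -v bps="${4:-2}" 'BEGIN {
    samples_per_sec = series / interval
    printf "%.0f\n", samples_per_sec * days * 86400 * bps
  }'
}

# Hypothetical workload of 5000 active series:
tsdb_bytes 5000 15 30   # 15s scrape interval, 30d retention
tsdb_bytes 5000 60 30   # 60s scrape interval, 30d retention
tsdb_bytes 5000 60 15   # 60s scrape interval, 15d retention
```

For this made-up workload the estimate drops from about 1.7 GB to about 0.4 GB when the interval goes from 15s to 60s, and halves again at 15 days of retention. Check the real series count on your own server before relying on the numbers.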

Recording rule example:

groups:
  - name: recording_rules
    interval: 60s
    rules:
      - record: instance:cpu_usage:rate5m
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Community

Telegram: t.me/selendranetwork | X: @selendranetwork | GitHub: github.com/selendra/selendra

Dashboard templates, alert configurations, and help with monitoring setup are available from the community.