Monitoring
Monitor node health and performance
Monitoring keeps your node running reliably and surfaces problems early. It is critical for validators, which must maintain high uptime.
Stack: Prometheus (metrics collection) + Grafana (visualization) + Alertmanager (notifications).
Monitoring Stack
| Component | Purpose | Setup complexity |
|---|---|---|
| Prometheus | Metrics collection/storage | Medium |
| Grafana | Visualization dashboards | Medium |
| Alertmanager | Alert routing/notifications | Medium |
| Node Exporter | System metrics (CPU/RAM/disk) | Low |
| Selendra node | Built-in blockchain metrics (port 9615) | Low |
Prometheus Setup
Installation
```bash
# Create user and directories
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus

# Download and install (v2.45.0)
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar -xvf prometheus-2.45.0.linux-amd64.tar.gz
cd prometheus-2.45.0.linux-amd64
sudo cp prometheus promtool /usr/local/bin/
sudo cp -r consoles console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /usr/local/bin/prom* /etc/prometheus/
```
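Prometheus release pages on GitHub also publish a sha256sums.txt file next to the archives. The minimal Python sketch below checks a downloaded archive against it; it assumes you fetched sha256sums.txt from the same release and that each line follows the usual "<hash>  <filename>" layout.

```python
import hashlib


def expected_sha256(sums_text, filename):
    """Look up a file's hash in a sha256sums.txt listing ("<hex>  <filename>" per line)."""
    for line in sums_text.splitlines():
        parts = line.split()
        # A leading "*" marks binary mode in sha256sum output; ignore it.
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0].lower()
    return None


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, sums_text, filename):
    """True only if the listing contains filename and the local file matches its hash."""
    expected = expected_sha256(sums_text, filename)
    return expected is not None and file_sha256(path) == expected
```

For example, verify_download("/tmp/prometheus-2.45.0.linux-amd64.tar.gz", sums_text, "prometheus-2.45.0.linux-amd64.tar.gz") should return True before you copy the binaries. From the shell, GNU sha256sum --check --ignore-missing sha256sums.txt performs the same check.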
Configuration
Create /etc/prometheus/prometheus.yml:
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "selendra-node"
    static_configs:
      - targets: ["localhost:9615"]
        labels:
          instance: "validator-1"

  - job_name: "node-exporter"
    static_configs:
      - targets: ["localhost:9100"]
        labels:
          instance: "validator-1"

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
```
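Before relying on the scrape, you can confirm that the node actually serves metrics on port 9615. The Python sketch below fetches the endpoint and extracts samples from the Prometheus text exposition format. It assumes the default endpoint http://localhost:9615/metrics from the config above and handles only simple lines (no escaped quotes or commas inside label values).

```python
import re
import urllib.request

# Matches 'metric_name{label="v",...} value' or 'metric_name value' (timestamp ignored).
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')


def parse_metrics(text):
    """Return a list of (name, labels_dict, float_value) from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        m = _SAMPLE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = {}
        if raw_labels:
            # Simplification: assumes label values contain no commas.
            for pair in raw_labels.split(","):
                if "=" in pair:
                    key, val = pair.split("=", 1)
                    labels[key.strip()] = val.strip().strip('"')
        samples.append((name, labels, float(value)))
    return samples


def get_value(samples, name, **labels):
    """Return the first sample value matching name and all given labels, else None."""
    for n, lbls, v in samples:
        if n == name and all(lbls.get(k) == want for k, want in labels.items()):
            return v
    return None


def fetch_metrics(url="http://localhost:9615/metrics", timeout=5):
    """Download and parse the node's metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

On the node host, samples = fetch_metrics() followed by get_value(samples, "substrate_block_height", status="best") returns the current best block; a connection error means Prometheus will not be able to scrape the node either.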
Systemd Service
Create /etc/systemd/system/prometheus.service:
```ini
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --storage.tsdb.retention.time=30d
Restart=on-failure

[Install]
WantedBy=multi-user.target
```
```bash
sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
```
Access the Prometheus UI at http://localhost:9090.
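Prometheus also exposes an HTTP API on the same port. The instant-query endpoint /api/v1/query evaluates a PromQL expression and returns JSON. The minimal Python sketch below assumes Prometheus at http://localhost:9090 and the job and instance labels from the config above, and lists scrape targets whose up value is 0.

```python
import json
import urllib.parse
import urllib.request


def instant_query(expr, base="http://localhost:9090", timeout=5):
    """Run an instant PromQL query against Prometheus' /api/v1/query endpoint."""
    url = base + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def vector_values(response):
    """Map (job, instance) to the float value of each instant-vector result.

    Results look like: {"status": "success", "data": {"resultType": "vector",
    "result": [{"metric": {...}, "value": [<unix_ts>, "<value as string>"]}]}}
    """
    if response.get("status") != "success":
        raise RuntimeError(response.get("error", "query failed"))
    out = {}
    for item in response["data"]["result"]:
        metric = item["metric"]
        key = (metric.get("job", ""), metric.get("instance", ""))
        out[key] = float(item["value"][1])
    return out


def down_targets(response):
    """Return the (job, instance) pairs whose `up` value is 0."""
    return sorted(k for k, v in vector_values(response).items() if v == 0)
```

down_targets(instant_query("up")) returns an empty list when every target is being scraped; a pair such as ("node-exporter", "validator-1") means that exporter is unreachable.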
Node Exporter
Node Exporter exposes host-level system metrics (CPU, memory, disk, network) on port 9100:
```bash
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.0/node_exporter-1.6.0.linux-amd64.tar.gz
tar -xvf node_exporter-1.6.0.linux-amd64.tar.gz
sudo cp node_exporter-1.6.0.linux-amd64/node_exporter /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/node_exporter
```
Create /etc/systemd/system/node-exporter.service:
```ini
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
```
```bash
sudo systemctl daemon-reload
sudo systemctl enable --now node-exporter
```
Grafana Setup
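```bash
# Install from Grafana's APT repository (apt-key is deprecated; the key goes in /etc/apt/keyrings)
sudo apt-get install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update && sudo apt-get install -y grafana

# Start
sudo systemctl enable --now grafana-server
```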
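Access Grafana at http://localhost:3000. The default login is admin/admin; Grafana asks you to change the password on first login.
Add Prometheus as a data source: Configuration → Data Sources → Add data source → Prometheus, with URL http://localhost:9090 (newer Grafana versions place this under Connections → Data sources).
Import a dashboard: Create → Import (Dashboards → New → Import in newer versions), paste the dashboard JSON or its grafana.com ID, then select the Prometheus data source.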
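The same data source can also be created through Grafana's HTTP API (POST /api/datasources), which helps when provisioning several servers. The minimal Python sketch below only builds the request; it assumes Grafana on localhost:3000, basic auth as the admin user, and a data source named "Prometheus", all of which you can change.

```python
import base64
import json
import urllib.request


def datasource_request(password, user="admin",
                       grafana_url="http://localhost:3000",
                       prometheus_url="http://localhost:9090"):
    """Build (but do not send) a POST /api/datasources request for Prometheus."""
    body = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",   # the Grafana server, not the browser, queries Prometheus
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        grafana_url + "/api/datasources",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": "Basic " + token},
        method="POST",
    )
```

Send it with urllib.request.urlopen(datasource_request("your-admin-password")). Proxy access means only the Grafana server needs to reach Prometheus, so Prometheus can stay bound to localhost.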
Key Metrics
Blockchain
| Metric | Query | Purpose |
|---|---|---|
| Block height | substrate_block_height{status="best"} | Sync status (match network head) |
| Finalized height | substrate_block_height{status="finalized"} | Lags best by ~2 blocks |
| Peer count | substrate_sub_libp2p_peers_count | Should be 25+ (low indicates issues) |
| Syncing status | substrate_sub_libp2p_is_major_syncing | 1=syncing, 0=synced |
| Block production | substrate_proposer_block_constructed_count | Validator block creation counter |
| Finality | substrate_finality_alephbft_round | Finality participation tracking |
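These blockchain metrics can be combined into a quick health verdict. The Python sketch below assumes you already have the four values (for example, scraped from the node's /metrics endpoint). The 25-peer floor comes from the table above; the 10-block finality-lag limit is an illustrative assumption, not an official threshold.

```python
from dataclasses import dataclass


@dataclass
class ChainMetrics:
    best: int           # substrate_block_height{status="best"}
    finalized: int      # substrate_block_height{status="finalized"}
    peers: int          # substrate_sub_libp2p_peers_count
    major_syncing: int  # substrate_sub_libp2p_is_major_syncing (1 = syncing)


def chain_issues(m, min_peers=25, max_finality_lag=10):
    """Return human-readable problems; an empty list means healthy.

    max_finality_lag is a hypothetical threshold chosen for illustration.
    """
    issues = []
    if m.major_syncing == 1:
        issues.append("node is still major-syncing")
    lag = m.best - m.finalized
    if lag > max_finality_lag:
        issues.append(f"finality lagging by {lag} blocks")
    if m.peers < min_peers:
        issues.append(f"only {m.peers} peers (expected {min_peers}+)")
    return issues
```

Comparing best with the network head (from a block explorer or another node) is still needed to confirm the node is fully synced; this check only looks at the local node.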
System
| Metric | Query | Alert Threshold |
|---|---|---|
| CPU usage | 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) | >80% sustained |
| Memory available | (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 | <20% available |
| Disk usage | (1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes)) * 100 | >80% used |
| Disk I/O | rate(node_disk_io_time_seconds_total[5m]) | Sustained values near 1 (device busy almost all the time) |
| Network traffic | rate(node_network_receive_bytes_total[5m]) | Approaching link bandwidth (bytes/s) |
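The CPU query is built on node_cpu_seconds_total, a per-CPU counter of seconds spent in each mode. Dividing the increase in idle seconds by the elapsed time gives each CPU's idle fraction; averaging over CPUs and subtracting from 100 gives the busy percentage. The Python sketch below reproduces that arithmetic from two samples, which is closest to irate (it also uses the last two samples); unlike Prometheus, it does not handle counter resets.

```python
def cpu_busy_percent(idle_before, idle_after, seconds):
    """Mirror 100 - avg(irate(node_cpu_seconds_total{mode="idle"}[...])) * 100.

    idle_before / idle_after: per-CPU idle counters (in seconds) sampled
    `seconds` apart, listed in the same CPU order.
    """
    if len(idle_before) != len(idle_after) or not idle_before:
        raise ValueError("need matching, non-empty per-CPU samples")
    # Per-CPU idle rate: idle seconds accumulated per wall-clock second (0..1).
    rates = [(after - before) / seconds
             for before, after in zip(idle_before, idle_after)]
    return 100 - (sum(rates) / len(rates)) * 100
```

With two CPUs sampled 10 seconds apart, one idle for 8 s and the other for 2 s, cpu_busy_percent([100.0, 200.0], [108.0, 202.0], 10) is 50.0: the average idle fraction is 0.5.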
Alert Configuration
Alertmanager
```bash
cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar -xvf alertmanager-0.26.0.linux-amd64.tar.gz
sudo useradd --no-create-home --shell /bin/false alertmanager
sudo mkdir /etc/alertmanager /var/lib/alertmanager
sudo cp alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
sudo chown -R alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager
```
Create /etc/alertmanager/alertmanager.yml:
```yaml
global:
  resolve_timeout: 5m

route:
  group_by: ["alertname"]
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: "telegram"

receivers:
  - name: "telegram"
    telegram_configs:
      - bot_token: "YOUR_BOT_TOKEN"
        chat_id: YOUR_CHAT_ID  # integer, without quotes
        message: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
```
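Other supported receivers include email, Slack, PagerDuty, and generic webhooks.
Run Alertmanager under systemd the same way as Prometheus, using User=alertmanager and ExecStart=/usr/local/bin/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --storage.path=/var/lib/alertmanager.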
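The Telegram message template loops over the alerts in a notification and prints each summary annotation. With a webhook receiver instead, Alertmanager POSTs the same alerts as JSON: an "alerts" list whose items carry "status", "labels" and "annotations". The minimal Python receiver sketch below renders them one line per alert; the port 5001 and the line format are arbitrary choices.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_summaries(payload):
    """One line per alert: status, severity label, then the summary annotation."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        status = alert.get("status", "unknown").upper()
        severity = labels.get("severity", "none")
        # Fall back to the alert name when no summary annotation is set.
        summary = annotations.get("summary", labels.get("alertname", "unnamed alert"))
        lines.append(f"[{status}] ({severity}) {summary}")
    return "\n".join(lines)


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        print(render_summaries(payload))  # replace with your own delivery
        self.send_response(200)
        self.end_headers()


def serve(port=5001):
    """Listen on localhost only; blocks forever."""
    HTTPServer(("127.0.0.1", port), AlertHandler).serve_forever()
```

Point Alertmanager at it with a webhook_configs entry whose url is http://127.0.0.1:5001/, then start it with serve().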
Alert Rules
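Create /etc/prometheus/alert_rules.yml (Prometheus loads it through rule_files in prometheus.yml). Validate it with promtool check rules /etc/prometheus/alert_rules.yml and restart Prometheus after every change: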
```yaml
groups:
  - name: selendra_critical
    interval: 30s
    rules:
      - alert: NodeDown
        expr: up{job="selendra-node"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Selendra node offline"

      - alert: LowPeerCount
        expr: substrate_sub_libp2p_peers_count < 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Only {{ $value }} peers connected"

      - alert: HighCPU
        expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage {{ $value | humanize }}%"

      - alert: HighMemory
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage {{ $value | humanize }}%"

      # mountpoint must match the filesystem that holds the chain database
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/var/lib/selendra"} / node_filesystem_size_bytes{mountpoint="/var/lib/selendra"}) * 100 < 20
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $value | humanize }}% disk remaining"

      # rate over 1h == 0, held for 2h: no blocks for at least 3 hours
      - alert: ValidatorNotProducing
        expr: rate(substrate_proposer_block_constructed_count[1h]) == 0
        for: 2h
        labels:
          severity: critical
        annotations:
          summary: "No blocks produced in the last 3 hours"
```
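Prometheus evaluates each rule group every interval (30s here). When a rule's expression returns a result, the alert becomes pending; it fires only after the condition has held at every evaluation for the for duration, and a single evaluation without a result resets it. The Python sketch below is a simplified, single-series model of that behaviour, not Prometheus' actual implementation.

```python
def alert_states(results, interval_s, for_s):
    """Simplified model of a Prometheus alerting rule's `for:` clause.

    results: expression outcome at each evaluation (True = condition met),
             evaluations spaced interval_s seconds apart.
    Returns the state after each evaluation: "inactive", "pending" or "firing".
    """
    states = []
    active_since = None  # time the condition first became continuously true
    for i, condition in enumerate(results):
        now = i * interval_s
        if not condition:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states
```

For NodeDown (interval 30s, for 2m), alert_states([False, True, True, True, True, True], 30, 120) only reaches "firing" at the fifth consecutive true evaluation, two minutes after the node first went down; a brief blip that recovers before then never notifies anyone.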
Telegram Notifications
Setup
- Message @BotFather on Telegram, send /newbot, and save the bot token it returns.
- Send your new bot a message to start a conversation.
- Get your chat ID:
```bash
curl https://api.telegram.org/bot<YOUR_BOT_TOKEN>/getUpdates
```
Find "chat":{"id": CHAT_ID} in the JSON response.
- Add the bot token and chat ID to alertmanager.yml (shown above).
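The getUpdates response is JSON with a "result" list of updates; each message carries a "chat" object whose "id" is the value Alertmanager needs. The Python sketch below extracts the IDs from the curl output, assuming you have already messaged the bot so that at least one update exists.

```python
import json


def chat_ids(get_updates_json):
    """Extract distinct chat IDs from a Telegram Bot API getUpdates response."""
    data = json.loads(get_updates_json)
    if not data.get("ok"):
        raise ValueError(data.get("description", "getUpdates failed"))
    ids = []
    for update in data.get("result", []):
        # The chat object can appear under different update types.
        for key in ("message", "edited_message", "channel_post"):
            chat = update.get(key, {}).get("chat")
            if chat and chat["id"] not in ids:
                ids.append(chat["id"])
    return ids
```

Private chats have positive IDs and group chats negative ones; either way, put the number in chat_id without quotes.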
Dashboard Panels
Validator Dashboard
Key panels:
- Block height (best vs finalized) - line graph
- Peer count - gauge (green: 25+, yellow: 10-24, red: <10)
- Block production counter
- Finality participation graph
- CPU/RAM/disk usage
- Network traffic
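The peer-count gauge colors correspond to Grafana threshold steps: a base color, then steps that apply from their value upward, where the highest step not above the current value wins. The Python sketch below models that lookup with the thresholds listed above; it is an illustration of the rule, not Grafana code.

```python
# Threshold steps in ascending order; the first entry (None) is the base color
# used below every other threshold.
PEER_THRESHOLDS = [(None, "red"), (10, "yellow"), (25, "green")]


def threshold_color(value, steps=PEER_THRESHOLDS):
    """Return the color of the highest step whose threshold is <= value."""
    color = steps[0][1]
    for limit, step_color in steps[1:]:
        if value >= limit:
            color = step_color
    return color
```

Boundary values land in the upper band: 25 peers is green and 10 is yellow, matching "green: 25+, yellow: 10-24".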
Full Node Dashboard
Key panels:
- Sync status indicator
- Block height gap (local vs network)
- RPC request rate
- Database size growth
- Query latency
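For the database size growth panel, a projection of when the disk fills up is often more actionable than the raw size. The Python sketch below uses a two-point linear projection; the sizes in the example are hypothetical. Inside Prometheus, predict_linear (for example predict_linear(node_filesystem_avail_bytes[6h], 86400) < 0) applies a least-squares version of the same idea.

```python
def days_until_full(used_then_gb, used_now_gb, days_between, capacity_gb):
    """Two-point linear projection of database growth.

    Returns the estimated days until capacity_gb is reached, or None when
    usage is flat or shrinking.
    """
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    growth_per_day = (used_now_gb - used_then_gb) / days_between
    if growth_per_day <= 0:
        return None
    return (capacity_gb - used_now_gb) / growth_per_day
```

A database that grew from 400 GB to 430 GB over 30 days on a 1000 GB disk grows 1 GB per day, so days_until_full(400, 430, 30, 1000) gives 570 days of headroom.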
Mobile Monitoring
Grafana Mobile App: iOS/Android app for remote dashboard access. Requires HTTPS and public IP/VPN.
Uptime Services:
- UptimeRobot: Free HTTP/ping monitoring, email/SMS alerts
- Pingdom: Professional monitoring, multiple locations
- Healthchecks.io: Cron job monitoring; alerts you when an expected periodic ping from your host stops arriving
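Healthchecks.io works in the opposite direction from Prometheus: your host pings a unique URL on a schedule, and you are alerted when the pings stop, which also catches failures of the monitoring host itself. The Python sketch below is meant to run from cron; the check UUID is a placeholder from your Healthchecks.io project, and how you decide that the node is healthy (for example from its metrics endpoint) is up to you.

```python
import urllib.request

HC_BASE = "https://hc-ping.com"


def heartbeat_url(check_uuid, healthy):
    """Plain ping signals success; the /fail suffix signals an explicit failure."""
    url = f"{HC_BASE}/{check_uuid}"
    return url if healthy else url + "/fail"


def send_heartbeat(check_uuid, healthy, timeout=10):
    """Send the ping and return the HTTP status code."""
    with urllib.request.urlopen(heartbeat_url(check_uuid, healthy), timeout=timeout) as resp:
        return resp.status
```

Scheduling send_heartbeat every few minutes, with the check's period set slightly longer, gives you an alert both when the node reports unhealthy and when the host stops pinging entirely.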
Security
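Restrict access to the monitoring ports:
```bash
# Allow the node metrics endpoint (9615) only from the monitoring subnet;
# this only restricts access while ufw's default incoming policy is deny
sudo ufw allow from 10.0.0.0/24 to any port 9615 proto tcp

# Protect Grafana: change the default password, enable HTTPS, restrict user permissions
```
Authentication: Prometheus, Alertmanager, and Node Exporter serve their HTTP endpoints without authentication by default. Put them behind a reverse proxy with authentication, or restrict them with strict firewall rules.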
Advanced Topics
Multi-Node Monitoring
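Add multiple targets to the Prometheus config, with labels for node roles (validator, sentry, archive). Remote nodes must serve metrics on a reachable interface: Substrate-based nodes such as Selendra bind the metrics endpoint to localhost unless started with --prometheus-external, so pair that flag with firewall rules.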
```yaml
scrape_configs:
  - job_name: "selendra"
    static_configs:
      - targets: ["validator1:9615"]
        labels:
          role: "validator"
      - targets: ["sentry1:9615"]
        labels:
          role: "sentry"
```
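For more than a handful of nodes, Prometheus can read targets from JSON files through file_sd_configs instead of inlining them in static_configs, and it picks up changes to those files without a restart. The Python sketch below generates such a file from an inventory; the hostnames, the output path, and port 9615 are assumptions.

```python
import json
from collections import defaultdict


def file_sd_groups(nodes, port=9615):
    """Group (host, role) pairs into Prometheus file_sd target groups, one per role."""
    by_role = defaultdict(list)
    for host, role in nodes:
        by_role[role].append(f"{host}:{port}")
    return [{"targets": sorted(targets), "labels": {"role": role}}
            for role, targets in sorted(by_role.items())]


def write_file_sd(nodes, path, port=9615):
    """Write the target groups as JSON, the format file_sd_configs expects."""
    with open(path, "w") as f:
        json.dump(file_sd_groups(nodes, port), f, indent=2)
```

For example, write_file_sd([("validator1", "validator"), ("sentry1", "sentry")], "/etc/prometheus/targets/selendra.json"), then list that path under files in a file_sd_configs block of the selendra job.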
Log Aggregation
Loki + Promtail: Centralize logs from all nodes. Query logs alongside metrics in Grafana for unified observability.
Performance Tuning
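Reduce query and storage load:
- Limit default dashboard time ranges (6h or 24h)
- Use recording rules to precompute expensive queries
- Scrape low-priority targets less often (for example, scrape_interval: 60s in that job)
- Shorten retention (for example, --storage.tsdb.retention.time=15d); retention applies to the whole Prometheus server, not to individual metrics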
Recording rule example (add its file to rule_files in prometheus.yml):
```yaml
groups:
  - name: recording_rules
    interval: 60s
    rules:
      - record: instance:cpu_usage:rate5m
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```
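Retention and scrape interval translate directly into disk usage. The Prometheus storage documentation sizes disk as retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample; ingested samples per second is about the number of active series divided by the scrape interval. The Python sketch below applies that rule; the 10,000-series figure in the example is hypothetical (check prometheus_tsdb_head_series on your server for the real count).

```python
SECONDS_PER_DAY = 86_400


def prometheus_disk_bytes(active_series, scrape_interval_s, retention_days,
                          bytes_per_sample=2.0):
    """Rough disk estimate: retention_seconds * samples_per_second * bytes_per_sample.

    bytes_per_sample defaults to the upper end of the documented 1-2 byte range.
    """
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * SECONDS_PER_DAY
    return retention_seconds * samples_per_second * bytes_per_sample
```

With 10,000 series scraped every 15s and 30d retention, the estimate is about 3.46 GB; moving those targets to a 60s interval cuts it to about 0.86 GB, which is why scraping low-priority targets less often pays off.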
Related Documentation
Run Full Node | Run Validator | Maintenance | Troubleshooting
Community
Telegram: t.me/selendranetwork | X: @selendranetwork | GitHub: github.com/selendra/selendra
Ask in the community channels above for dashboard templates, alert configurations, and help with your monitoring setup.
