status: complete
audience: both
chapter: 02
last_updated: 2026-04
contributors: [alexwill87, claude-cockpit]
lang: en


2.9 -- First health check

Context

Before moving on, verify that the basic infrastructure (Docker, Vault, PostgreSQL, Tailscale) is working. The script below will be reused in later sections, in monitoring cron jobs, and in the deployment script.

The complete script

Create the file:

$ mkdir -p ~/scripts
$ cat > ~/scripts/health-check.sh << 'HEALTHSCRIPT'
#!/bin/bash
# Infrastructure health check (OA)
# Usage: ./health-check.sh
# Exit codes: 0 = all checks passed, 1 = at least one check failed

PASS=0
FAIL=0

check() {
  local name="$1"
  local cmd="$2"

  if eval "$cmd" > /dev/null 2>&1; then
    echo "[OK]    $name"
    ((PASS++))
  else
    echo "[FAIL]  $name"
    ((FAIL++))
  fi
}

echo "=== Health Check Infrastructure ==="
echo "Date : $(date)"
echo "---"

# Critical ports (expected to be in use once the containers are running)
check "Port 8200 (Vault) in use" "ss -ltnp | grep -q :8200"
check "Port 5432 (PostgreSQL) in use" "ss -ltnp | grep -q :5432"

# Docker
check "Docker daemon" "docker info"
check "Docker Compose" "docker compose version"

# Vault
check "Vault container" "docker ps | grep vault | grep -q Up"
check "Vault unsealed" "docker exec vault vault status 2>&1 | grep -q 'Sealed.*false'"
check "Vault secret readable" "docker exec vault vault kv get secret/openrouter"

# PostgreSQL
check "PostgreSQL container" "docker ps | grep postgres | grep -q Up"
check "PostgreSQL connection" "docker exec postgres psql -U oa_admin -d oa_system -c 'SELECT 1;'"

# Tailscale
check "Tailscale active" "tailscale status"
check "Tailscale IP" "tailscale ip -4"

# System
check "Disk space > 10%" "test $(df / --output=pcent | tail -1 | tr -d '% ') -lt 90"
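# Note: no backslash before $7. Inside $(...) the awk program is single-quoted, so \$7 would reach awk literally and cause a syntax error.
check "Available RAM > 1GB" "test $(free -m | awk '/Mem:/ {print $7}') -gt 1024"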

echo "---"
echo "Results : $PASS OK, $FAIL failure(s)"

if [ $FAIL -gt 0 ]; then
  exit 1
else
  echo "Infrastructure operational."
  exit 0
fi
HEALTHSCRIPT
$ chmod +x ~/scripts/health-check.sh
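
The check helper is the core of the script: it runs a command string with eval, discards the output, and counts the result based on the exit status alone. The sketch below isolates that helper so it can be tried without Docker or Vault. The three commands are stand-ins chosen for illustration and are not part of the real checks. It increments the counters with PASS=$((PASS + 1)) rather than ((PASS++)); this is a small variation, chosen because ((PASS++)) returns a non-zero status when PASS is 0, which would matter if set -e were ever added.

```shell
#!/bin/bash
# Stand-alone demo of the check() pattern used in health-check.sh.
PASS=0
FAIL=0

check() {
  local name="$1"
  local cmd="$2"

  # eval runs the command string; only its exit status is used.
  if eval "$cmd" > /dev/null 2>&1; then
    echo "[OK]    $name"
    PASS=$((PASS + 1))
  else
    echo "[FAIL]  $name"
    FAIL=$((FAIL + 1))
  fi
}

# Stand-in commands (illustrative only).
check "Always succeeds" "true"
check "Always fails" "false"
check "Pipeline with grep" "echo 'Sealed false' | grep -q 'Sealed.*false'"

echo "Results : $PASS OK, $FAIL failure(s)"
```

Because the command is passed as a single string, pipelines and redirections work inside a check, as the third call shows. The run ends with 2 passed and 1 failed.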

Verification

Run the script:

$ ~/scripts/health-check.sh

Expected output

=== Health Check Infrastructure ===
Date : Thu 02 Apr 2026 14:30:00 CEST
---
[OK]    Port 8200 (Vault) in use
[OK]    Port 5432 (PostgreSQL) in use
[OK]    Docker daemon
[OK]    Docker Compose
[OK]    Vault container
[OK]    Vault unsealed
[OK]    Vault secret readable
[OK]    PostgreSQL container
[OK]    PostgreSQL connection
[OK]    Tailscale active
[OK]    Tailscale IP
[OK]    Disk space > 10%
[OK]    Available RAM > 1GB
---
Results : 13 OK, 0 failure(s)
Infrastructure operational.

Add to cron (optional)

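To run the check every 15 minutes and log the results, make sure the log directory exists (for example mkdir -p /home/deploy/logs), then edit your crontab: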

$ crontab -e

Add:

*/15 * * * * /home/deploy/scripts/health-check.sh >> /home/deploy/logs/health-check.log 2>&1

Note: The port checks verify that the ports are in use, which is expected once the Vault and PostgreSQL containers are running. A free port while its container should be running indicates a problem. Conversely, before a docker compose up, you can check that a port is NOT already taken by another process: ss -ltnp | grep :PORT should print nothing.
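
One caveat with grep :PORT is that it also matches longer port numbers (grep :5432 matches :54321). The following sketch shows a stricter variant, assuming the standard ss -ltn column layout where the fourth column is the local address and port. The helper name port_listed is hypothetical and not part of the original script.

```shell
#!/bin/bash
# port_listed PORT: reads `ss -ltn` output on stdin and succeeds only if a
# listening socket's local address (4th column) ends in exactly :PORT.
# (port_listed is a hypothetical helper name.)
port_listed() {
  awk -v suffix=":$1" '$4 ~ (suffix "$") { found = 1 } END { exit !found }'
}

# Example: check port 8200 before running `docker compose up`.
# Skipped silently on systems where ss is not installed.
if command -v ss > /dev/null 2>&1; then
  if ss -ltn | port_listed 8200; then
    echo "Port 8200 is already in use"
  else
    echo "Port 8200 is free"
  fi
fi
```

Reading from stdin keeps the helper easy to try against saved ss output, and the same function could replace the grep in the two port checks if exact matching is wanted.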

Diagnostic by symptom

Symptom | Probable cause | Action
Docker daemon FAILURE | Docker not started | sudo systemctl start docker
Vault container FAILURE | Container stopped | cd ~/docker/vault && docker compose up -d
Vault unsealed FAILURE | Vault sealed after restart | Run unseal (3 keys)
Vault secret FAILURE | KV engine not enabled or token expired | vault secrets enable -path=secret kv-v2
PostgreSQL FAILURE | Container stopped or crashed | cd ~/docker/postgres && docker compose up -d, then check logs: docker logs postgres
Tailscale FAILURE | Service stopped | sudo systemctl start tailscaled && sudo tailscale up
Disk space FAILURE | Disk full | docker system prune -a and check logs/backups
RAM FAILURE | Memory saturated | docker stats to identify the heavy container
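
To go from a run with failures to the matching row in the table above, it can help to print only the failing lines. The sketch below filters the script's output, keeping result lines (those starting with a bracket) that are not [OK], so it works whatever label the failure lines carry. The helper name failed_checks and the sample lines are illustrative assumptions.

```shell
#!/bin/bash
# failed_checks: reads health-check output on stdin and prints only the
# result lines that are not [OK]. (Hypothetical helper name.)
# Returns non-zero when there is nothing to print, i.e. no failures.
failed_checks() {
  grep '^\[' | grep -v '^\[OK\]'
}

# Demo on made-up sample output; the real usage would be:
#   ~/scripts/health-check.sh | failed_checks
printf '%s\n' \
  '=== Health Check Infrastructure ===' \
  '[OK]    Docker daemon' \
  '[FAIL]  Vault unsealed' \
  '---' | failed_checks
```

On the sample input this prints only the Vault unsealed line, which points to the "Vault unsealed FAILURE" row.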

Common errors

  • The script fails on Vault secret: Vault may be sealed. This is normal after a restart. Unseal first.
  • "free: command not found": Install procps: sudo apt install -y procps.
  • False positive on disk space: The check fails as soon as the root filesystem is 90% used or more. Adjust the 90 in the disk test if needed.
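
If the disk threshold needs adjusting often, one option is to read it from an environment variable instead of editing the script. This is a minimal sketch under that assumption: DISK_MAX_USED and usage_ok are hypothetical names, and the df pipeline is the same one the script uses (it relies on GNU df's --output option).

```shell
#!/bin/bash
# DISK_MAX_USED is a hypothetical override; 90 matches the script's threshold.
DISK_MAX_USED="${DISK_MAX_USED:-90}"

# usage_ok USED MAX: succeeds when USED (integer percent) is below MAX.
usage_ok() {
  [ "$1" -lt "$2" ]
}

# Same pipeline as in health-check.sh: usage of / as a bare integer.
used=$(df / --output=pcent 2> /dev/null | tail -1 | tr -d '% ')

if [ -n "$used" ] && usage_ok "$used" "$DISK_MAX_USED"; then
  echo "[OK]    Disk usage ${used}% is below ${DISK_MAX_USED}%"
else
  echo "[FAIL]  Disk usage ${used:-unknown}% is not below ${DISK_MAX_USED}%"
fi
```

Running it as DISK_MAX_USED=95 ./disk-check.sh (a hypothetical file name) would then relax the threshold for a single run without touching the script.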

Verification

The script IS the verification. If it exits with code 0, everything is good:

$ ~/scripts/health-check.sh && echo "Ready for next step"
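
The same exit-code convention is what makes the script reusable from a deployment script. As a rough sketch of that idea, the function below runs the health check and refuses to continue on a non-zero exit. HEALTH_CHECK and deploy_if_healthy are hypothetical names introduced here, and the actual deployment script may be structured differently.

```shell
#!/bin/bash
# HEALTH_CHECK and deploy_if_healthy are hypothetical names for this sketch.
# The default path matches where the script was created in this section.
HEALTH_CHECK="${HEALTH_CHECK:-$HOME/scripts/health-check.sh}"

deploy_if_healthy() {
  # Only the exit status matters: 0 = all checks passed.
  if "$HEALTH_CHECK" > /dev/null 2>&1; then
    echo "Health check passed, continuing."
    return 0
  fi
  echo "Health check failed, aborting." >&2
  return 1
}

# In a deployment script, this would typically be used as:
#   deploy_if_healthy || exit 1
```

Output is discarded here to keep the gate quiet; dropping the redirection would show the full report in the deployment log instead.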

Estimated time

10 minutes.
