Home Lab Monitoring with Grafana and Prometheus
You can't fix what you can't see. Running a homelab without monitoring is like driving without a dashboard — everything seems fine until something breaks, and by then you've been running on fumes for a week.
Prometheus and Grafana are the standard monitoring stack for good reason. Prometheus scrapes metrics from your machines and services. Grafana turns those metrics into dashboards and alerts. Together, they give you visibility into your entire lab: CPU, RAM, disk, network, temperatures, container health, and anything else you care to track.
This guide walks you through setting up the full stack, getting metrics flowing from your machines, building useful dashboards, and configuring alerts so you know when something needs attention.
Architecture Overview
The monitoring stack has three components:
- Prometheus — The time-series database. It scrapes metrics from exporters at regular intervals and stores them.
- Node Exporter — Runs on every machine you want to monitor. Exposes hardware and OS metrics as an HTTP endpoint.
- Grafana — The visualization layer. Connects to Prometheus and lets you build dashboards, run queries, and set up alerts.
You can run all three on the same machine, or split them up. For a homelab, running Prometheus and Grafana on a single Docker host is fine. Node Exporter runs on every target machine.
Installing the Stack with Docker Compose
Create a directory for your monitoring stack:
mkdir -p /opt/monitoring
cd /opt/monitoring
Create docker-compose.yml:
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=90d'
      - '--web.enable-lifecycle'

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=changeme
      - GF_USERS_ALLOW_SIGN_UP=false

volumes:
  prometheus-data:
  grafana-data:
Create prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets:
          - '192.168.1.10:9100'   # proxmox
          - '192.168.1.50:9100'   # nas
          - '192.168.1.53:9100'   # pihole
          - '192.168.1.60:9100'   # monitoring-pi
          - '192.168.1.61:9100'   # vpn-pi
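Before starting anything, you can sanity-check the config with promtool, which ships inside the Prometheus image. A minimal check, assuming you're running it from /opt/monitoring:
docker run --rm \
  -v "$(pwd)/prometheus.yml:/prometheus.yml:ro" \
  --entrypoint /bin/promtool \
  prom/prometheus:latest check config /prometheus.yml
If the file has an indentation or syntax problem, promtool points at the offending line instead of Prometheus silently failing to start.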
Start the stack:
docker compose up -d
Grafana is now at http://your-server:3000 (login: admin / changeme). Prometheus is at http://your-server:9090.
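A quick way to confirm both services came up, and to reload Prometheus after future config edits (the /-/reload endpoint works because the compose file passes --web.enable-lifecycle):
# Prometheus health check
curl http://your-server:9090/-/healthy

# Grafana health check
curl http://your-server:3000/api/health

# Reload Prometheus after editing prometheus.yml (no restart needed)
curl -X POST http://your-server:9090/-/reload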
Installing Node Exporter on Target Machines
Node Exporter needs to run on every machine you want to monitor. There are several ways to install it.
Direct Install (Recommended for Bare Metal and VMs)
# Download the latest release
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
tar xzf node_exporter-1.8.2.linux-amd64.tar.gz
# Install the binary
sudo cp node_exporter-1.8.2.linux-amd64/node_exporter /usr/local/bin/
sudo chmod +x /usr/local/bin/node_exporter
# Create a system user
sudo useradd --no-create-home --shell /bin/false node_exporter
Create a systemd service at /etc/systemd/system/node_exporter.service:
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
User=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
Start it:
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
Verify it's working:
curl http://localhost:9100/metrics | head -20
Docker Install (For Docker Hosts)
If the target machine runs Docker:
docker run -d --restart=unless-stopped \
--name node-exporter \
--net=host \
--pid=host \
-v /:/host:ro,rslave \
prom/node-exporter:latest \
--path.rootfs=/host
The --net=host and --pid=host flags let the exporter report the host's network interfaces and processes rather than the container's. The read-only root filesystem mount lets it see the host's disk usage.
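Whichever install method you use, confirm Prometheus is actually scraping the new exporter. The Status > Targets page in the Prometheus UI shows every endpoint and its last scrape result, or you can query the up metric from the command line (adjust the hostname to your monitoring server):
# 1 = target is up, 0 = scrape is failing
curl -s 'http://your-server:9090/api/v1/query?query=up' | python3 -m json.tool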
Connecting Grafana to Prometheus
- Open Grafana (http://your-server:3000)
- Go to Connections > Data Sources > Add data source
- Select Prometheus
- Set the URL to http://prometheus:9090 (if on the same Docker network) or http://your-server-ip:9090
- Click Save & Test
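If you'd rather not click through the UI, or you want the data source to survive a container rebuild, Grafana can provision it from a file instead. A sketch, assuming you create a ./grafana-provisioning directory next to the compose file and mount it into the container at /etc/grafana/provisioning/datasources:
# ./grafana-provisioning/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
Add the matching volume line to the grafana service in docker-compose.yml (- ./grafana-provisioning:/etc/grafana/provisioning/datasources) and the data source appears on the next container start.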
Building Dashboards
Import a Pre-Built Dashboard
Don't start from scratch. The community has excellent pre-built dashboards. Go to Dashboards > Import and enter a dashboard ID from grafana.com/grafana/dashboards:
- Node Exporter Full (ID: 1860) — The most comprehensive node exporter dashboard. CPU, RAM, disk, network, filesystem, and more. This is the one most people use.
- Node Exporter for Prometheus (ID: 11074) — A cleaner, more focused alternative.
Enter the ID, select your Prometheus data source, and click Import. You'll instantly have a full dashboard for all your monitored hosts.
Build a Custom Overview Dashboard
The imported dashboards show detailed per-host metrics. For a homelab overview, build a custom dashboard with panels showing:
CPU Usage Across All Hosts:
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Memory Usage Per Host:
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
Disk Usage Per Host:
(1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) * 100
Network Traffic:
rate(node_network_receive_bytes_total{device!="lo"}[5m]) * 8
System Uptime:
node_time_seconds - node_boot_time_seconds
For each panel, set meaningful thresholds — green below 70%, yellow at 70-85%, red above 85%. Use the Stat panel type for single values and Time series for historical data.
What to Monitor
Not all metrics are equally useful. Here's what actually matters in a homelab:
Critical (Set Alerts for These)
- Disk usage — The number one homelab killer. When a root partition fills up, things break in ugly ways. Alert at 85%.
- RAM usage — Especially on ZFS hosts where ARC cache makes free memory reporting confusing. Alert at 90%.
- Drive health — If you export SMART data (see below), alert on reallocated sectors or pending sectors.
- Service availability — Is Pi-hole responding? Is your NAS reachable?
Important (Check Weekly)
- CPU usage patterns — Sustained high CPU usually means a runaway process or misconfigured service.
- Network throughput — Unusual spikes can indicate a misconfigured backup, a download gone wrong, or worse.
- Temperatures — CPU and disk temperatures. Especially important in enclosed spaces or during summer.
- Swap usage — Sustained swap activity usually means you're running out of RAM (see the example queries at the end of this section).
Nice to Have
- System load averages — The 1/5/15 minute load gives you a feel for overall system health.
- Disk I/O — Useful for identifying bottlenecks, especially on shared NAS storage.
- Network errors — Non-zero error counts indicate bad cables or network interface issues.
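Two example queries for the swap and temperature items above. These are sketches: the metric names assume node_exporter's default meminfo and hwmon collectors, and temperature data only appears if your hardware exposes sensors to the kernel.
# Swap currently in use, in bytes
node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes

# CPU / motherboard sensor temperatures, in Celsius
node_hwmon_temp_celsius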
Setting Up Alerts
Grafana can send alerts through email, Discord, Slack, Telegram, and many other channels. Go to Alerting > Contact Points to configure where alerts go.
Example: Disk Space Alert
- Go to Alerting > Alert Rules > New Alert Rule
- Query:
(1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) * 100
- Set condition: Is above 85
- Evaluation interval: every 5 minutes
- Choose your contact point
Example: Host Down Alert
up{job="node"} == 0
This fires when Prometheus can't reach a node exporter. Set the pending period to 2-3 minutes to avoid false alarms from brief network blips.
Discord Webhook (Popular for Homelabs)
- In Discord, go to your server's channel settings > Integrations > Webhooks
- Create a webhook and copy the URL
- In Grafana, create a contact point with type Discord, paste the webhook URL
- Test it
Now you'll get Discord notifications when your lab needs attention.
Additional Exporters
Node Exporter covers system metrics, but you can monitor much more:
cAdvisor (Container Metrics)
docker run -d --restart=unless-stopped \
--name cadvisor \
-p 8080:8080 \
-v /:/rootfs:ro \
-v /var/run:/var/run:ro \
-v /sys:/sys:ro \
-v /var/lib/docker/:/var/lib/docker:ro \
gcr.io/cadvisor/cadvisor:latest
Add to prometheus.yml:
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['192.168.1.10:8080']
SNMP Exporter (Network Gear)
Monitor your router, switches, and access points if they support SNMP:
  - job_name: 'snmp'
    static_configs:
      - targets:
          - 192.168.1.1   # router
    metrics_path: /snmp
    params:
      module: [if_mib]
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: snmp-exporter:9116
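This scrape config assumes an snmp_exporter instance is reachable at snmp-exporter:9116, which is not part of the stack above. One way to run it, as a sketch, is to add it as another service in the same docker-compose.yml so the hostname resolves on the compose network (the bundled snmp.yml ships an if_mib module; anything device-specific needs a generated config):
  snmp-exporter:
    image: prom/snmp-exporter:latest
    container_name: snmp-exporter
    restart: unless-stopped
    ports:
      - "9116:9116"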
Blackbox Exporter (Endpoint Monitoring)
Check if your services are actually responding:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - http://192.168.1.53/admin   # Pi-hole
          - http://192.168.1.80         # Nginx
          - http://192.168.1.50         # NAS
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter:9115
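As with the SNMP job, this assumes a blackbox exporter reachable at blackbox-exporter:9115. A sketch of running it as another service in the same compose file (the default configuration already includes an http_2xx module):
  blackbox-exporter:
    image: prom/blackbox-exporter:latest
    container_name: blackbox-exporter
    restart: unless-stopped
    ports:
      - "9115:9115"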
Storage and Retention
Prometheus stores data on disk. For a homelab with 5-10 hosts scraped every 15 seconds, expect:
- ~2-3 GB per month of storage
- 200-500 MB RAM for Prometheus
Set retention with the --storage.tsdb.retention.time flag. 90 days is a good default for homelabs — enough to spot trends without eating too much disk. If you need longer retention, look into Thanos or VictoriaMetrics as long-term storage backends.
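If you want to see how close those estimates are for your own lab, Prometheus exposes its own ingestion and storage numbers as standard self-metrics. Two queries worth graphing:
# Samples ingested per second across all targets
rate(prometheus_tsdb_head_samples_appended_total[5m])

# On-disk size of completed TSDB blocks, in bytes
prometheus_tsdb_storage_blocks_bytes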
Tips
Start with imported dashboards: Don't spend hours building dashboards before you understand what metrics matter to you. Use the community dashboards for a week, then customize.
Scrape interval of 15s is fine: Some people set it to 5s or even 1s. For a homelab, 15 seconds is plenty. Lower intervals increase storage and CPU usage without adding much value.
Label your instances: In prometheus.yml, you can add labels to make dashboard filtering easier:
  - targets: ['192.168.1.50:9100']
    labels:
      hostname: 'nas'
      location: 'basement'
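Those labels are attached to every metric scraped from that target, so you can filter or group by them in Grafana panels. For example, using the hostname label defined above:
# Memory usage for just the NAS
(1 - (node_memory_MemAvailable_bytes{hostname="nas"} / node_memory_MemTotal_bytes{hostname="nas"})) * 100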
Use Grafana playlists for a status screen: If you have a spare monitor or tablet, set up a playlist that cycles through your dashboards. It's satisfying and genuinely useful.
Monitoring might seem like overkill for a homelab, but it's one of those things where once you have it, you can't imagine operating without it. The first time Grafana alerts you that a disk is filling up before it causes a problem, the setup time pays for itself.