Centralized Logging with Grafana Loki
Logs are the first thing you reach for when something breaks. The problem in a homelab is that logs are scattered across a dozen machines and services — journalctl on this box, docker logs on that one, a config file on a third that tells you the app writes to /var/log/something-custom.log. Centralized logging collects all of them in one place, makes them searchable, and lets you set up alerts when bad things happen.
Grafana Loki is a log aggregation system built for exactly this. Developed by Grafana Labs, it takes a fundamentally different approach from the Elastic Stack (ELK): instead of indexing the full text of every log line, Loki indexes only a small set of metadata labels and stores the log content itself as compressed chunks. This makes it dramatically lighter on resources — perfect for homelabs where you don't have 32GB of RAM to dedicate to Elasticsearch.
Loki vs the Elastic Stack (ELK)
Before committing to Loki, let's compare it honestly with the Elastic Stack (Elasticsearch + Logstash + Kibana), which is the traditional choice for centralized logging.
Elastic Stack (ELK)
How it works: Elasticsearch indexes every word of every log line into an inverted index. This makes arbitrary full-text search blazingly fast — you can find any term across billions of log lines in seconds.
Resource cost: Elasticsearch is hungry. A basic setup needs 4-8GB of RAM minimum, and that grows quickly with log volume. Logstash (the log processing pipeline) adds another 1-2GB. Kibana needs its own resources. For a homelab with moderate log volume, you're looking at dedicating a server just to logging.
When to choose ELK: You have a lot of RAM to spare, you need fast arbitrary text search, or you're already familiar with the Elastic ecosystem.
Grafana Loki
How it works: Loki stores log lines as compressed chunks indexed only by label sets (like {job="nginx", host="web01"}). When you query, Loki finds the right chunks by labels, then searches through the log content within those chunks. It's like grep, but distributed and with label-based filtering to narrow the search space first.
Resource cost: Loki runs comfortably in 512MB-1GB of RAM for a homelab-scale deployment. It's designed to be cost-effective at all scales.
When to choose Loki: You're already using Grafana for metrics (Loki integrates natively), you want lightweight resource usage, or your log volume is moderate (which it is in most homelabs).
The trade-off: Loki is slower at arbitrary text search because it doesn't have a full-text index. Searching for a random string across all logs takes longer than Elasticsearch. But searching within a labeled stream (all logs from a specific service, for example) is fast. In practice, this matches how you actually debug things — you almost always know which service you're investigating.
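To make the label-first model concrete, here is what a typical query looks like. The labels and the search string are hypothetical, but the shape is the point: the selector narrows the search space first, then the filter greps within it.
# Fast: labels restrict the search to one service's chunks before any grep happens
{job="nginx", host="web01"} |= "timeout"
# Slower: a catch-all selector forces Loki to scan chunks from every stream
{job=~".+"} |= "timeout"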
Architecture
A Loki deployment has three components:
- Loki — The log storage and query engine
- Promtail — The agent that ships logs from machines to Loki
- Grafana — The UI for querying and visualizing logs
Promtail runs on every machine whose logs you want to collect. It tails log files, collects journal entries, and scrapes Docker container logs, then ships them to Loki with labels attached. Grafana queries Loki and displays results.
[Machine 1: Promtail] ──→
[Machine 2: Promtail] ──→ [Loki] ←── [Grafana]
[Machine 3: Promtail] ──→
For small homelabs, you can run all three on the same machine. For larger setups, run Loki and Grafana on your monitoring server and Promtail on every other machine.
Installing Loki and Grafana
Docker Compose Deployment
# ~/docker/loki/docker-compose.yml
services:
  loki:
    image: grafana/loki:3.3.2
    container_name: loki
    restart: unless-stopped
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yml:/etc/loki/loki-config.yml
      - loki_data:/loki
    command: -config.file=/etc/loki/loki-config.yml

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      GF_SECURITY_ADMIN_PASSWORD: "changeme"

volumes:
  loki_data:
  grafana_data:
If you already have Grafana running (e.g., for Prometheus metrics), skip the Grafana service and just add Loki as a data source to your existing instance.
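In that case you can add the data source either through the UI (as described below) or with a provisioning file. A minimal sketch, assuming the standard Grafana provisioning layout and that Grafana can reach Loki at http://loki:3100; adjust the URL for your setup:
# ./provisioning/datasources/loki.yml (mount into /etc/grafana/provisioning/datasources/)
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: false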
Loki Configuration
# ~/docker/loki/loki-config.yml
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  filesystem:
    directory: /loki/chunks

limits_config:
  retention_period: 30d
  max_query_series: 5000
  max_query_parallelism: 2

compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  delete_request_store: filesystem   # required when retention_enabled is true on recent Loki versions
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
Key settings to understand:
- retention_period: 30d — Logs older than 30 days are automatically deleted. Adjust based on your storage capacity. For most homelabs, 30-90 days is enough.
- auth_enabled: false — No multi-tenancy. Fine for a homelab where you trust the network.
- replication_factor: 1 — Single instance, no replication. Appropriate for homelab.
Start the stack:
docker compose up -d
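Before wiring up Grafana, confirm Loki is actually healthy. Loki exposes a readiness endpoint on its HTTP port (3100 in the compose file above):
# Returns "ready" once Loki has finished starting up
curl http://localhost:3100/ready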
Connect Grafana to Loki
- Open Grafana at http://your-server:3000
- Go to Connections > Data Sources > Add data source
- Select Loki
- Set the URL to http://loki:3100 (if on the same Docker network) or http://your-server-ip:3100
- Click Save & Test
Setting Up Promtail
Promtail is the agent that collects and ships logs. Install it on every machine whose logs you want in Loki.
Docker Installation (For Docker Hosts)
If the machine runs Docker and you want to collect container logs:
# Add to your docker-compose.yml or create a new one
services:
  promtail:
    image: grafana/promtail:3.3.2
    container_name: promtail
    restart: unless-stopped
    volumes:
      - ./promtail-config.yml:/etc/promtail/promtail-config.yml
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock
    command: -config.file=/etc/promtail/promtail-config.yml
Binary Installation (For VMs and Bare Metal)
# Download Promtail
wget https://github.com/grafana/loki/releases/download/v3.3.2/promtail-linux-amd64.zip
unzip promtail-linux-amd64.zip
sudo mv promtail-linux-amd64 /usr/local/bin/promtail
sudo chmod +x /usr/local/bin/promtail
# Create config directory
sudo mkdir -p /etc/promtail
Create a systemd service:
# /etc/systemd/system/promtail.service
[Unit]
Description=Promtail Log Collector
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/promtail-config.yml
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable --now promtail
Promtail Configuration
# /etc/promtail/promtail-config.yml
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki-server:3100/loki/api/v1/push

scrape_configs:
  # Collect systemd journal logs
  - job_name: journal
    journal:
      max_age: 12h
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'
      - source_labels: ['__journal__hostname']
        target_label: 'hostname'
      - source_labels: ['__journal_syslog_identifier']
        target_label: 'syslog_identifier'
      - source_labels: ['__journal_priority_keyword']
        target_label: 'level'

  # Collect traditional log files
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          hostname: nas01
          __path__: /var/log/*.log

  # Collect auth logs
  - job_name: auth
    static_configs:
      - targets:
          - localhost
        labels:
          job: auth
          hostname: nas01
          __path__: /var/log/auth.log
Replace loki-server with your Loki server's IP address or hostname. Replace nas01 with the machine's actual hostname.
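After editing the config, restart Promtail and check that it picked up its targets. The commands below assume the systemd install from earlier; on a Docker host, use docker logs promtail instead:
# Restart Promtail and watch its own output for connection or parsing errors
sudo systemctl restart promtail
journalctl -u promtail -f
# Promtail also serves a small status UI on its http_listen_port;
# open http://<this-machine>:9080/targets to see every file and journal it is tailing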
Collecting Docker Container Logs
Add a Docker scrape config to Promtail:
# Collect Docker container logs
- job_name: docker
  docker_sd_configs:
    - host: unix:///var/run/docker.sock
      refresh_interval: 5s
  relabel_configs:
    # Use the container name as a label
    - source_labels: ['__meta_docker_container_name']
      regex: '/(.*)'
      target_label: 'container'
    # Use the compose service name if available
    - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
      target_label: 'compose_service'
    # Use the compose project name
    - source_labels: ['__meta_docker_container_label_com_docker_compose_project']
      target_label: 'compose_project'
This automatically discovers all running containers and ships their logs with labels for the container name, compose service, and compose project. When a new container starts, Promtail picks it up automatically.
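Once this job is live, the new labels are immediately queryable. For example (the service name here is hypothetical), you can tail one compose service and strip its health check noise:
{compose_service="grafana"} != "healthcheck"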
Collecting systemd Journal Logs
The journal scrape config above handles this. Promtail reads from the systemd journal and extracts labels like the unit name, hostname, and priority level. This means you can query logs by service:
{unit="nginx.service"}
Or by priority:
{level="err"}
LogQL: Querying Your Logs
LogQL is Loki's query language. If you know PromQL (Prometheus's query language), LogQL will feel familiar. It starts with a label selector and optionally adds filters and transformations.
Basic Queries
# All logs from a specific host
{hostname="nas01"}
# All logs from nginx containers
{container="nginx"}
# All error-level journal entries
{job="systemd-journal", level="err"}
# Logs from a specific compose project
{compose_project="monitoring"}
Filtering Log Content
After the label selector, add line filter expressions to narrow the results:
# Lines containing "error" (case insensitive)
{container="traefik"} |~ "(?i)error"
# Lines NOT containing "healthcheck"
{container="nginx"} !~ "healthcheck"
# Lines containing a specific IP address
{job="auth"} |= "192.168.1.100"
# Combine filters
{unit="sshd.service"} |= "Failed password" !~ "invalid user"
Filter operators:
- |= — Line contains string (exact match)
- != — Line does not contain string
- |~ — Line matches regex
- !~ — Line does not match regex
Parsing and Extracting Fields
LogQL can parse structured logs and extract fields:
# Parse JSON logs and filter by a field
{container="myapp"} | json | level="error"
# Parse logfmt logs
{container="traefik"} | logfmt | status >= 500
# Extract fields with regex
{job="auth"} | regexp `Failed password for (?P<user>\w+) from (?P<ip>[\d.]+)`
| ip="192.168.1.100"
Metric Queries
LogQL can compute metrics from logs, which is useful for dashboards:
# Count of error-level log lines over time
count_over_time({level="err"}[5m])
# Rate of 5xx errors in nginx
sum(rate({container="nginx"} |~ "HTTP/[12].\" 5\\d{2}" [5m]))
# Top 5 containers by log volume
topk(5, sum by (container) (rate({job="docker"}[5m])))
# Bytes rate — how much log data each service generates
sum by (container) (bytes_over_time({job="docker"}[1h]))
Building Grafana Dashboards for Logs
Log Explorer
The simplest way to start is Grafana's built-in Explore view:
- Go to Explore
- Select your Loki data source
- Enter a LogQL query
- Switch between Logs (raw lines), Table (parsed fields), and Graph (metric queries) views
Log Volume Dashboard
Create a dashboard panel showing log volume over time — spikes often correlate with problems:
sum by (hostname) (rate({job="systemd-journal"}[5m]))
Use the Time series visualization type. This shows you which machines are generating the most logs, and when.
Error Rate Dashboard
sum by (container) (rate({job="docker"} |~ "(?i)(error|fatal|panic)" [5m]))
A spike in errors from a specific container is an immediate signal that something needs attention.
SSH Authentication Dashboard
# Failed SSH attempts over time
sum(rate({unit="sshd.service"} |= "Failed password" [5m]))
# Successful logins
sum(rate({unit="sshd.service"} |= "Accepted" [5m]))
# Extract and count by source IP
sum by (ip) (count_over_time(
{unit="sshd.service"} |= "Failed password"
| regexp `from (?P<ip>[\d.]+)` [1h]
))
Retention Policies
Loki's retention is configured in loki-config.yml. The key settings:
limits_config:
  retention_period: 30d          # Global default

compactor:
  retention_enabled: true        # Must be explicitly enabled
  retention_delete_delay: 2h     # Delay before deleting expired chunks
You can also set per-stream retention using overrides, but for a homelab, a global retention period is usually sufficient.
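If you do want a shorter retention for one noisy stream, the override lives under limits_config as a retention_stream entry. A minimal sketch; the selector and periods are examples, not recommendations:
limits_config:
  retention_period: 30d            # global default
  retention_stream:
    - selector: '{container="traefik"}'
      priority: 1
      period: 7d                   # keep this chatty stream for a week only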
Estimating Storage Needs
Loki compresses logs aggressively. As a rough guide:
- Light homelab (5 machines, basic services): 50-200 MB/day
- Medium homelab (10 machines, Docker hosts, busy services): 200 MB - 1 GB/day
- Heavy homelab (many machines, verbose logging, security monitoring): 1-5 GB/day
At 30 days retention, a medium homelab needs 6-30 GB of storage for Loki. This is dramatically less than Elasticsearch would need for the same log volume.
Monitor your actual usage:
# Total ingested bytes in the last 24 hours
sum(bytes_over_time({job=~".+"}[24h]))
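You can also check what Loki occupies on disk. With the compose file from earlier, chunks live in the loki_data named volume; the volume name below assumes the compose project is named loki, so check docker volume ls for the actual name on your system:
# Locate the volume, then measure its size on the host
docker volume ls | grep loki_data
sudo du -sh /var/lib/docker/volumes/loki_loki_data/_data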
Setting Up Alerts
Loki supports alerting through Grafana or its own built-in ruler. Grafana alerting is more flexible and easier to configure.
Alert: High Error Rate
- Go to Alerting > Alert Rules > New Alert Rule
- Query: sum(count_over_time({level="err"}[5m]))
- Condition: Is above 50 (adjust threshold to your baseline)
- Evaluation: Every 1 minute, for 5 minutes
- Choose your contact point (Discord, email, etc.)
Alert: SSH Brute Force Detection
sum(count_over_time({unit="sshd.service"} |= "Failed password" [5m]))
Alert if this exceeds 10 in 5 minutes. This catches brute force attempts on any monitored host.
Alert: Container Crash Loop
count_over_time({job="docker"} |~ "(?i)(panic|fatal|killed|oom)" [10m])
Alert if any container is logging panic/fatal/OOM messages, which usually indicates a crash loop.
Alert: Disk Full Warnings from Logs
{job="systemd-journal"} |~ "(?i)(no space left|disk full|filesystem.*full)"
Sometimes the first sign of a full disk is an error message in the logs. Catch it here even before your Prometheus disk usage alert fires.
Tips for a Clean Logging Setup
Label discipline matters. Keep your label set small and consistent. Labels like hostname, job, container, and level cover most needs. Don't add high-cardinality labels (like user IDs or request IDs) — they create too many streams and hurt Loki's performance.
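The rule of thumb: unbounded values belong in the log line, not in a label, because you can still filter on them at query time. A sketch of the difference, using a made-up myapp service and user_id field:
# Bad: a user_id label creates a separate stream for every user
{container="myapp", user_id="8271"}
# Good: one stream per container; the field is extracted only at query time
{container="myapp"} | json | user_id="8271"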
Filter noisy logs. Some services (looking at you, Docker health checks and Traefik access logs) generate enormous volumes of repetitive log lines. Use Promtail's pipeline stages to drop them before they reach Loki:
- job_name: docker
  docker_sd_configs:
    - host: unix:///var/run/docker.sock
  # relabel_configs from the earlier docker job go here, unchanged
  pipeline_stages:
    # Drop Traefik health check lines before they are shipped to Loki
    - match:
        selector: '{container="traefik"}'
        stages:
          - drop:
              expression: '.*healthcheck.*'
Use structured logging in your own apps. If you're writing services that run in your homelab, log in JSON format. Loki's | json parser makes structured logs much more useful than plain text.
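As an illustration (the app, fields, and values here are invented), a single JSON log line becomes filterable on any of its fields without writing a regex:
# App emits:  {"level":"error","msg":"upstream timeout","route":"/api/items","duration_ms":5023}
# Query it with:
{container="myapp"} | json | level="error" | duration_ms > 1000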
Start with Explore, then build dashboards. Don't build dashboards until you've spent time in Grafana's Explore view understanding your log patterns. You'll learn what queries are useful and what's noise.
Pair with Prometheus. Loki handles logs. Prometheus handles metrics. Together with Grafana, they give you complete observability. When a Prometheus alert fires for high CPU, you can jump to Loki to see what the service was logging at that time. Grafana makes this correlation seamless — click on a spike in a metric graph and it takes you to the logs for that time range.
Centralized logging transforms how you operate your homelab. Instead of SSH-ing into machines and grepping through log files, you query everything from one Grafana dashboard. The first time you debug a multi-service issue by correlating logs from three different machines in one view, you'll wonder how you ever managed without it.