Centralized Logging with Grafana Loki
Logs are the first thing you reach for when something breaks. The problem in a homelab is that logs are scattered across a dozen machines and services — journalctl on this box, docker logs on that one, a config file on a third that tells you the app writes to /var/log/something-custom.log. Centralized logging collects all of them in one place, makes them searchable, and lets you set up alerts when bad things happen.
Grafana Loki is a log aggregation system built for exactly this. Developed by Grafana Labs, it takes a fundamentally different approach from the Elastic Stack (ELK): instead of indexing the full text of every log line, Loki indexes only a small set of metadata labels and stores the log content itself as compressed chunks. This makes it dramatically lighter on resources — perfect for homelabs where you don't have 32GB of RAM to dedicate to Elasticsearch.
Loki vs the Elastic Stack (ELK)
Before committing to Loki, let's compare it honestly with the Elastic Stack (Elasticsearch + Logstash + Kibana), which is the traditional choice for centralized logging.
Elastic Stack (ELK)
How it works: Elasticsearch indexes every word of every log line into an inverted index. This makes arbitrary full-text search blazingly fast — you can find any term across billions of log lines in seconds.
Resource cost: Elasticsearch is hungry. A basic setup needs 4-8GB of RAM minimum, and that grows quickly with log volume. Logstash (the log processing pipeline) adds another 1-2GB. Kibana needs its own resources. For a homelab with moderate log volume, you're looking at dedicating a server just to logging.
When to choose ELK: You have a lot of RAM to spare, you need fast arbitrary text search, or you're already familiar with the Elastic ecosystem.
Grafana Loki
How it works: Loki stores log lines as compressed chunks indexed only by label sets (like {job="nginx", host="web01"}). When you query, Loki finds the right chunks by labels, then searches through the log content within those chunks. It's like grep, but distributed and with label-based filtering to narrow the search space first.
Resource cost: Loki runs comfortably in 512MB-1GB of RAM for a homelab-scale deployment. It's designed to be cost-effective at all scales.
When to choose Loki: You're already using Grafana for metrics (Loki integrates natively), you want lightweight resource usage, or your log volume is moderate (which it is in most homelabs).
The trade-off: Loki is slower at arbitrary text search because it doesn't have a full-text index. Searching for a random string across all logs takes longer than Elasticsearch. But searching within a labeled stream (all logs from a specific service, for example) is fast. In practice, this matches how you actually debug things — you almost always know which service you're investigating.
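To make the label-first model concrete, here is what a typical query looks like. The labels and the search string are hypothetical, but the shape is the point: the selector narrows the search space first, then the filter greps within it.
# Fast: labels restrict the search to one service's chunks before any grep happens
{job="nginx", host="web01"} |= "timeout"
# Slower: a catch-all selector forces Loki to scan chunks from every stream
{job=~".+"} |= "timeout"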
Architecture
A Loki deployment has three components:
- Loki — The log storage and query engine
- Promtail — The agent that ships logs from machines to Loki
- Grafana — The UI for querying and visualizing logs
Promtail runs on every machine whose logs you want to collect. It tails log files, collects journal entries, and scrapes Docker container logs, then ships them to Loki with labels attached. Grafana queries Loki and displays results.
[Machine 1: Promtail] ──→
[Machine 2: Promtail] ──→ [Loki] ←── [Grafana]
[Machine 3: Promtail] ──→
For small homelabs, you can run all three on the same machine. For larger setups, run Loki and Grafana on your monitoring server and Promtail on every other machine.
Installing Loki and Grafana
Docker Compose Deployment
# ~/docker/loki/docker-compose.yml
services:
  loki:
    image: grafana/loki:3.3.2
    container_name: loki
    restart: unless-stopped
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yml:/etc/loki/loki-config.yml
      - loki_data:/loki
    command: -config.file=/etc/loki/loki-config.yml

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      GF_SECURITY_ADMIN_PASSWORD: "changeme"

volumes:
  loki_data:
  grafana_data:
If you already have Grafana running (e.g., for Prometheus metrics), skip the Grafana service and just add Loki as a data source to your existing instance.
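In that case you can add the data source either through the UI (as described below) or with a provisioning file. A minimal sketch, assuming the standard Grafana provisioning layout and that Grafana can reach Loki at http://loki:3100; adjust the URL for your setup:
# ./provisioning/datasources/loki.yml (mount into /etc/grafana/provisioning/datasources/)
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: false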
Loki Configuration
# ~/docker/loki/loki-config.yml
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  filesystem:
    directory: /loki/chunks

limits_config:
  retention_period: 30d
  max_query_series: 5000
  max_query_parallelism: 2

compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  delete_request_store: filesystem   # required when retention_enabled is true on recent Loki versions
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
Key settings to understand:
- retention_period: 30d — Logs older than 30 days are automatically deleted. Adjust based on your storage capacity. For most homelabs, 30-90 days is enough.
- auth_enabled: false — No multi-tenancy. Fine for a homelab where you trust the network.
- replication_factor: 1 — Single instance, no replication. Appropriate for homelab.
Start the stack:
docker compose up -d
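Before wiring up Grafana, confirm Loki is actually healthy. Loki exposes a readiness endpoint on its HTTP port (3100 in the compose file above):
# Returns "ready" once Loki has finished starting up
curl http://localhost:3100/ready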
Connect Grafana to Loki
- Open Grafana at http://your-server:3000
- Go to Connections > Data Sources > Add data source
- Select Loki
- Set the URL to http://loki:3100 (if on the same Docker network) or http://your-server-ip:3100
- Click Save & Test
Setting Up Promtail
Promtail is the agent that collects and ships logs. Install it on every machine whose logs you want in Loki.
Docker Installation (For Docker Hosts)
If the machine runs Docker and you want to collect container logs:
# Add to your docker-compose.yml or create a new one
services:
  promtail:
    image: grafana/promtail:3.3.2
    container_name: promtail
    restart: unless-stopped
    volumes:
      - ./promtail-config.yml:/etc/promtail/promtail-config.yml
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock
    command: -config.file=/etc/promtail/promtail-config.yml
Binary Installation (For VMs and Bare Metal)
# Download Promtail
wget https://github.com/grafana/loki/releases/download/v3.3.2/promtail-linux-amd64.zip
unzip promtail-linux-amd64.zip
sudo mv promtail-linux-amd64 /usr/local/bin/promtail
sudo chmod +x /usr/local/bin/promtail
# Create config directory
sudo mkdir -p /etc/promtail
Create a systemd service:
# /etc/systemd/system/promtail.service
[Unit]
Description=Promtail Log Collector
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/promtail-config.yml
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable --now promtail
Promtail Configuration
# /etc/promtail/promtail-config.yml
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki-server:3100/loki/api/v1/push

scrape_configs:
  # Collect systemd journal logs
  - job_name: journal
    journal:
      max_age: 12h
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'
      - source_labels: ['__journal__hostname']
        target_label: 'hostname'
      - source_labels: ['__journal_syslog_identifier']
        target_label: 'syslog_identifier'
      - source_labels: ['__journal_priority_keyword']
        target_label: 'level'

  # Collect traditional log files
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          hostname: nas01
          __path__: /var/log/*.log

  # Collect auth logs
  - job_name: auth
    static_configs:
      - targets:
          - localhost
        labels:
          job: auth
          hostname: nas01
          __path__: /var/log/auth.log
Replace loki-server with your Loki server's IP address or hostname. Replace nas01 with the machine's actual hostname.
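After editing the config, restart Promtail and check that it picked up its targets. The commands below assume the systemd install from earlier; on a Docker host, use docker logs promtail instead:
# Restart Promtail and watch its own output for connection or parsing errors
sudo systemctl restart promtail
journalctl -u promtail -f
# Promtail also serves a small status UI on its http_listen_port;
# open http://<this-machine>:9080/targets to see every file and journal it is tailing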
Collecting Docker Container Logs
Add a Docker scrape config to Promtail:
# Collect Docker container logs
- job_name: docker
  docker_sd_configs:
    - host: unix:///var/run/docker.sock
      refresh_interval: 5s
  relabel_configs:
    # Use the container name as a label
    - source_labels: ['__meta_docker_container_name']
      regex: '/(.*)'
      target_label: 'container'
    # Use the compose service name if available
    - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
      target_label: 'compose_service'
    # Use the compose project name
    - source_labels: ['__meta_docker_container_label_com_docker_compose_project']
      target_label: 'compose_project'
This automatically discovers all running containers and ships their logs with labels for the container name, compose service, and compose project. When a new container starts, Promtail picks it up automatically.
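Once this job is live, the new labels are immediately queryable. For example (the service name here is hypothetical), you can tail one compose service and strip its health check noise:
{compose_service="grafana"} != "healthcheck"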
Collecting systemd Journal Logs
The journal scrape config above handles this. Promtail reads from the systemd journal and extracts labels like the unit name, hostname, and priority level. This means you can query logs by service:
{unit="nginx.service"}
Or by priority:
{level="err"}
LogQL: Querying Your Logs
LogQL is Loki's query language. If you know PromQL (Prometheus's query language), LogQL will feel familiar. It starts with a label selector and optionally adds filters and transformations.
Basic Queries
# All logs from a specific host
{hostname="nas01"}
# All logs from nginx containers
{container="nginx"}
# All error-level journal entries
{job="systemd-journal", level="err"}
# Logs from a specific compose project
{compose_project="monitoring"}
Filtering Log Content
After the label selector, add line filter expressions to narrow the results:
# Lines containing "error" (case insensitive)
{container="traefik"} |~ "(?i)error"
# Lines NOT containing "healthcheck"
{container="nginx"} !~ "healthcheck"
# Lines containing a specific IP address
{job="auth"} |= "192.168.1.100"
# Combine filters
{unit="sshd.service"} |= "Failed password" !~ "invalid user"
Filter operators:
- |= — Line contains string (exact match)
- != — Line does not contain string
- |~ — Line matches regex
- !~ — Line does not match regex
Parsing and Extracting Fields
LogQL can parse structured logs and extract fields:
# Parse JSON logs and filter by a field
{container="myapp"} | json | level="error"
# Parse logfmt logs
{container="traefik"} | logfmt | status >= 500
# Extract fields with regex
{job="auth"} | regexp `Failed password for (?P<user>\w+) from (?P<ip>[\d.]+)`
| ip="192.168.1.100"
Metric Queries
LogQL can compute metrics from logs, which is useful for dashboards:
# Count of error-level log lines over time
count_over_time({level="err"}[5m])
# Rate of 5xx errors in nginx
sum(rate({container="nginx"} |~ "HTTP/[12].\" 5\\d{2}" [5m]))
# Top 5 containers by log volume
topk(5, sum by (container) (rate({job="docker"}[5m])))
# Bytes rate — how much log data each service generates
sum by (container) (bytes_over_time({job="docker"}[1h]))
Building Grafana Dashboards for Logs
Log Explorer
The simplest way to start is Grafana's built-in Explore view:
- Go to Explore
- Select your Loki data source
- Enter a LogQL query
- Switch between Logs (raw lines), Table (parsed fields), and Graph (metric queries) views
Log Volume Dashboard
Create a dashboard panel showing log volume over time — spikes often correlate with problems:
sum by (hostname) (rate({job="systemd-journal"}[5m]))
Use the Time series visualization type. This shows you which machines are generating the most logs, and when.
Error Rate Dashboard
sum by (container) (rate({job="docker"} |~ "(?i)(error|fatal|panic)" [5m]))
A spike in errors from a specific container is an immediate signal that something needs attention.
SSH Authentication Dashboard
# Failed SSH attempts over time
sum(rate({unit="sshd.service"} |= "Failed password" [5m]))
# Successful logins
sum(rate({unit="sshd.service"} |= "Accepted" [5m]))
# Extract and count by source IP
sum by (ip) (count_over_time(
{unit="sshd.service"} |= "Failed password"
| regexp `from (?P<ip>[\d.]+)` [1h]
))
Retention Policies
Loki's retention is configured in loki-config.yml. The key settings:
limits_config:
  retention_period: 30d          # Global default

compactor:
  retention_enabled: true        # Must be explicitly enabled
  retention_delete_delay: 2h     # Delay before deleting expired chunks
You can also set per-stream retention using overrides, but for a homelab, a global retention period is usually sufficient.
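If you do want a shorter retention for one noisy stream, the override lives under limits_config as a retention_stream entry. A minimal sketch; the selector and periods are examples, not recommendations:
limits_config:
  retention_period: 30d            # global default
  retention_stream:
    - selector: '{container="traefik"}'
      priority: 1
      period: 7d                   # keep this chatty stream for a week only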
Estimating Storage Needs
Loki compresses logs aggressively. As a rough guide:
- Light homelab (5 machines, basic services): 50-200 MB/day
- Medium homelab (10 machines, Docker hosts, busy services): 200 MB - 1 GB/day
- Heavy homelab (many machines, verbose logging, security monitoring): 1-5 GB/day
At 30 days retention, a medium homelab needs 6-30 GB of storage for Loki. This is dramatically less than Elasticsearch would need for the same log volume.
Monitor your actual usage:
# Total ingested bytes in the last 24 hours
sum(bytes_over_time({job=~".+"}[24h]))
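You can also check what Loki occupies on disk. With the compose file from earlier, chunks live in the loki_data named volume; the volume name below assumes the compose project is named loki, so check docker volume ls for the actual name on your system:
# Locate the volume, then measure its size on the host
docker volume ls | grep loki_data
sudo du -sh /var/lib/docker/volumes/loki_loki_data/_data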
Setting Up Alerts
Loki supports alerting through Grafana or its own built-in ruler. Grafana alerting is more flexible and easier to configure.
Alert: High Error Rate
- Go to Alerting > Alert Rules > New Alert Rule
- Query: sum(count_over_time({level="err"}[5m]))
- Condition: Is above 50 (adjust threshold to your baseline)
- Evaluation: Every 1 minute, for 5 minutes
- Choose your contact point (Discord, email, etc.)
Alert: SSH Brute Force Detection
sum(count_over_time({unit="sshd.service"} |= "Failed password" [5m]))
Alert if this exceeds 10 in 5 minutes. This catches brute force attempts on any monitored host.
Alert: Container Crash Loop
count_over_time({job="docker"} |~ "(?i)(panic|fatal|killed|oom)" [10m])
Alert if any container is logging panic/fatal/OOM messages, which usually indicates a crash loop.
Alert: Disk Full Warnings from Logs
{job="systemd-journal"} |~ "(?i)(no space left|disk full|filesystem.*full)"
Sometimes the first sign of a full disk is an error message in the logs. Catch it here even before your Prometheus disk usage alert fires.
Tips for a Clean Logging Setup
Label discipline matters. Keep your label set small and consistent. Labels like hostname, job, container, and level cover most needs. Don't add high-cardinality labels (like user IDs or request IDs) — they create too many streams and hurt Loki's performance.
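The rule of thumb: unbounded values belong in the log line, not in a label, because you can still filter on them at query time. A sketch of the difference, using a made-up myapp service and user_id field:
# Bad: a user_id label creates a separate stream for every user
{container="myapp", user_id="8271"}
# Good: one stream per container; the field is extracted only at query time
{container="myapp"} | json | user_id="8271"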
Filter noisy logs. Some services (looking at you, Docker health checks and Traefik access logs) generate enormous volumes of repetitive log lines. Use Promtail's pipeline stages to drop them before they reach Loki:
- job_name: docker
  docker_sd_configs:
    - host: unix:///var/run/docker.sock
  # relabel_configs from the earlier docker job go here, unchanged
  pipeline_stages:
    # Drop Traefik health check lines before they are shipped to Loki
    - match:
        selector: '{container="traefik"}'
        stages:
          - drop:
              expression: '.*healthcheck.*'
Use structured logging in your own apps. If you're writing services that run in your homelab, log in JSON format. Loki's | json parser makes structured logs much more useful than plain text.
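As an illustration (the app, fields, and values here are invented), a single JSON log line becomes filterable on any of its fields without writing a regex:
# App emits:  {"level":"error","msg":"upstream timeout","route":"/api/items","duration_ms":5023}
# Query it with:
{container="myapp"} | json | level="error" | duration_ms > 1000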
Start with Explore, then build dashboards. Don't build dashboards until you've spent time in Grafana's Explore view understanding your log patterns. You'll learn what queries are useful and what's noise.
Pair with Prometheus. Loki handles logs. Prometheus handles metrics. Together with Grafana, they give you complete observability. When a Prometheus alert fires for high CPU, you can jump to Loki to see what the service was logging at that time. Grafana makes this correlation seamless — click on a spike in a metric graph and it takes you to the logs for that time range.
Centralized logging transforms how you operate your homelab. Instead of SSH-ing into machines and grepping through log files, you query everything from one Grafana dashboard. The first time you debug a multi-service issue by correlating logs from three different machines in one view, you'll wonder how you ever managed without it.