Hyperconverged Infrastructure in Your Home Lab: Proxmox + Ceph
Hyperconverged infrastructure (HCI) is one of those enterprise concepts that sounds way too serious for a home lab. Three or more nodes, shared distributed storage, automatic failover — this is data center stuff. But Proxmox makes it genuinely accessible, and if you have the hardware, it's one of the most rewarding builds you can do.
The question isn't whether HCI is cool. It is. The question is whether it makes sense for your situation, because it's a significant step up in complexity, cost, and power consumption from a single-node setup.
What Hyperconverged Actually Means
In a traditional setup, you have separate compute servers and separate storage systems. Your VMs run on the compute nodes and store their data on a SAN or NAS over the network. Two different systems, two different management planes.
Hyperconverged infrastructure combines compute and storage on the same nodes. Every node runs VMs and contributes its disks to a shared storage pool. The storage is distributed and replicated across all nodes, so if one node dies, the data is still available on the others and VMs can restart elsewhere.
In the Proxmox world, this means:
- Compute: Proxmox VE runs KVM virtual machines and LXC containers on each node
- Storage: Ceph runs on the same nodes, pooling local disks into a distributed, replicated storage cluster
- Networking: A dedicated network connects the nodes for both Ceph replication traffic and VM live migration
The result: a cluster where you can lose an entire node — pull the power cord — and your VMs come back up on the surviving nodes with no data loss. That's the pitch.
Minimum Hardware Requirements
HCI has a hard floor: three nodes. Ceph needs a minimum of three to keep data available when one fails. The monitors use a quorum (majority) model, so a three-node cluster can lose one node and keep operating, and with replication factor 3, losing a node still leaves two good copies of your data. You can technically run Ceph on fewer nodes, but you lose the fault tolerance that's the whole point.
Per Node (Minimum Viable)
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores / 8 threads | 8+ cores / 16+ threads |
| RAM | 32 GB | 64 GB+ |
| Boot drive | 1x 256 GB SSD | 1x 512 GB NVMe |
| Ceph OSD drives | 2x HDD or SSD | 3-4x SSD or mix |
| Ceph WAL/DB | (optional) | 1x NVMe for WAL/DB |
| Network | 1 Gbps (minimum 2 NICs) | 10 Gbps dedicated Ceph network |
Why So Much RAM?
Ceph is hungry. Each OSD (Object Storage Daemon — one per disk) wants 4-8 GB of RAM by default. With 3 OSDs per node, that's 12-24 GB just for Ceph. Your VMs need the rest. 32 GB is tight. 64 GB is comfortable. 128 GB lets you breathe.
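If RAM is tight, the per-OSD memory budget is tunable. A minimal sketch using Ceph's centralized config (osd_memory_target defaults to 4 GiB; lowering it trades cache hit rate and recovery speed for headroom):
# Check the current per-OSD memory target (bytes; default is 4 GiB)
ceph config get osd osd_memory_target
# Drop it to 2 GiB per OSD on RAM-constrained nodes
ceph config set osd osd_memory_target 2147483648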
Three-Node Cluster Total
For a budget build with used enterprise hardware:
| Item | Per Node | 3x Total |
|---|---|---|
| Used Dell R730xd / HP DL380 Gen9 | $200-400 | $600-1,200 |
| 64 GB RAM (upgrade if needed) | $50-100 | $150-300 |
| 3x 1 TB SATA SSD (Ceph) | $150-250 | $450-750 |
| 1x 256 GB NVMe (boot) | $30 | $90 |
| 10 GbE NIC (Mellanox ConnectX-3) | $15-25 | $45-75 |
| 10 GbE switch (used) | — | $50-150 |
| Total | — | $1,385-2,565 |
That's a real range. You can build a functional three-node HCI cluster for under $1,500 if you're patient with used hardware. But you can also spend $3,000+ easily if you go with newer gear or more storage.
Network Requirements
Networking is where HCI builds succeed or fail. Ceph replicates every write to multiple nodes across the network. If your network is slow, your storage is slow. Period.
The Minimum: Two Networks
You need at least two separate networks:
- Management/Public network: Carries regular traffic — VM network access, Proxmox web UI, API calls. Your normal LAN.
- Ceph/Cluster network: Carries Ceph replication traffic and OSD heartbeats. This should be separate and ideally faster.
┌─────────────────────────┐
│        LAN Switch       │  (1 Gbps management)
└────┬───────┬───────┬────┘
     │       │       │
  ┌──┴──┐ ┌──┴──┐ ┌──┴──┐
  │Node1│ │Node2│ │Node3│
  └──┬──┘ └──┬──┘ └──┬──┘
     │       │       │
┌────┴───────┴───────┴────┐
│       Ceph Switch       │  (10 Gbps storage)
└─────────────────────────┘
1 Gbps: It Works, Barely
You can run Ceph over 1 Gbps. It functions. But your maximum sequential write throughput is capped at around 100 MB/s, and with replication factor 3, every write generates 3x the network traffic. You'll hit the network ceiling before your SSDs break a sweat.
For HDDs, 1 Gbps is usually fine — the disks are the bottleneck, not the network. For SSDs, it's painful.
10 Gbps: The Sweet Spot
10 GbE is where Ceph starts to feel right. Used Mellanox ConnectX-3 cards are $15-25 each on eBay, and a used 10 GbE switch (Mikrotik CRS305 or similar) runs $50-150. For the price of a pizza dinner, you've removed your biggest bottleneck.
# Verify 10 GbE link speed
ethtool enp4s0 | grep Speed
# Speed: 10000Mb/s
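Link speed is only half the story; it's worth measuring actual node-to-node throughput on the storage network. A quick sketch with iperf3 (assuming it's installed on both nodes, using the Ceph-network addresses configured later in this guide):
# On one node, start an iperf3 server
iperf3 -s
# From another node, run a 30-second test across the Ceph network
iperf3 -c 10.10.10.101 -t 30
A healthy 10 GbE link usually lands somewhere around 9.4 Gbit/s; results far below that tend to point at cabling, transceivers, or an MTU mismatch.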
25/40 Gbps: Overkill (But Fun)
ConnectX-4 and ConnectX-5 cards for 25 GbE are getting cheap on the used market. If you're running all-NVMe Ceph, the extra bandwidth helps. For most home labs, 10 GbE is plenty.
Setting Up the Proxmox Cluster
Step 1: Install Proxmox on All Three Nodes
Install Proxmox VE on each node from the ISO. Use the NVMe or SSD as the boot drive. During installation:
- Set a unique hostname for each node (pve1, pve2, pve3)
- Assign management IPs on the same subnet (e.g., 192.168.1.101-103)
- Use the same DNS settings on all nodes
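With hostnames and management IPs decided, it also helps to make every node resolvable from every other node. A minimal /etc/hosts sketch (the home.lab domain is just an example; use whatever you entered during the Proxmox install):
# /etc/hosts -- identical on all three nodes
192.168.1.101  pve1.home.lab  pve1
192.168.1.102  pve2.home.lab  pve2
192.168.1.103  pve3.home.lab  pve3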
Step 2: Configure Networking
On each node, set up the Ceph network. Assuming your second NIC is enp4s0:
# /etc/network/interfaces (add to existing config)
auto enp4s0
iface enp4s0 inet static
address 10.10.10.101/24 # .101 for node 1, .102 for node 2, etc.
mtu 9000 # Jumbo frames for better throughput
# Apply and verify
ifreload -a
ping -M do -s 8972 10.10.10.102 # Test jumbo frames between nodes
Enable jumbo frames (MTU 9000) on the Ceph network and the switch. This reduces CPU overhead and improves throughput. Make sure the switch supports jumbo frames and has them enabled on the relevant ports.
Step 3: Create the Cluster
On the first node (pve1):
pvecm create homelab-cluster
On the second and third nodes:
# On pve2
pvecm add 192.168.1.101
# On pve3
pvecm add 192.168.1.101
Verify:
pvecm status
# Should show 3 nodes, quorate
Step 4: Install Ceph
Proxmox has Ceph integration built in. From the web UI or command line on each node:
pveceph install --repository no-subscription
Step 5: Initialize Ceph
On the first node:
pveceph init --network 10.10.10.0/24
This tells Ceph to use the 10.10.10.0/24 network for cluster/replication traffic, keeping it off your management LAN.
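The generated configuration lands in /etc/pve/ceph.conf, which Proxmox syncs to every node in the cluster; it's worth a quick look to confirm the 10.10.10.0/24 cluster network was picked up:
# The Ceph config lives on the clustered /etc/pve filesystem, synced to all nodes
cat /etc/pve/ceph.conf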
Step 6: Create Monitors and Managers
Ceph needs monitors (MONs) to maintain cluster state and managers (MGRs) for metrics and management functions. Create one of each on every node:
# On each node (or from the web UI)
pveceph mon create
pveceph mgr create
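Once all three monitors exist, a quick status check should show them in quorum:
# Cluster overview: health, monitor quorum, manager status
ceph -s
# Just the monitor map and quorum membership
ceph mon stat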
Step 7: Create OSDs
An OSD (Object Storage Daemon) manages one physical disk. Create one OSD per data disk on each node:
# On each node, for each disk
pveceph osd create /dev/sda
pveceph osd create /dev/sdb
pveceph osd create /dev/sdc
If you have a fast NVMe, you can use it as a shared WAL/DB device (write-ahead log and metadata database), which dramatically improves performance for HDD-backed OSDs:
pveceph osd create /dev/sda --db_dev /dev/nvme0n1
pveceph osd create /dev/sdb --db_dev /dev/nvme0n1
pveceph osd create /dev/sdc --db_dev /dev/nvme0n1
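After creating OSDs on all three nodes, it's worth confirming they all came up and that Ceph sees the expected capacity:
# Every OSD should show "up" with a non-zero weight, grouped under its host
ceph osd tree
# Raw capacity and per-pool usage
ceph df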
Step 8: Create a Storage Pool
# Create a replicated pool with size 3 (data on 3 nodes) and min_size 2 (survives 1 failure)
pveceph pool create vm-storage --size 3 --min_size 2 --pg_autoscale_mode on
This pool is now available in Proxmox as a storage target for VM disks and container volumes.
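Depending on your Proxmox version and whether you created the pool from the web UI, the matching storage entry may be added automatically. If it isn't, here's a sketch of registering it by hand (the storage ID vm-storage is just a name):
# Register the Ceph pool as RBD storage for VM disks and container volumes
pvesm add rbd vm-storage --pool vm-storage --content images,rootdir
# Confirm it appears and reports capacity
pvesm status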
Storage Tiering
Not all data needs the same performance. Ceph supports multiple pools with different backing hardware, which lets you tier your storage.
Example Tiering Strategy
| Tier | Hardware | Use Case | Approximate Cost/TB |
|---|---|---|---|
| Fast | NVMe SSDs | Database VMs, active workloads | $80-120/TB |
| Standard | SATA SSDs | General VM storage, containers | $50-80/TB |
| Cold | HDDs with NVMe WAL/DB | Media, backups, archives | $20-30/TB |
# Create CRUSH rules that pin a pool to a device class
# (Ceph auto-detects SSD vs HDD; verify with: ceph osd crush tree --show-shadow)
ceph osd crush rule create-replicated ssd default host ssd
ceph osd crush rule create-replicated hdd default host hdd
# Pool using only SSDs
pveceph pool create fast-storage --size 3 --min_size 2 --crush_rule ssd
# Pool using only HDDs
pveceph pool create bulk-storage --size 3 --min_size 2 --crush_rule hdd
In practice, most home labs use a single tier. Mixing tiers adds complexity, and the benefit only matters if you have genuinely different performance requirements. If all your disks are SSDs, just make one pool.
When HCI Makes Sense (And When It Doesn't)
HCI Makes Sense When
- You need high availability: You run services that genuinely can't tolerate downtime, and you want automatic failover.
- You already have multiple servers: If you have three machines sitting around, clustering them is a great use of existing hardware.
- You want to learn enterprise concepts: HCI, distributed storage, clustering, and failover are valuable skills. Building it at home is the best way to learn.
- You're consolidating: Three old machines might be better as a cluster than three standalone boxes.
HCI Does NOT Make Sense When
- You have one server: A single Proxmox node with local ZFS storage will outperform a three-node Ceph cluster in raw I/O every time. Ceph adds latency (network round trips for replication). If you don't need multi-node failover, don't add the complexity.
- Your budget is tight: $1,500+ for hardware, plus ongoing power costs for three machines. A single well-specced node costs $500-800 and does 90% of what most home labs need.
- Your power bill matters: Three servers running 24/7 consume 300-600 watts. A single server uses 80-150 watts. That's an extra $150-400/year in electricity depending on your rates.
- You want simplicity: A single Proxmox node with ZFS is dramatically simpler to manage. No quorum issues, no Ceph degraded states, no cluster networking. It just works.
Honest Performance Comparison
| Metric | Single Node (ZFS) | 3-Node HCI (Ceph) |
|---|---|---|
| Sequential read | 500+ MB/s (local) | 200-400 MB/s (network) |
| Sequential write | 400+ MB/s (local) | 100-300 MB/s (network) |
| Random 4K IOPS | 50,000+ (NVMe) | 10,000-30,000 |
| Latency | <1 ms (local) | 1-5 ms (network) |
| Fault tolerance | Disk-level only (with ZFS mirror/RAIDZ) | Survives a full node failure |
| Capacity scaling | Limited by chassis | Add nodes |
Ceph is slower than local storage. Always. The network adds latency to every I/O operation. You build HCI for resilience and scalability, not raw performance.
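If you'd rather measure your own cluster than trust a table, Ceph ships a simple benchmark tool. A sketch against the vm-storage pool created earlier (it writes real objects, so run it before the pool holds production VMs):
# 30-second write benchmark; keep the objects so the read test has something to read
rados bench -p vm-storage 30 write --no-cleanup
# Sequential read benchmark, then remove the benchmark objects
rados bench -p vm-storage 30 seq
rados -p vm-storage cleanup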
Operating Your Cluster
Monitoring Health
# Overall cluster status
pveceph status
# Detailed OSD status
ceph osd tree
# Check for degraded objects
ceph health detail
# Watch cluster events in real time
ceph -w
Ceph will tell you when things are wrong. A healthy cluster shows HEALTH_OK. Common warnings:
- HEALTH_WARN: Something needs attention but data is safe
- HEALTH_ERR: Data redundancy is compromised, fix immediately
Handling Node Failures
When a node goes down:
- Ceph detects the missing OSDs (within ~30 seconds)
- The cluster marks those OSDs as down
- After a timeout (default 10 minutes), it marks them out and starts rebalancing data to maintain the replication factor (this timeout is tunable, as shown after these lists)
- Proxmox HA (if configured) restarts affected VMs on surviving nodes, as shown in the sketch just below
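That last step only happens for VMs you've actually placed under HA management. A minimal sketch for a single VM (VM ID 100 is just an example):
# Put VM 100 under HA management so it restarts on a surviving node after a failure
ha-manager add vm:100 --state started
# Check HA resources and the current quorum/master state
ha-manager status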
When the node comes back:
- OSDs rejoin the cluster
- Ceph rebalances data back (this is called "backfilling")
- Everything returns to normal
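The 10-minute rebalance delay is the monitor option mon_osd_down_out_interval. A sketch of checking it and, if your nodes tend to take longer to come back, extending it:
# Current delay (seconds) before a down OSD is marked out and rebalancing begins
ceph config get mon mon_osd_down_out_interval
# Extend it to 30 minutes for slow-to-reboot nodes
ceph config set mon mon_osd_down_out_interval 1800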
Maintenance: Taking a Node Offline
Before rebooting or maintaining a node, tell Ceph to not panic:
# Set noout flag (prevents Ceph from rebalancing during maintenance)
ceph osd set noout
# Do your maintenance (reboot, upgrade, etc.)
reboot
# When the node is back, unset the flag
ceph osd unset noout
This prevents unnecessary data movement during planned maintenance.
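During a rolling upgrade, it's also worth confirming the flag is actually set and that the cluster has fully recovered before moving on to the next node:
# The flags line should include "noout" while maintenance is in progress
ceph osd dump | grep flags
# Wait for HEALTH_OK (or at least no degraded PGs) before touching the next node
ceph -s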
Cost Analysis: Is It Worth It?
Let's be brutally honest with the numbers.
Three-Node HCI Cluster
| Cost | Amount |
|---|---|
| Hardware (used enterprise) | $1,500-2,500 |
| 10 GbE switch + NICs | $100-200 |
| Power (300W avg, $0.12/kWh) | ~$315/year |
| Total first year | $1,900-3,000 |
| Ongoing annual (power) | ~$315/year |
Single Node Alternative
| Cost | Amount |
|---|---|
| Hardware (used enterprise or new mini PC) | $400-800 |
| Power (100W avg, $0.12/kWh) | ~$105/year |
| Total first year | $500-900 |
| Ongoing annual (power) | ~$105/year |
The HCI cluster costs 3-4x more in hardware and 3x more in power. You get fault tolerance and the ability to do live migrations, rolling upgrades, and capacity scaling. Whether that's worth it depends entirely on whether you need those things or just want to learn about them.
If the answer is "I want to learn" — that's a perfectly valid reason. Building and operating a Ceph cluster teaches you more about distributed systems than any course or certification. Just go in knowing it's an investment in education, not a practical necessity for running Pi-hole and Jellyfin.
Final Thoughts
Hyperconverged infrastructure in a home lab is a step into real infrastructure engineering. Proxmox and Ceph make it accessible — no VMware licenses, no proprietary hardware, no vendor lock-in. But accessible doesn't mean simple. You're running a distributed storage system, and distributed systems have failure modes that single-node setups don't.
Start with the basics: three nodes, a dedicated Ceph network (10 GbE if you can), all-SSD storage. Get the cluster healthy, create some VMs, practice failover. Then add complexity — tiered storage, HA groups, Proxmox Backup Server integration.
And if you read all of this and think "that's way more than I need" — that's a valid conclusion. A single Proxmox node with ZFS and good backups handles most home lab workloads beautifully. HCI is there when you're ready for it.