Solo Dev DevOps: My Proxmox + Coolify Homelab Setup

The exact hardware, software, and operational rituals that let one engineer run four production SaaS without an SRE team.

The studio's production stack runs on a single Proxmox host that lives six feet from where I am writing this post. It hosts four SaaS in production, multiple development environments, an analytics platform, a private DNS resolver, a Gitea mirror, and a couple of LLM experiments. The whole thing pulls maybe 90 watts at idle. The electricity bill last month was 22 EUR. Total infrastructure cost (amortized hardware + electricity) is roughly 60 EUR a month. The same workload on a managed cloud setup would cost roughly 850 EUR per month. A solo-developer homelab built on Proxmox and Coolify is not for everyone, but for a one-person studio it is the difference between "I can afford to run four products" and "I can afford to run one".

What runs on it

To make the rest of the post concrete, the host runs:

  • Carriva production VM (VM 106), the retirement-advisory SaaS at carriva.fr.
  • Three CTs for the lesson-planning monorepo (one each for PMC, DML, PL, with Creaclases pre-prod also living here).
  • A Postgres CT cluster.
  • A MinIO instance for asset storage.
  • Coolify managing most app deploys.
  • AdGuard for internal DNS.
  • A Cloudflare Tunnel container exposing public services.
  • Self-hosted Gitea (mirrored to a copy on the QNAP NAS).
  • Umami self-hosted (we moved here from PostHog Hobby, the old container is decommissioned).
  • A few LLM experiment VMs that get started on demand.

Roughly 14 active services at any given time. The Proxmox host is the substrate. Coolify is the application layer. Cloudflare Tunnel is the door to the internet.

The hardware

We deliberately picked unsexy hardware because the goal is reliability, not benchmarks. The current build:

  • Motherboard: ASUS TUF B650M-PLUS.
  • CPU: 12-core AMD Ryzen on AM5.
  • RAM: 64 GB ECC.
  • Storage: 2 TB NVMe primary, 4 TB SATA SSD for bulk, 8 TB external HDD on the NAS for cold backup.
  • Network: 2.5 GbE on the motherboard, paired with a 2.5 GbE switch and the office's gigabit fiber.
  • Power: a regular consumer PSU, no UPS yet (this is on the to-do list and frankly should have been done already).

The build cost was around 1,800 EUR. Amortized over four years, that is 37.50 EUR per month. Add electricity at roughly 22 EUR per month and we are at about 60 EUR per month all-in for the entire studio infrastructure.
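The arithmetic is simple enough to keep in a few lines, which makes it easy to rerun when the hardware or the electricity price changes. A minimal sketch using the figures from this post; the function name is just for illustration:

```python
def monthly_cost(hardware_eur: float, amortization_years: int,
                 electricity_eur_per_month: float) -> float:
    """Amortized hardware cost plus electricity, in EUR per month."""
    hardware_per_month = hardware_eur / (amortization_years * 12)
    return hardware_per_month + electricity_eur_per_month


homelab = monthly_cost(1800, 4, 22)  # 1800 / 48 = 37.50, plus 22
managed_cloud = 850                  # the rough estimate from the introduction

print(f"homelab: {homelab:.2f} EUR/month")                    # homelab: 59.50 EUR/month
print(f"difference: {managed_cloud - homelab:.2f} EUR/month")  # difference: 790.50 EUR/month
```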

Two practical notes:

  • We use ECC RAM because Postgres lives here. ECC catches the kind of memory corruption that silently corrupts databases. The cost premium is small. The peace of mind is large.
  • We do fan control via fancontrol because the TUF board's stock fan curves are loud. We also have a specific OOM trap configured for the GPU passthrough VM (VM 200): in one case an out-of-memory kill left the dGPU orphaned on vfio-pci, and we needed recovery to be automatic.

The Proxmox layer

Proxmox is a hypervisor that runs both VMs and LXC containers. We use both, deliberately:

  • LXC containers for things that are mostly stateless or that benefit from running closer to the host kernel: the lesson-planning apps, the Postgres cluster, MinIO, Coolify itself, Umami, AdGuard.
  • VMs for things that need full isolation, a different kernel, or hardware passthrough: Carriva (which we treat as more isolated for security reasons), and the LLM experiment VMs (which get GPU passthrough when needed).

The split is mostly about isolation tradeoffs. LXC is faster and cheaper. VMs are more isolated. We pick per service.

Proxmox itself we treat as sacred. We never install random tools on the host. Every service runs in a CT or VM. The host's job is to be a reliable hypervisor and nothing else. We have written about why we self-host Postgres and the same discipline applies to the entire homelab.

Treat the Proxmox host like a kernel. Touch it as little as possible.

Coolify: the deploy layer

Coolify is the app deployment layer for most of the studio. It does roughly what a managed PaaS would do, except it runs on our hardware. Push to a Git branch, Coolify builds the Docker image, runs the container, manages logs and rollbacks.

We use Coolify for:

  • The studio website (draftedby.com).
  • The lesson-planning monorepo deploys.
  • Most ancillary services (Umami, Gitea, internal tools).
  • jdchess.com, a side project of mine that lives here.

We do NOT use Coolify for Carriva. Carriva deploys over SSH with a deploy.sh script, deliberately. The reason is that Carriva is the most production-critical product and we wanted full control over the deploy pipeline. Coolify is great, but a pipeline we fully control is the escape hatch that matters when something breaks at 11pm.

The recurring failure mode with Coolify is DNS. Our Docker daemon points only to AdGuard for DNS, with no fallback. Most of the time this is fine. Occasionally a DNS hiccup (AdGuard restart, network blip) breaks builds with errors like EAI_AGAIN, ENETUNREACH, or getaddrinfo ENOTFOUND against npm, Google Fonts, or Resend. The fix is always the same: make sure AdGuard is up, retry the build. We have a written skill for this exact scenario because it happens enough to deserve a runbook.
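The "make sure DNS is up, then retry" step can be turned into a small preflight check that runs before a build is retried. This is a sketch, not our actual runbook: the hostnames are my guess at what a typical build resolves for npm, Google Fonts, and Resend, and the attempt count and delay are arbitrary.

```python
import socket
import time

# Assumed hostnames for the services named above; adjust to what your builds fetch.
BUILD_HOSTS = ["registry.npmjs.org", "fonts.googleapis.com", "api.resend.com"]


def resolves(host: str) -> bool:
    """True if the system resolver (AdGuard, in our case) can resolve the host."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False


def wait_for_dns(hosts, attempts=5, delay_s=3.0, check=resolves, sleep=time.sleep):
    """Retry resolution; return the hosts that still fail after all attempts."""
    pending = list(hosts)
    for attempt in range(attempts):
        pending = [h for h in pending if not check(h)]
        if not pending:
            return []
        if attempt < attempts - 1:
            sleep(delay_s)  # give the resolver time to come back
    return pending
```

Calling `wait_for_dns(BUILD_HOSTS)` returns an empty list once everything resolves. A non-empty result means the resolver is still unhealthy and retrying the build would only fail again.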

Cloudflare Tunnel: the public-internet door

Every public service is exposed via Cloudflare Tunnel. No port forwarding on the home router. No public IP exposed to the internet. The tunnel runs as a container, authenticates to Cloudflare with a credential, and proxies traffic to the appropriate internal service based on hostname.
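To make "proxies based on hostname" concrete, here is a toy model of the routing decision. It is not how the tunnel is implemented or configured. It only mirrors the idea of an ordered rule list with a catch-all at the end, and the internal addresses are hypothetical.

```python
# Public hostnames are from this post; the internal targets are made up.
INGRESS_RULES = [
    ("carriva.fr", "http://192.168.1.106:3000"),
    ("draftedby.com", "http://192.168.1.20:8080"),
    ("jdchess.com", "http://192.168.1.20:8081"),
]
CATCH_ALL = "http_status:404"  # anything unmatched gets a 404


def route(hostname: str) -> str:
    """Return the internal service for a public hostname; first match wins."""
    host = hostname.strip().lower().rstrip(".")
    for rule_host, service in INGRESS_RULES:
        if host == rule_host:
            return service
    return CATCH_ALL
```

The point of the model: nothing is reachable unless a rule names it, which is why no router ports need to be open.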

Why this approach:

  • No public IP exposure. The home router's WAN-side ports are all closed.
  • TLS handled by Cloudflare. We do not deal with cert renewal at the host level for public services.
  • One source of truth for routing in Cloudflare's dashboard.
  • We can move the host without changing public DNS.

The tradeoff is dependency on Cloudflare. If Cloudflare has a bad day, our services go down. We accept that risk because Cloudflare's track record is better than ours would be.

Backups: the part that earns the savings

We covered Postgres backups in detail in our self-hosting Postgres in 2026 piece. The studio-wide backup story extends that:

  • Postgres dumps run nightly via systemd timer in each database container, GPG-encrypted, written to local snapshots.
  • MinIO content is mirrored to the NAS daily.
  • Application environment variables are GPG-encrypted and committed (encrypted) to a private Gitea repository.
  • Backups land in two separate locations: /share/Config/draftedby_db/ for the cron output, and /share/Config/backup/ for human-organized restore folders. We learned to keep the cron output and the human-curated layout separate.
  • Critical data has an off-site copy on a NAS at a different location.
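For the nightly dump step, the shape is roughly "pg_dump piped into gpg, written to a dated file". A minimal sketch of that idea, assuming pg_dump in custom format and public-key encryption. The database name, recipient, and output directory are placeholders passed in by the caller, not our real values.

```python
import subprocess
from datetime import date
from pathlib import Path


def dump_commands(db_name: str, recipient: str, out_dir: str, day: date):
    """Build the pg_dump and gpg command lines plus the dated target path."""
    target = Path(out_dir) / f"{db_name}-{day.isoformat()}.dump.gpg"
    pg_dump = ["pg_dump", "--format=custom", db_name]  # dump goes to stdout
    gpg = ["gpg", "--batch", "--yes", "--recipient", recipient,
           "--output", str(target), "--encrypt"]        # plaintext comes from stdin
    return pg_dump, gpg, target


def run_backup(db_name: str, recipient: str, out_dir: str) -> Path:
    """Stream pg_dump into gpg so the unencrypted dump never touches disk."""
    pg_dump, gpg, target = dump_commands(db_name, recipient, out_dir, date.today())
    dump = subprocess.Popen(pg_dump, stdout=subprocess.PIPE)
    enc = subprocess.run(gpg, stdin=dump.stdout)
    dump.stdout.close()
    if dump.wait() != 0 or enc.returncode != 0:
        target.unlink(missing_ok=True)  # do not leave a truncated backup behind
        raise RuntimeError(f"backup of {db_name} failed")
    return target
```

A systemd timer would call `run_backup` once a night. A non-zero exit from either process raises, so the unit shows up as failed instead of silently producing a bad file.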

The recovery test is the gating criterion. We run a recovery exercise quarterly. Most of the time the restore works in 35 minutes. The one time it did not (a year ago, after a Postgres major version bump, when our restore script assumed an older binary path), we found the issue and fixed the script. That one exercise paid for itself a thousand times over.

The operational rituals

The boring stuff that keeps it boring:

  1. Sunday morning sweep: I look at uptime, logs, certificate expiries, disk space. 20 minutes. If anything is yellow, fix it that day.
  2. Weekly snapshot review: confirm that the previous week's snapshots are intact and rotate the oldest off.
  3. Monthly major-version review: are any of the services we run getting a major version bump? If yes, plan it for a quiet weekend.
  4. Quarterly recovery drill: pick one critical service, restore from backup to a sandbox, validate.
  5. Annual hardware review: is the box healthy? SMART status on the disks, RAM tests, fan profiles. The day a disk starts looking shaky, replace it before it dies.
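Parts of the Sunday sweep are easy to script so the 20 minutes go to reading results, not collecting them. A sketch of two of the checks, disk space and certificate expiry, using only the Python standard library. The green/yellow/red thresholds are assumptions, not the values we use.

```python
import shutil
import socket
import ssl
import time


def disk_status(used_fraction: float, warn: float = 0.80, crit: float = 0.90) -> str:
    """Map a used-space fraction to green / yellow / red (thresholds assumed)."""
    if used_fraction >= crit:
        return "red"
    if used_fraction >= warn:
        return "yellow"
    return "green"


def disk_used_fraction(path: str = "/") -> float:
    """Fraction of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def cert_days_left(host: str, port: int = 443, timeout: float = 5.0) -> float:
    """Days until the TLS certificate served at host:port expires."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            not_after = tls.getpeercert()["notAfter"]
    return (ssl.cert_time_to_seconds(not_after) - time.time()) / 86400
```

For example, `disk_status(disk_used_fraction("/"))` gives the colour for the root filesystem, and `cert_days_left("draftedby.com")` reports how long the certificate served for that hostname has left.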

None of this is glamorous. None of it is hard. Doing it consistently is the only thing that matters.

When the homelab breaks

Three failure modes I have lived through:

  • Power outage (twice). The first time the host took 15 minutes to come back up cleanly. The second time, after some tuning, it came back in 4 minutes. A UPS would help. It is on the list.
  • Network outage at the ISP (once). Cloudflare Tunnel went down with it. The fix was the ISP's fix; there was nothing for us to do. This is the fragility of self-hosting at home.
  • Disk thermal throttling during a hot week. NVMe slowed down enough to affect Postgres latency. We added a heatsink. Fixed.

The lesson across these: most failure modes are slow and recoverable. The ones that hurt are the ones you did not anticipate. Quarterly drills find them.
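The thermal throttling incident is the kind of slow failure a small check can surface earlier. One possible approach on Linux, assuming the kernel exposes NVMe temperatures through the hwmon sysfs interface (values in millidegrees Celsius). The 70 °C limit is an arbitrary placeholder, not a vendor figure.

```python
from pathlib import Path


def nvme_temps_c(hwmon_root: str = "/sys/class/hwmon") -> dict:
    """Read every temperature sensor of hwmon devices named 'nvme', in Celsius."""
    temps = {}
    for dev in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = dev / "name"
        if not name_file.exists() or name_file.read_text().strip() != "nvme":
            continue  # skip CPU, motherboard, and other sensors
        for sensor in sorted(dev.glob("temp*_input")):
            temps[f"{dev.name}/{sensor.name}"] = int(sensor.read_text()) / 1000
    return temps


def hot_sensors(temps: dict, limit_c: float = 70.0) -> list:
    """Names of sensors at or above the (assumed) warning limit."""
    return [name for name, value in temps.items() if value >= limit_c]
```

Running `hot_sensors(nvme_temps_c())` from the weekly sweep would have flagged the hot week before Postgres latency did.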

When this setup is wrong

Three reasons a solo-developer homelab on Proxmox and Coolify is the wrong fit:

  1. You travel constantly. A homelab needs a person in the building when something goes seriously wrong. If you are on a plane every week, this is risky.
  2. Your team grows. Multiple people accessing the homelab adds operational complexity that the cost savings probably do not justify.
  3. You handle compliance-regulated data with strict location requirements. Self-hosting from a home address can fail audit requirements. Move to a colo or a managed provider.

For a single founder running four products from one location, this setup is a great fit and we do not see a reason to change. We touched on related infra philosophy in our piece on shipping four SaaS in parallel, an approach that depends entirely on this homelab's reliability.

What is next

The next upgrades on my list:

  • A real UPS, finally.
  • A second Proxmox node so we can do live migrations and not panic during major upgrades.
  • A move to immutable container images for deploys instead of build-on-host.

None of these are urgent. The homelab works. The savings are real. The discipline is the price of admission, and frankly the discipline is what I enjoy. If you are deciding whether to run a homelab in 2026, the question is not "is it cheaper?" (yes) but "do you want to learn this craft?" If yes, set aside a weekend to install Proxmox and start with one VM. The rest is iteration. The first month is the hardest. By month six it is automatic.
