I spend most of my working day talking to LLMs. Architecture reviews, writing infrastructure runbooks, drafting proposals, working through tricky Terraform. For the professional and personal stuff where I’d rather not have my prompts sent to a third-party API, I run models locally.
The stack is simple: Ollama to manage and serve models, Open WebUI as the browser-based interface, both running in Docker on my home lab.
Why Local?
Three reasons:
- Privacy: Conversations about client infrastructure, internal architecture, or anything commercially sensitive don’t leave my machine
- Cost: No API bills. I run local models as much as I can, and use cloud APIs only when the task genuinely requires it
- Offline availability: Flights, trains, remote sites — local models work without a connection
The tradeoff is capability. Local models on reasonable consumer hardware lag behind the frontier models. But for a large fraction of daily tasks, they’re good enough.
Hardware
Ollama runs on my Proxmox host (the gaming laptop described in my homelab post). CPU-only inference is usable for smaller models — Llama 3.2 3B runs at a comfortable speed, and Mistral 7B is acceptable for non-latency-sensitive tasks.
If you have a machine with a CUDA GPU, inference speed improves dramatically. On a laptop without GPU passthrough to the VM, I’m CPU-bound.
Setup
Ollama
```bash
# On the host or in a Docker container
curl -fsSL https://ollama.com/install.sh | sh

# Pull models
ollama pull llama3.2:3b
ollama pull mistral:7b
ollama pull nomic-embed-text  # for embeddings / RAG
```
Ollama runs as a systemd service and listens on port 11434. It handles model storage, versioning, and GPU/CPU dispatch automatically.
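Ollama’s HTTP API streams newline-delimited JSON, one object per token chunk, with a final `"done": true` object. A small sketch of reassembling that stream into text — the parsing function and sample lines below are illustrative, not from this post, but the NDJSON shape matches what `POST /api/generate` on port 11434 returns:

```python
import json

def parse_ollama_stream(lines):
    """Concatenate the 'response' fields from Ollama's streaming NDJSON
    output (/api/generate emits one JSON object per line) until 'done'."""
    parts = []
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Illustrative stream, shaped like Ollama's NDJSON output:
sample = [
    '{"response": "Hello", "done": false}',
    '{"response": ", world", "done": false}',
    '{"response": "!", "done": true}',
]
print(parse_ollama_stream(sample))  # Hello, world!
```

In practice the lines would come from an HTTP request to `http://localhost:11434/api/generate` with a body like `{"model": "llama3.2:3b", "prompt": "..."}`.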
Open WebUI via Docker
```yaml
# docker-compose.yml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - open-webui:/app/backend/data
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped

volumes:
  open-webui:
```
Open WebUI gives you a ChatGPT-like interface with conversation history, model switching, system prompt configuration, and RAG (retrieval-augmented generation) if you want to chat with documents.
Access it at http://<your-lab-ip>:3000 from any device on your local network (or via Tailscale from anywhere).
Models I Use
| Model | Use case | Speed (CPU) |
|---|---|---|
| llama3.2:3b | Quick Q&A, drafting | Fast |
| mistral:7b | Code review, longer reasoning | Moderate |
| deepseek-coder:6.7b | Infrastructure code, Terraform | Moderate |
| nomic-embed-text | Embeddings for document search | Fast |
For anything requiring frontier-level reasoning — complex architectural tradeoffs, novel problem-solving — I still use Claude or GPT. Local models are a complement, not a replacement.
What Works Well
- Drafting runbooks and documentation from bullet points
- Explaining infrastructure concepts for client-facing materials
- Code review of Terraform/Ansible before I commit
- Generating boilerplate that I then edit
- Querying uploaded PDFs (architecture docs, vendor whitepapers)
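Document search in these setups boils down to ranking chunks by cosine similarity between embedding vectors. A minimal sketch of that ranking step — `cosine`, `top_k`, the document IDs, and the toy 3-dimensional vectors are all illustrative; in practice the vectors come from nomic-embed-text via Ollama, and real embeddings have hundreds of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, corpus, k=2):
    """Rank documents by cosine similarity to the query embedding.
    corpus maps doc_id -> embedding vector."""
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, corpus[d]), reverse=True)
    return ranked[:k]

# Toy "embeddings" standing in for real model output:
corpus = {
    "runbook-dns": [0.9, 0.1, 0.0],
    "runbook-backup": [0.0, 1.0, 0.1],
    "whitepaper": [0.1, 0.0, 1.0],
}
print(top_k([1.0, 0.0, 0.0], corpus, k=1))  # ['runbook-dns']
```

Open WebUI does this for you when you upload a PDF; the sketch just shows what the retrieval step is computing.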
What Doesn’t
- Long-context tasks with 100K+ tokens — 7B models struggle
- Nuanced reasoning about unfamiliar domains
- Anything where you genuinely need the best available model
Accessing It Remotely
Via Tailscale, Open WebUI is reachable from my phone and laptop from anywhere. I have it bookmarked as a PWA on mobile. It’s not as fast as cloud APIs over mobile data, but for async drafting tasks it works.
Next Steps
I’m experimenting with a local RAG pipeline using nomic-embed-text + a ChromaDB container to query my own notes and runbook library. More on that when it’s working reliably.
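The first step of any such pipeline is splitting notes into passages before embedding them. A minimal character-window chunker with overlap — the function and the window sizes are an illustrative sketch, not the pipeline described above:

```python
def chunk(text, size=400, overlap=80):
    """Split text into overlapping character windows for embedding.
    Overlap keeps sentences that straddle a boundary retrievable.
    Sizes are illustrative; tune them per embedding model."""
    step = size - overlap
    # max(..., 1) ensures short texts still yield one chunk
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

print(len(chunk("x" * 1000)))  # 3 windows: [0:400], [320:720], [640:1000]
```

Each chunk would then be embedded with nomic-embed-text and stored alongside its source ID for retrieval.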
If you have a spare machine with 16GB+ RAM, running local AI is worth the two-hour setup. The privacy and offline availability alone justify it.