Ollama
Overview
Ollama is the engine powering the homelab’s local AI capabilities. It allows running Large Language Models (LLMs) like Llama 3 or Mistral directly on the server, ensuring privacy and local-first execution.
In this homelab, Ollama is integrated into the system for use by agents (like PicoClaw) and can be accessed via the dashboard using any compatible web client (like Page Assist).
| Port | Description | Access |
|---|---|---|
| 11434 | API Endpoint | Local Network & Proxy |
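As a rough illustration of what a client or agent does against this port, the sketch below builds a non-streaming request for Ollama's `/api/generate` endpoint using only the Python standard library. `BASE_URL` and the helper names are assumptions made for this example, not part of the homelab configuration.

```python
import json
import urllib.request

# Assumption: adjust to the server's address; 11434 is the port from the table above.
BASE_URL = "http://localhost:11434"


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text from the JSON reply."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling `generate(BASE_URL, "llama3.2:1b", "Say hello in one sentence.")` should return the model's reply as a string, provided that model has already been pulled on the server.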
External Access
The service is proxied via Nginx and accessible at:
- Local: `http://<homelab-ip>:11434` (requires Local Network access)
- Remote: `https://ollama-home.javiersc.com` (via Cloudflare Tunnel)
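To confirm that both routes reach the same Ollama instance, one option is to ask each for its model list through `/api/tags`. The following is a minimal sketch: the `ENDPOINTS` dictionary reuses the addresses listed above (with the IP left as a placeholder), and the injectable `fetch` parameter is an illustrative design choice. If the proxy enforces authentication, the remote check would need extra headers that are not shown here.

```python
import json
import urllib.request
from typing import Callable

# Addresses copied from the list above; substitute the real homelab IP.
ENDPOINTS = {
    "local": "http://<homelab-ip>:11434",
    "remote": "https://ollama-home.javiersc.com",
}


def http_get(url: str, timeout: float = 5.0) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def list_models(base_url: str, fetch: Callable[[str], bytes] = http_get) -> list[str]:
    """Return the model names reported by /api/tags for one endpoint."""
    body = fetch(base_url.rstrip("/") + "/api/tags")
    return [model["name"] for model in json.loads(body).get("models", [])]


def check_endpoints(
    endpoints: dict[str, str], fetch: Callable[[str], bytes] = http_get
) -> dict[str, str]:
    """Map each endpoint label to a short status string instead of raising."""
    results = {}
    for label, base_url in endpoints.items():
        try:
            names = list_models(base_url, fetch)
            results[label] = f"ok ({len(names)} models)"
        except (OSError, ValueError) as error:
            # URLError subclasses OSError; a malformed JSON body raises ValueError.
            results[label] = f"unreachable: {error}"
    return results
```

Running `check_endpoints(ENDPOINTS)` from a machine on the local network gives one status line per route, which helps tell a tunnel problem apart from a service problem.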
Secrets
Ollama does not require external secrets for its basic operation. Authentication, if needed by clients, is handled at the proxy or application level.
Backup
- State: Configured via `homelab.backupPaths = [ "/var/lib/ollama" ]`.
- Exclusions: The `models/` directory is excluded from backups to save space, as models can be easily redownloaded using `ollama pull`.
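Because `models/` is left out of the backup, a restore has to fetch the models again. The sketch below shows one way to script that step around `ollama pull`. The `MODELS` list is hypothetical and should mirror whatever the host actually used, and the dry-run default is a precaution chosen for this example.

```python
import subprocess

# Hypothetical list: the models this host should have after a restore.
MODELS = ["llama3.2:1b", "qwen2.5:0.5b"]


def pull_commands(models: list[str]) -> list[list[str]]:
    """Build one `ollama pull <model>` command per model, skipping duplicates."""
    seen = set()
    commands = []
    for model in models:
        if model not in seen:
            seen.add(model)
            commands.append(["ollama", "pull", model])
    return commands


def restore_models(models: list[str], dry_run: bool = True) -> list[list[str]]:
    """Print each command; execute it only when dry_run is False."""
    commands = pull_commands(models)
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first failed download.
            subprocess.run(command, check=True)
    return commands
```

`restore_models(MODELS)` only prints the commands; passing `dry_run=False` runs them, which assumes the `ollama` CLI is on the `PATH` and can reach the running service.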
Troubleshooting
Service is not responding locally
Ensure that the service is configured to listen on all interfaces:
`services.ollama.host = "0.0.0.0";`
Dashboard link doesn’t work
If the dashboard link fails, check the Nginx Proxy status and ensure the port 11434 is registered in `homelab.proxies.ollama`. The local firewall rules are derived in `modules/nixos/system/network.nix`.
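A quick way to narrow down either symptom is to test whether the port accepts TCP connections at all, independent of Nginx or the dashboard. This is a minimal sketch using the standard `socket` module; the function name and default timeout are choices made for the example.

```python
import socket


def port_is_open(host: str, port: int = 11434, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

If `port_is_open("127.0.0.1")` succeeds on the server while the same check against the server's LAN address fails from another machine, the service is probably still bound to loopback only, or the firewall is blocking the port.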
Recommended Models (N150 CPU)
For hardware without a dedicated GPU (like the Intel N150), use small quantized models to maintain low latency, especially for voice interaction:
| Model | Size | Best For |
|---|---|---|
| gemma2:2b | 2B | High quality (Classic) |
| gemma4:e2b | 2B | New! High efficiency/Voice |
| gemma4:e4b | 4B | High reasoning (Slower on CPU) |
| llama3.2:1b | 1B | General purpose (Balanced) |
| qwen2.5:0.5b | 500M | Voice (Lowest latency) |
| qwen2.5:1.5b | 1.5B | Complex instructions |
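To compare these models on the actual hardware rather than by size alone, the timing fields in a non-streaming `/api/generate` response can be turned into tokens per second. The sketch below assumes the response has already been parsed into a dictionary; `eval_count` and `eval_duration` (in nanoseconds) are fields Ollama reports, while the function names are illustrative.

```python
def tokens_per_second(response: dict) -> float:
    """Compute generation speed from a non-streaming /api/generate response.

    eval_count is the number of generated tokens; eval_duration is in nanoseconds.
    """
    duration_ns = response.get("eval_duration", 0)
    if duration_ns <= 0:
        return 0.0
    return response.get("eval_count", 0) * 1e9 / duration_ns


def rank_models(results: dict[str, dict]) -> list[tuple[str, float]]:
    """Sort models from fastest to slowest by tokens per second."""
    speeds = {name: tokens_per_second(resp) for name, resp in results.items()}
    return sorted(speeds.items(), key=lambda item: item[1], reverse=True)
```

Feeding `rank_models` one response per model, all generated from the same prompt, gives a simple ordering to check against the "Best For" column on a given machine.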