Redesigning a 3-Host Homelab: From Promtail Removal to devstack Splitting and Config Consolidation
A record of migrating from Promtail to Vector, splitting devstack by host, and consolidating Go config into defaults.go across a 3-host homelab (storage / desktop / compute), aligning dev and production topologies.
Conclusion
Across the three homelab hosts (storage / desktop / compute), log collection was migrated to Vector, devstack compose files were split, and Go config was consolidated—all as a single coordinated effort. The final host layout is as follows:
desktop.home.arpa      compute.home.arpa      storage.home.arpa
─────────────────      ─────────────────      ─────────────────
agent-gateway :8080    vLLM          :8000    Vector
NATS          :4222    llama.cpp     :8001    Loki       :3100
Reranker      :8081    PostgreSQL    :5432    Prometheus :9090
LM Studio     :1234    Dagster       :3300    MinIO      :9000
                       MLflow        :5050
desktop is the routing layer—“receive requests and dispatch to the right backend.” compute is the computation layer—“inference, data processing, experiment tracking.” storage is the persistence layer—“log aggregation, metrics collection, object storage.” Each host has a clear responsibility, and the devstack compose files were split to match this three-layer structure.
Background
The homelab runs a three-host setup centered on agent-gateway. storage.home.arpa is the 24/7 observability + object store layer (Prometheus, Loki, MinIO, various exporters). desktop.home.arpa runs macOS and serves as the gateway + messaging layer with agent-gateway and NATS JetStream. compute.home.arpa handles GPU inference (vLLM, llama.cpp) and the data platform (PostgreSQL, Dagster, MLflow), started on demand.
Behind this setup is a Nikkei 225 real-time prediction use case. The prediction pipeline requires stable parallel event execution and memory-safe transform processing—Vector’s transform layer fits those requirements. Running locally avoids the latency, data leakage, and resource instability concerns of cloud deployment, maintaining full control over resources and machines. Since all components are containerized, the same setup can be reproduced given the hardware.
Three problems had accumulated:
- Log collection depended on Promtail—Promtail is a Loki-specific tail agent that cannot handle multiple outputs like NATS publishing or Prometheus metrics generation
- devstack was a single compose—one podman-compose.yml at the root bundled all services, diverging from production’s distributed host layout
- Config was scattered across .envrc—hostnames and ports never changed, yet they were managed via environment variables, which bred bugs whenever the defaults in code and the .envrc values drifted apart
Phase 1: Promtail → Vector Migration
Motivation
Vector is a general-purpose data pipeline that can simultaneously route input from a single source to multiple outputs: forwarding to Loki, publishing to NATS, and generating Prometheus metrics. With plans to eventually ingest infrastructure logs into the Dagster data pipeline via NATS, migrating from Promtail made sense.
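The fan-out idea, one source feeding several sinks, can be sketched in Go with channels. This only illustrates the topology: Vector itself is configured declaratively, and the three sink names below (loki, nats, metrics) are hypothetical stand-ins rather than real sink components.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical log record flowing through the pipeline.
type Event struct {
	Host    string
	Message string
}

// fanOut copies every event from src to each destination channel in turn and
// closes all destinations once src is drained. A slow destination therefore
// stalls delivery to every sink (back-pressure).
func fanOut(src <-chan Event, dsts ...chan<- Event) {
	for ev := range src {
		for _, d := range dsts {
			d <- ev
		}
	}
	for _, d := range dsts {
		close(d)
	}
}

// collect drains one sink channel into a slice; a stand-in for a real sink.
func collect(ch <-chan Event, out *[]Event, wg *sync.WaitGroup) {
	defer wg.Done()
	for ev := range ch {
		*out = append(*out, ev)
	}
}

// runPipeline pushes msgs through one source into three hypothetical sinks
// and returns how many events each sink received.
func runPipeline(msgs []string) map[string]int {
	names := []string{"loki", "nats", "metrics"}
	src := make(chan Event)
	chans := make([]chan Event, len(names))
	dsts := make([]chan<- Event, len(names))
	results := make([][]Event, len(names))

	var wg sync.WaitGroup
	for i := range names {
		chans[i] = make(chan Event, 1)
		dsts[i] = chans[i]
		wg.Add(1)
		go collect(chans[i], &results[i], &wg)
	}
	go fanOut(src, dsts...)

	for _, m := range msgs {
		src <- Event{Host: "storage.home.arpa", Message: m}
	}
	close(src)
	wg.Wait()

	counts := make(map[string]int, len(names))
	for i, n := range names {
		counts[n] = len(results[i])
	}
	return counts
}

func main() {
	fmt.Println(runPipeline([]string{"boot", "sshd started", "apt upgrade"}))
}
```

Because fanOut delivers to each destination sequentially, a real pipeline needs a per-sink buffering or dropping policy so that one stuck output does not stop the others.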
Production Vector Setup
Vector was already deployed on storage.home.arpa as a rootful systemd quadlet. Memory usage was a lightweight 19.3 MB (peak 25.7 MB), and it was already added as a Prometheus scrape target.
The running config has five sources:
- journald—collecting only the current boot session (current_boot_only)
- file_security—alternatives.log, apport.log
- file_apt—apt/dpkg logs
- syslog_udp—UDP:1514, receiving syslog from MikroTik routers and others
- internal_metrics—Vector’s own metrics
The transform layer uses a route transform to split journald into kernel and systemd streams, applying job/host labels to each. Kernel logs match on ._TRANSPORT == "kernel", and systemd logs additionally extract _SYSTEMD_UNIT and _UID. Syslog is parsed to extract appname/severity/facility as labels.
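The branching in that route transform can be illustrated in Go. In Vector the conditions are VRL expressions in the config, not code; the map-based Entry type, the job label values, and the sample syslog line below are assumptions. The journald field names (_TRANSPORT, _SYSTEMD_UNIT, _UID) are the standard journald ones. The syslog half decodes the leading <PRI> value, which encodes facility * 8 + severity.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Entry is a hypothetical flattened journald record (field name -> value).
type Entry map[string]string

// routeJournald mimics a two-branch route: kernel vs systemd. It returns the
// branch name and the labels to attach; job label values are illustrative.
func routeJournald(e Entry, host string) (string, map[string]string) {
	labels := map[string]string{"host": host}
	if e["_TRANSPORT"] == "kernel" {
		labels["job"] = "kernel"
		return "kernel", labels
	}
	labels["job"] = "systemd"
	// The systemd branch additionally lifts unit and UID into labels.
	if u, ok := e["_SYSTEMD_UNIT"]; ok {
		labels["unit"] = u
	}
	if uid, ok := e["_UID"]; ok {
		labels["uid"] = uid
	}
	return "systemd", labels
}

// parsePRI decodes the "<PRI>" prefix of a syslog line into facility and
// severity codes (PRI = facility*8 + severity, valid range 0..191).
func parsePRI(line string) (facility, severity int, err error) {
	if !strings.HasPrefix(line, "<") {
		return 0, 0, fmt.Errorf("missing PRI in %q", line)
	}
	end := strings.IndexByte(line, '>')
	if end < 2 {
		return 0, 0, fmt.Errorf("malformed PRI in %q", line)
	}
	pri, convErr := strconv.Atoi(line[1:end])
	if convErr != nil || pri < 0 || pri > 191 {
		return 0, 0, fmt.Errorf("invalid PRI in %q", line)
	}
	return pri / 8, pri % 8, nil
}

func main() {
	branch, labels := routeJournald(Entry{"_TRANSPORT": "journal", "_SYSTEMD_UNIT": "sshd.service", "_UID": "0"}, "storage.home.arpa")
	fmt.Println(branch, labels)
	f, s, _ := parsePRI("<134>Jan  1 00:00:00 router system,info login")
	fmt.Println(f, s) // 134 = 16*8 + 6 -> facility 16 (local0), severity 6 (info)
}
```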
Sinks run two paths: all transform outputs go to Loki with JSON encoding, and internal_metrics are exposed via a Prometheus exporter (:9598). The only externally exposed ports are 9598 for Prometheus scraping and 4222 for NATS client connections.
Removing devstack Vector
There was a second Vector in devstack. It subscribed to telemetry.> subjects from NATS JetStream, parsed JSON, and routed by domain—storage handled log collection while devstack consumed telemetry, forming two separate pipelines.
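Subscribing to telemetry.> relies on NATS subject wildcards: * matches exactly one dot-separated token, and > matches one or more trailing tokens. A minimal Go matcher for those rules, plus a domain extractor that assumes a hypothetical telemetry.<domain>.<...> naming convention, could look like this:

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches reports whether a concrete subject matches a pattern under
// NATS wildcard rules: "*" matches exactly one token; ">" (valid only as the
// last token) matches one or more remaining tokens.
func subjectMatches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) {
			return false
		}
		if tok != "*" && tok != s[i] {
			return false
		}
	}
	return len(p) == len(s)
}

// domainOf returns the second token of a telemetry subject, under the
// hypothetical convention telemetry.<domain>.<...>.
func domainOf(subject string) (string, bool) {
	if !subjectMatches("telemetry.>", subject) {
		return "", false
	}
	return strings.Split(subject, ".")[1], true
}

func main() {
	fmt.Println(subjectMatches("telemetry.>", "telemetry.infra.cpu")) // true
	fmt.Println(subjectMatches("telemetry.*", "telemetry.infra.cpu")) // false
	fmt.Println(domainOf("telemetry.gateway.request"))                // gateway true
}
```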
Three options were considered for coordinating them:
- Option A: storage → NATS publish, flowing telemetry.infra.* to desktop
- Option B: devstack → Loki sink, aggregating NATS telemetry into Loki as well
- Option C: flow both directions so Loki and NATS both have complete data
Implementation initially proceeded with Option C, but was redirected by the judgment that “devstack is for development, so this should be delegated to the production side.” The devstack Vector was removed entirely, and the storage side was returned to a simple config (journald + file + syslog → Loki + Prometheus exporter).
A key design decision: NATS publishing is handled by agent-gateway’s Go code. Gateway goroutines publish to telemetry.* and pipeline.* subjects during request processing, accumulating in JetStream streams. Vector is a log collection specialist, separate from agent-gateway’s event pipeline. The conclusion was that cross-referencing via CorrelationID alone is sufficient.
Phase 2: Splitting devstack by Host
Motivation
The root podman-compose.yml housed NATS, PostgreSQL, Dagster (3 containers), MLflow, Reranker, minio-init, and more—8 services total. In production, services are distributed across storage / desktop / compute, but devstack crammed everything into one compose. The problem of “it works in devstack but all the connection targets are different in production” was becoming apparent.
Split Result
The compose was split into two files matching the production host layout:
- devstack/desktop/ — NATS + nats-init + Reranker
- devstack/compute/ — PostgreSQL + Dagster (x3) + MLflow + minio-init (for development)
The desktop side is lightweight: NATS container (JetStream enabled, with health check), nats-init (creating PIPELINE / TELEMETRY streams), and Reranker only.
The compute side has cross-host references. Dagster’s dagster-user-code and dagster-daemon connect to desktop’s NATS with NATS_URL: nats://desktop.home.arpa:4222. This was originally nats://nats:4222 within the compose, but since NATS moved to a separate host’s compose, it now uses the hostname. Cross-compose dependency control isn’t possible, so startup order is managed operationally.
Dagster / MLflow Placement Decision
Initially, the plan considered placing the Dagster UI on desktop and moving only user-code and the daemon to compute. Dagster's architecture allows separating the webserver from user-code/daemon, referencing remote gRPC endpoints via workspace.yaml.
However, MLflow’s mlflow server command bundles UI and tracking server as one unit—they cannot be separated. Since Dagster assets sometimes reference MLflow experiment links, having them on the same host is operationally easier. Accepting the practical constraint that “UI and computation cannot be separated,” the decision was finalized to place all of Dagster and MLflow on compute. From desktop, accessing http://compute.home.arpa:3300 (Dagster UI) and :5050 (MLflow UI) via browser is sufficient.
minio-init Handling
minio-init was deleted at one point but then restored. Although MinIO is running on storage.home.arpa, the agw-mlflow / agw-iceberg buckets had not yet been created. It stays as a one-shot initialization job that reliably creates the buckets on first setup, and MLflow's depends_on uses the service_completed_successfully condition so that MLflow never starts before the buckets exist.
Phase 3: Removing .envrc and Consolidating into defaults.go
Motivation
agent-gateway’s config was managed via direnv’s .envrc. About 25 lines of environment variables were defined—COMPUTE_HOST, STORAGE_HOST, VLLM_BASE_URL, NATS_URL, POSTGRES_DSN, etc.—injected from the shell at go run time.
But in a local infrastructure, hostnames and ports are fixed, and secrets are fixed development values. There is no need to switch via environment variables. In fact, when .envrc goes stale, “the default value in config.go doesn’t match the .envrc value” becomes a bug source.
Design
All default values were consolidated as constants in internal/config/defaults.go. The three host names, port numbers for each service, DB connection details, and log setting defaults were all moved into Go const blocks. The config.go Load() function was unified around four helpers—envOr / boolEnvOr / intEnvOr / optionalBoolEnv—building AppConfig in a single concise return statement.
The core pattern is service discovery rooted in the three hosts:
COMPUTE_HOST → vLLM, llama.cpp, PostgreSQL, Dagster, MLflow
STORAGE_HOST → Loki, MinIO, Prometheus
DESKTOP_HOST → NATS, Reranker, LM Studio
If the hostnames are correct, there is no need to specify individual URLs via environment variables. Override capability is preserved, so temporarily changing a port during development is as simple as VLLM_BASE_URL=http://localhost:8000 go run ./cmd/server.
During implementation, two mix-ups occurred. First, the ports were swapped: llama.cpp was set to 8081 and Reranker to 8001, whereas the correct assignment is llama.cpp:8001 (compute side) and Reranker:8081 (desktop side). Second, DagsterBaseURL's default host was initially set to desktop; since Phase 2 placed all of Dagster on compute, it was corrected. Comments in defaults.go now explicitly note which host each port belongs to.
Overall Event Flow
Two data paths exist:
- Path A: agent-gateway goroutine → NATS publish → Dagster sensor. A non-real-time path used for request lineage tracking and data synthesis
- Path B: storage Vector → Loki. A real-time infrastructure log aggregation path
The two are designed to be cross-referenced via CorrelationID. Vector doesn’t need to publish to NATS because Path A is already self-contained on the agent-gateway side.
The devstack compose split means development now tests with a network topology close to production. The compute-side Dagster’s NATS_URL points to nats://desktop.home.arpa:4222, matching the production layout. To run everything on localhost, simply override with environment variables.
