Designing and Building a Local Dev Platform on EPYC 9175F and Podman
Complete design and build record for a local dev platform on three machines: EPYC 9175F + RTX PRO 6000 MAX-Q Compute Server, Mac Studio, and Storage Server. Covers CPU lane isolation, SATA/NVMe tiering, LUKS encryption, UID/GID strategy, and rootless Podman directory layout.
Conclusion
A local dev platform was built across three machines and a RouterOS-managed 10GbE network, designed so that LLM inference, data pipelines, development environments, and observability can coexist without resource contention. The design rests on four decisions made upfront: isolate CPU lanes, separate SATA and NVMe by role, organize directories into three tiers, and fix UID/GID per service. Locking these down first created a foundation that stays stable as services are added.
The Go + NATS + Dagster AI orchestration platform devstack (podman-compose) runs on top of this foundation.
Hardware
Compute Server (Ubuntu 24.04.3 LTS)
CPU: AMD EPYC 9175F (5th gen Turin) 16-core 4.2-5.0GHz, L3 512MB
GPU: Nvidia RTX PRO 6000 MAX-Q (300W) 96GB (Blackwell gen)
RAM: DDR5-6400 64GB x 12 (768GB)
MB: Supermicro H13SSL-NT
Storage1: SATA 6Gbps 3.84TB
Storage2: M.2 PCIe4.0 3.84TB
PSU: 1500W (80Plus)
NIC: 10GbE x 2 (en0: 10.10.10.4, en1: disabled)
Desktop PC (macOS latest)
Mac Studio M1 Ultra
CPU: M1 Ultra
RAM: 64GB
Storage: 1TB
NIC: 10GbE x 1 (10.10.10.2)
Wi-Fi: disabled
Storage Server (Ubuntu 24.04.3 LTS)
Mac mini late 2018
CPU: Core i3
RAM: DDR4 SO-DIMM 8GB
Storage: SSD 2TB, ext M.2 SSD 4TB, SATA HDD 24TB
NIC: 1GbE, ext 10GbE (RTL8159, 10.10.10.3)
Wi-Fi: disabled
Network
All hosts connected via 10GbE under RouterOS (CRS304).
VDSL -> Router (RouterOS)
  |-- Wi-Fi 7 -> mobile SSID, IoT SSID
  |-- DHCP Server
  |
  +-> CRS304 (Router+Switch)
        |-- Port1 -> Desktop PC 10.10.10.2
        |-- Port2 -> Storage Server 10.10.10.3
        |-- Port3 -> Compute Server 10.10.10.4
        |-- Port4 -> Wi-Fi 6 AP 128.0.0.1 (DHCP Server)
        |-- Port5 -> WAN 192.168.0.2/24 (1GbE DHCP Client)
| Host | Role | Uptime |
|---|---|---|
| Compute Server | GPU inference + data platform + workflow | Work hours only |
| Desktop PC | Gateway + messaging + UI | Work hours only |
| Storage Server | Observability + object storage | 24/7 |
Desktop PC and Compute Server start and stop together. Only Storage Server runs continuously, providing Prometheus/Grafana/Loki monitoring for the whole platform.
Disk Design
Principle
SATA for OS, configuration, and stable sequential writes. NVMe for high-speed random I/O and runtime data.
SATA 3.84TB (Boot Disk)
LUKS → LVM → ext4. LVM lets the root and log volumes grow later without downtime, because ext4 can be enlarged while mounted (shrinking still requires unmounting).
| LV | Size | Purpose |
|---|---|---|
| / | 2048GB | OS + app definitions (including /opt) |
| /var/log | 32GB | System logs (isolated for Loki/promtail) |
| VG Free | ~1.8TB | Snapshots / future expansion |
Separating /var/log from root prevents log bloat from filling the root partition.
NVMe (M.2) 3.84TB
LUKS → xfs (no LVM). Entire disk allocated to /home, consolidating service runtime data on the fast path.
/home/ksh3/
+-- postgres/data # DB with WAL
+-- trino/{spill,exchange,cache}
+-- prometheus/data # TSDB
+-- loki/{data,cache}
+-- qdrant/data # Vector index
+-- models/ # LLM models (vLLM/llama.cpp/ollama)
+-- dagster/{runs,storage,tmp}
+-- workspace/ # VSCode clone repos
+-- obsidian/
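The runtime tree above can be created idempotently, so re-running provisioning is harmless. This is a minimal sketch that assumes bash. The function name `create_runtime_tree` is hypothetical. The base path defaults to /home/ksh3 and can be overridden for testing. Per-service ownership from the UID/GID strategy would be applied afterwards.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the NVMe runtime layout under a base directory.
# BASE defaults to /home/ksh3 as in the tree above; pass another path to test.
create_runtime_tree() {
  local base="${1:-/home/ksh3}" d
  for d in \
      postgres/data \
      trino/spill trino/exchange trino/cache \
      prometheus/data \
      loki/data loki/cache \
      qdrant/data \
      models \
      dagster/runs dagster/storage dagster/tmp \
      workspace \
      obsidian; do
    # mkdir -p is idempotent: existing directories are left untouched
    mkdir -p -- "$base/$d"
  done
}
```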
Encryption and Boot Unlock
- SATA: Dropbear-initramfs for remote SSH LUKS unlock at boot
- M.2: Key file referenced in /etc/crypttab for automatic unlock after root is opened
# /etc/crypttab
dm_crypt-0 UUID=... none luks,discard,initramfs
home-crypt UUID=... /etc/luks-keys/home.key luks
Root can be unlocked remotely over SSH (Dropbear in the initramfs) even when the server is physically inaccessible. The /home key file lives on the already-unlocked root, so /home needs no manual intervention.
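Because /home unlocks from a key file without a prompt, that file's permissions matter. The following is a minimal sketch of a check, assuming bash and GNU `stat`; the function name is hypothetical.

- It reads the third crypttab field.
- It skips `none` and `-` entries, which are unlocked by passphrase (here, via Dropbear).
- It flags key files that are missing or accessible to group or others.

```bash
#!/usr/bin/env bash
# Hypothetical check: every key file referenced in crypttab must exist
# and must not be accessible to group or others (e.g. mode 0400).
check_crypttab_keys() {
  local tab="${1:-/etc/crypttab}" name dev key opts mode rc=0
  while read -r name dev key opts; do
    # Skip blank lines and comments
    [[ -z "$name" || "$name" == \#* ]] && continue
    # "none" / "-" / empty: no key file, unlocked interactively (e.g. via Dropbear)
    [[ -z "$key" || "$key" == none || "$key" == - ]] && continue
    if [[ ! -f "$key" ]]; then
      echo "MISSING $name $key"; rc=1; continue
    fi
    mode="$(stat -c %a -- "$key")"   # octal permission bits, e.g. 400
    if (( 8#$mode & 8#077 )); then
      echo "INSECURE $name $key ($mode)"; rc=1
    else
      echo "OK $name $key ($mode)"
    fi
  done < "$tab"
  return "$rc"
}
```

Usage: `sudo bash -c '. ./check.sh; check_crypttab_keys'` should report `OK home-crypt /etc/luks-keys/home.key (400)` once the key file is mode 0400.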
Directory Layout: Three-Tier Separation
Principle
| Tier | Path | Contents | Disk |
|---|---|---|---|
| Definitions | /opt/containers/{app}/ | compose.yml, .env, secrets, systemd templates | SATA |
| Production data | /srv/ | Git repos, Iceberg warehouse, media | SATA |
| Runtime data | /home/ksh3/{app}/ | DB, spill, cache, models, build artifacts | NVMe |
/opt/containers (SATA)
/opt/containers/
+-- common/
| +-- networks/create_networks.sh
| +-- systemd/[email protected]
| +-- env/ # Shared .env
+-- postgres/
| +-- compose.yml, conf/, .env, secrets/
+-- trino/
+-- nessie/
+-- prometheus/
+-- loki/
+-- dagster/
+-- vllm/
+-- qdrant/
/srv (SATA)
/srv/
+-- git/bare/ # git --bare init
+-- iceberg/warehouse # Data lake
+-- nessie/data # RocksDB store
+-- media/{raw,derived,thumbs,archive}
Transparent NVMe Usage
High-I/O data lives on NVMe but is exposed through consistent paths using /etc/fstab bind mounts.
# /etc/fstab
/mnt/nvme/postgresql/data /home/postgres/data none bind 0 0
Backup tools only need to scan /home to capture all service data.
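When provisioning is re-run, a bind-mount line should not be appended twice. This is a minimal sketch of an idempotent helper, assuming bash and awk; the function name is hypothetical. It only edits the file, so the new entry still has to be applied (for example with `mount -a`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: append "SRC DST none bind 0 0" to an fstab-format file
# unless an active (non-comment) entry already mounts something at DST.
add_bind_mount() {
  local src="$1" dst="$2" fstab="${3:-/etc/fstab}"
  if awk -v d="$dst" '$1 !~ /^#/ && $2 == d { found = 1 } END { exit !found }' "$fstab"; then
    return 0    # mount point already present; leave the file unchanged
  fi
  printf '%s %s none bind 0 0\n' "$src" "$dst" >> "$fstab"
}
```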
UID/GID Strategy
Principle
One dedicated UID/GID per service on the host, matching the container’s effective user.
- Reserved range: 2001-2999 (container apps)
- Infrastructure (2001-2099): Caddy, Registry, Monitoring
- Data (2101-2199): Trino, MinIO (Postgres is the exception and keeps the official image's UID 999)
- AI/LLM (2301-2399): vLLM, llama.cpp
Example
# Create Trino user
groupadd -g 2101 trino && useradd -r -u 2101 -g 2101 -m -d /home/trino -s /usr/sbin/nologin trino
install -d -o 2101 -g 2101 -m 0750 /home/trino/{data,logs,scratch}
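The reserved ranges are easy to violate by hand. This minimal sketch wraps the commands above with two guards and assumes bash, plus root privileges for the final step.

- The UID must fall in one of the documented tiers.
- Neither the UID nor the GID may already be taken.

Function names are hypothetical. Postgres keeps the image's UID 999, so it is deliberately outside these tiers and would be handled separately.

```bash
#!/usr/bin/env bash
# Hypothetical guard rails around the groupadd/useradd pair shown above.

# Map a UID to its reserved tier (boundaries from this section).
uid_tier() {
  local uid="$1"
  if   (( uid >= 2001 && uid <= 2099 )); then echo infra
  elif (( uid >= 2101 && uid <= 2199 )); then echo data
  elif (( uid >= 2301 && uid <= 2399 )); then echo ai
  else echo none
  fi
}

# create_service_user NAME UID ; uses the same number for UID and GID
create_service_user() {
  local name="$1" uid="$2"
  if [[ "$(uid_tier "$uid")" == none ]]; then
    echo "uid $uid is outside the reserved tiers" >&2; return 1
  fi
  if getent passwd "$uid" >/dev/null || getent group "$uid" >/dev/null; then
    echo "uid/gid $uid already in use" >&2; return 1
  fi
  # Only reached when both guards pass; requires root
  groupadd -g "$uid" "$name" &&
    useradd -r -u "$uid" -g "$uid" -m -d "/home/$name" -s /usr/sbin/nologin "$name"
}
```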
Podman with Explicit User Mapping
services:
  trino:
    image: trinodb/trino:443
    user: "2101:2101"
    volumes:
      - /opt/trino/etc:/etc/trino:ro          # Config from /opt, read-only
      - /home/trino/data:/var/trino/data      # Data on /home
      - /home/trino/logs:/var/log/trino
      - /home/trino/scratch:/var/trino/scratch
    tmpfs:
      - /tmp:rw,nosuid,nodev,relatime,size=8g
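Before starting the container, it helps to confirm that the bind-mounted directories carry the owner named in `user:`. This is a minimal sketch, assuming bash and GNU `stat`; the function name is hypothetical.

It compares host-side numeric ownership. That matches the container's view only when host and container IDs line up, as with rootful Podman or an explicit `--uidmap`. Under rootless Podman's default user namespace, container UID 2101 appears on the host as a subordinate UID instead.

```bash
#!/usr/bin/env bash
# Hypothetical check: every given directory must be owned by UID:GID.
# Usage: check_owner UID GID DIR...
check_owner() {
  local want="$1:$2" dir have rc=0
  shift 2
  for dir in "$@"; do
    if ! have="$(stat -c '%u:%g' -- "$dir")"; then
      rc=1; continue          # stat has already printed the error
    fi
    if [[ "$have" != "$want" ]]; then
      echo "$dir is owned by $have, expected $want" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `check_owner 2101 2101 /home/trino/{data,logs,scratch}`.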
CPU Lane Design
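The 9175F has 16 high-clock cores. Pinning (cpuset) confines each service to specific cores and quotas (cpus) cap its CPU time, which keeps latency stable and stops a runaway job from starving its neighbors.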
Lane Allocation
| Pool | Core range | Purpose |
|---|---|---|
| Resident (apps) | CPU 0-7 | DB, IDE, LLM, monitoring |
| Batch (jobs) | CPU 8-15 | ELT, analytics, backups |
| Lightweight | CPU 6-7 | Exporters |
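Some pool overlaps are intentional, such as the lightweight CPUs 6-7 inside the resident range. Others are accidental, so it helps to expand cpuset strings before assigning them. This is a minimal sketch, assuming bash and the cpuset list syntax Podman accepts (`2-4`, `6`, `0-3,8`); the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for cpuset strings such as "2-4", "6" or "0-3,8".

# Print the individual CPU numbers, space-separated.
expand_cpuset() {
  local spec="$1" part lo hi i
  local -a parts out=()
  IFS=',' read -ra parts <<< "$spec"
  for part in "${parts[@]}"; do
    if [[ "$part" == *-* ]]; then
      lo="${part%-*}"; hi="${part#*-}"   # range "lo-hi"
    else
      lo="$part"; hi="$part"             # single CPU
    fi
    for (( i = lo; i <= hi; i++ )); do out+=("$i"); done
  done
  echo "${out[*]}"
}

# Succeed (exit 0) if the two cpusets share at least one CPU.
cpusets_overlap() {
  local a b
  for a in $(expand_cpuset "$1"); do
    for b in $(expand_cpuset "$2"); do
      [[ "$a" == "$b" ]] && return 0
    done
  done
  return 1
}
```

For example, `cpusets_overlap 2-3 3-4` succeeds because PostgreSQL and VSCode Server in the resident table share CPU 3.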
Resident Services
| Service | cpuset | cpus | Memory limit | Notes |
|---|---|---|---|---|
| Monitoring lane | 1 | 0.5-1.0 | 256MB-2GB | Prometheus/Loki on SATA |
| PostgreSQL | 2-3 (burst 2-4) | 2.0 (3.0) | 32-64GB | |
| PgBouncer | 2 | 0.5 | 512MB | Connection pooling |
| VSCode Server | 3-4 | 3.0-4.0 | 24-48GB | Latency-sensitive |
| Ollama | 5-6 | 1.0-2.0 | 16GB + GPU | Short requests |
| Trino | 4-16 | 6-12.0 | 128GB (256GB) | One heavy query at a time |
| Dagster | 4-16 | 1.0-2.0 | 2-4GB | Workers do the real work |
Oneshot Jobs
| Job | cpuset | cpus | Memory limit |
|---|---|---|---|
| DataFusion Worker | 4-16 | 4-8.0 | 64-256GB |
| dbt run | 4-16 | 6-12.0 | 8-32GB |
| PG Load (120GB class) | 2-3 | - | PG: 64GB |
| Iceberg maintenance | 4-16 | 2-6.0 | 8-32GB |
| VACUUM / REINDEX | 2-3 | 2.0 | 2-4GB |
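Summing the upper bounds in the two tables shows how much of the 768GB is committed. This worked check rests on a few assumptions:

- It takes the larger value of each range, and Trino's parenthesized 256GB.
- It counts PgBouncer's 512MB as 0.5GB.
- It counts PG Load under PostgreSQL's own 64GB.
- It excludes Ollama's GPU memory.

$$
\begin{aligned}
M_{\text{resident}} &= \underbrace{2}_{\text{monitoring}} + \underbrace{64}_{\text{PostgreSQL}} + \underbrace{0.5}_{\text{PgBouncer}} + \underbrace{48}_{\text{VSCode}} + \underbrace{16}_{\text{Ollama}} + \underbrace{256}_{\text{Trino}} + \underbrace{4}_{\text{Dagster}} = 390.5\ \text{GB} \\
M_{\text{resident}} + M_{\text{DataFusion}} &= 390.5 + 256 = 646.5\ \text{GB} \le 768\ \text{GB} \\
M_{\text{resident}} + \textstyle\sum M_{\text{oneshot}} &= 390.5 + (256 + 32 + 32 + 4) = 714.5\ \text{GB} \le 768\ \text{GB}
\end{aligned}
$$

With every oneshot job at its ceiling alongside the resident services, about 53.5GB stays free for the host and page cache. With only the largest job (DataFusion) running, the margin is about 121.5GB.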
Launch Parameter Example
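# PostgreSQL (normal): data on the NVMe runtime tier; PG 18 images keep data under /var/lib/postgresql
podman run -d --name pg \
  --cpuset-cpus=2-3 --cpus=2.0 --cgroup-conf=cpu.weight=900 \
  --memory=48g --shm-size=4g \
  -v /home/ksh3/postgres/data:/var/lib/postgresql postgres:18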
# PostgreSQL (heavy day)
podman update --cpuset-cpus=2-4 --cpus=3.0 --memory=64g pg
Service Deployment
Linux Server (Backend, Data Engine)
+-- PostgreSQL 18 + pgvector (JIT enabled)
+-- Dagster (daemon + sensor + user-code gRPC)
+-- NATS 2.11 (JetStream) <- co-located with Desktop PC
+-- multi-bert-inference (Rust + ONNX Runtime)
+-- vLLM / llama.cpp / LM Studio
+-- Vector 0.45 (Rust)
+-- Trino + Nessie (Phase 2, Lakehouse)
+-- Podman rootless
Mac Studio (Frontend, UI Hub)
+-- agent-gateway (Go/Gin)
+-- VSCode.app (Remote-SSH -> Linux)
+-- Grafana UI
+-- Dagit UI
+-- Web Browser (Trino Web UI, etc.)
Storage Server (24/7 Observability)
+-- Prometheus
+-- Grafana
+-- Loki
+-- Vector (aggregator)
+-- MinIO
Operational Rules
Backups
- NAS: Daily tar.gz, 14 generations
  - Targets: /home/ksh3/, /srv/iceberg/, /srv/nessie/, /srv/media/, PG backups
  - Excludes: */{tmp,cache,spill,exchange}
- R2: Artifacts synced via rclone sync as needed
Cleanup
find /home/ksh3/*/cache -type f -mtime +14 -delete
find /home/ksh3/*/tmp -type f -mtime +7 -delete
Monitoring
- Prometheus node_exporter textfile collector tracks disk usage of /home/ksh3/*/
- pg_stat_activity, wait_event, PSI, Trino query memory, and spill are dashboard targets
- Structured logs with correlation_id enable full-path tracing
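The textfile collector reads `*.prom` files from the directory given by node_exporter's `--collector.textfile.directory` flag. This is a minimal sketch of a writer for the per-directory usage metric, assuming bash and GNU `du`. The metric name `home_app_usage_bytes` and the default output path are hypothetical. Writing to a temporary file and renaming it keeps node_exporter from scraping a half-written file.

```bash
#!/usr/bin/env bash
# Hypothetical textfile-collector writer: on-disk bytes per /home/ksh3/<app>/.
write_usage_metrics() {
  local base="${1:-/home/ksh3}"
  local out="${2:-/var/lib/node_exporter/textfile/home_usage.prom}"
  local tmp="$out.$$" dir bytes
  {
    echo '# HELP home_app_usage_bytes Disk usage per service directory.'
    echo '# TYPE home_app_usage_bytes gauge'
    for dir in "$base"/*/; do
      [[ -d "$dir" ]] || continue             # glob matched nothing
      bytes="$(du -sB1 -- "$dir" | cut -f1)"  # allocated size in bytes
      printf 'home_app_usage_bytes{app="%s"} %s\n' "$(basename "$dir")" "$bytes"
    done
  } > "$tmp"
  mv -- "$tmp" "$out"   # same-directory rename is atomic
}
```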
OS Bootstrap
Ubuntu Server 24.04.3 LTS (minimized) with the following packages installed upfront.
sudo apt install -y --no-install-recommends \
build-essential curl wget git unzip vim less man-db \
net-tools iputils-ping nmap iperf3 mtr \
htop btop nvtop iftop logrotate rclone \
fzf ripgrep bat \
openssh-server dropbear-initramfs libfido2-1 libu2f-udev fido2-tools \
nvidia-headless-580 nvidia-utils-580 \
podman podman-compose \
dbus-user-session \
locales fonts-noto-cjk fonts-noto-cjk-extra
GPU drivers, container runtime, remote access, Japanese fonts, and CLI tools in one pass. dbus-user-session is required for rootless Podman and user services.
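Besides dbus-user-session, rootless Podman maps container IDs through the user's ranges in /etc/subuid and /etc/subgid (format `name:start:count`). This is a minimal sketch of a check, assuming bash. The function name and the 65536 minimum are assumptions; 65536 is the usual default range size. Entries keyed by numeric UID instead of a name are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical check: does USER have a subordinate ID range of at least MIN IDs?
# check_subid USER FILE [MIN] ; prints "USER: first-last" on success
check_subid() {
  local user="$1" file="$2" min="${3:-65536}" name start count
  while IFS=: read -r name start count; do
    if [[ "$name" == "$user" ]] && (( count >= min )); then
      echo "$user: $start-$(( start + count - 1 ))"
      return 0
    fi
  done < "$file"
  return 1
}
```

Usage: `check_subid ksh3 /etc/subuid && check_subid ksh3 /etc/subgid`.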
Caveats
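- Putting all of /home on NVMe means an NVMe failure takes out all service data. Backup generation management is the lifeline.
- Rootless pods use userns/slirp4netns, so binding host ports below 1024 requires additional configuration.
- Run PostgreSQL 18 with shm_size: 4gb. Parallel query workers allocate dynamic shared memory in /dev/shm, and the 64MB container default is too small for large analytical queries (the same queries that trigger JIT). Running out makes those queries fail with "could not resize shared memory segment" errors.
- CPU pinning must be managed consistently through either YAML or CLI; mixing both leads to lost settings after restart.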
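For the port caveat, the kernel threshold is net.ipv4.ip_unprivileged_port_start, which defaults to 1024. Lowering it with a file in /etc/sysctl.d/ is one common way to let rootless containers publish ports such as 80 or 443. This is a minimal sketch of a pre-flight check, assuming bash. The function name is hypothetical, and the sysctl path can be overridden for testing.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: can an unprivileged (rootless) process bind PORT?
# Ports at or above the kernel threshold need no extra privilege.
rootless_can_bind() {
  local port="$1"
  local f="${2:-/proc/sys/net/ipv4/ip_unprivileged_port_start}"
  local start
  start="$(<"$f")"
  (( port >= start ))
}
```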
Final Configuration
For the production runtime built on this foundation, see:
- Go + NATS + Dagster AI Orchestration Platform — devstack containers, 3-host topology, NATS event design
