Designing and Building a Local Dev Platform on EPYC 9175F and Podman

Complete design and build record for a local dev platform on three machines: EPYC 9175F + RTX PRO 6000 MAX-Q Compute Server, Mac Studio, and Storage Server. Covers CPU lane isolation, SATA/NVMe tiering, LUKS encryption, UID/GID strategy, and rootless Podman directory layout.

These articles use AI-generated summaries of Obsidian notes originally kept as technical memos.

English translations are produced with AI assistance.

Conclusion

A local dev platform was built across three machines and a RouterOS-managed 10GbE network, designed so that LLM inference, data pipelines, development environments, and observability can coexist without resource contention. The design rests on four decisions made upfront: isolate CPU lanes, separate SATA and NVMe by role, organize directories into three tiers, and fix UID/GID per service. Locking these down first created a foundation that stays stable as services are added.

The Go + NATS + Dagster AI orchestration platform devstack (podman-compose) runs on top of this foundation.

Hardware

Compute Server (Ubuntu 24.04.3 LTS)

CPU: AMD EPYC 9175F (5th gen Turin) 16-core 4.2-5.0GHz, L3 512MB
GPU: NVIDIA RTX PRO 6000 MAX-Q (300W) 96GB (Blackwell gen)
RAM: DDR5-6400 64GB x 12 (768GB)
MB: Supermicro H13SSL-NT
Storage1: SATA 6Gbps 3.84TB
Storage2: M.2 PCIe4.0 3.84TB
PSU: 1500W (80Plus)
NIC: 10GbE x 2 (en0: 10.10.10.4, en1: disabled)

Desktop PC (macOS latest)

Mac Studio M1 Ultra
CPU: M1 Ultra
RAM: 64GB
Storage: 1TB
NIC: 10GbE x 1 (10.10.10.2)
Wi-Fi: disabled

Storage Server (Ubuntu 24.04.3 LTS)

Mac mini late 2018
CPU: Core i3
RAM: DDR4 SO-DIMM 8GB
Storage: SSD 2TB, ext M.2 SSD 4TB, SATA HDD 24TB
NIC: 1GbE, ext 10GbE (RTL8159, 10.10.10.3)
Wi-Fi: disabled

Network

All hosts connected via 10GbE under RouterOS (CRS304).

  VDSL -> Router (RouterOS)
          Wi-Fi 7 -> mobile SSID, IoT SSID
          DHCP Server
             | -> CRS304 (Router+Switch)
                    |-- Port1 -> Desktop PC     10.10.10.2
                    |-- Port2 -> Storage Server 10.10.10.3
                    |-- Port3 -> Compute Server 10.10.10.4
                    |-- Port4 -> Wi-Fi 6 AP     128.0.0.1 (DHCP Server)
                    |-- Port5 -> WAN            192.168.0.2/24 (1GbE DHCP Client)

Host	Role	Uptime
Compute Server	GPU inference + data platform + workflow	Work hours only
Desktop PC	Gateway + messaging + UI	Work hours only
Storage Server	Observability + object storage	24/7

Desktop PC and Compute Server start and stop together. Only Storage Server runs continuously, providing Prometheus/Grafana/Loki monitoring for the whole platform.

Disk Design

Principle

SATA for OS, configuration, and stable sequential writes. NVMe for high-speed random I/O and runtime data.

SATA 3.84TB (Boot Disk)

LUKS → LVM → ext4. LVM lets the root and log volumes grow over time without downtime: ext4 can be extended while mounted, although shrinking it requires unmounting.

LV	Size	Purpose
/	2048GB	OS + app definitions (including /opt)
/var/log	32GB	System logs (isolated for Loki/promtail)
VG Free	~1.8TB	Snapshots / future expansion

Separating /var/log from root prevents log bloat from filling the root partition.
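Growing one of these volumes later might look like the sketch below. The volume group name `vg0` and the LV name `root` are assumptions (the article does not name them), and the function only prints the command unless `APPLY=1` is set.

```shell
# Sketch: grow a logical volume and its filesystem in one step.
# "vg0" and "root" are hypothetical names; check `sudo lvs` for the real ones.
grow_lv() {
  vg="$1"; lv="$2"; add="$3"            # e.g. vg0 root 256G
  cmd="lvextend -r -L +${add} /dev/${vg}/${lv}"
  if [ "${APPLY:-0}" = "1" ]; then
    sudo $cmd                           # -r (--resizefs) grows the filesystem too
  else
    echo "$cmd"                         # dry run: only show what would be executed
  fi
}

grow_lv vg0 root 256G
```

The dry-run default is a deliberate choice here: the free space in the VG is finite, so it seems safer to review the command before applying it.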

NVMe (M.2) 3.84TB

LUKS → xfs (no LVM). Entire disk allocated to /home, consolidating service runtime data on the fast path.

  /home/ksh3/
  +-- postgres/data         # DB with WAL
  +-- trino/{spill,exchange,cache}
  +-- prometheus/data       # TSDB
  +-- loki/{data,cache}
  +-- qdrant/data           # Vector index
  +-- models/               # LLM models (vLLM/llama.cpp/ollama)
  +-- dagster/{runs,storage,tmp}
  +-- workspace/            # VSCode clone repos
  +-- obsidian/

Encryption and Boot Unlock

SATA: Dropbear-initramfs for remote SSH LUKS unlock at boot
M.2: Key file in /etc/crypttab for automatic unlock after root is opened

# /etc/crypttab
dm_crypt-0 UUID=... none luks,discard,initramfs
home-crypt UUID=... /etc/luks-keys/home.key luks

Root can be unlocked remotely even when the server is physically inaccessible. /home requires no manual intervention.
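A key file like the one referenced in the crypttab could be prepared roughly as follows. This is a sketch under assumptions: the 4096-byte size is arbitrary, and the device name in the comments is hypothetical.

```shell
# Sketch: create a random key file that only its owner can read.
make_keyfile() {
  keyfile="$1"
  mkdir -p "$(dirname "$keyfile")"
  # umask 077 keeps the file private from the moment it is created.
  ( umask 077 && dd if=/dev/urandom of="$keyfile" bs=512 count=8 2>/dev/null )
  chmod 0400 "$keyfile"
}

# Typical follow-up, run as root (the device name is an assumption):
#   make_keyfile /etc/luks-keys/home.key
#   cryptsetup luksAddKey /dev/nvme0n1p1 /etc/luks-keys/home.key
```

Because the key file sits on the encrypted root volume, it only becomes readable after root has been unlocked, which matches the unlock order described above.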

Directory Layout: Three-Tier Separation

Principle

Tier	Path	Contents	Disk
Definitions	`/opt/containers/{app}/`	compose.yml, .env, secrets, systemd templates	SATA
Production data	`/srv/`	Git repos, Iceberg warehouse, media	SATA
Runtime data	`/home/ksh3/{app}/`	DB, spill, cache, models, build artifacts	NVMe

/opt/containers (SATA)

/opt/containers/
+-- common/
|  +-- networks/create_networks.sh
|  +-- systemd/podman-compose@.service
|  +-- env/                   # Shared .env
+-- postgres/
|  +-- compose.yml, conf/, .env, secrets/
+-- trino/
+-- nessie/
+-- prometheus/
+-- loki/
+-- dagster/
+-- vllm/
+-- qdrant/

/srv (SATA)

/srv/
+-- git/bare/                 # git --bare init
+-- iceberg/warehouse         # Data lake
+-- nessie/data               # RocksDB store
+-- media/{raw,derived,thumbs,archive}
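Adding a new service to this layout can be sketched as a small helper that creates its definition and runtime directories. The function name and the optional prefix argument are illustrative; the prefix exists only so the layout can be tried in a scratch directory first.

```shell
# Sketch: create the definition and runtime directories for one service.
# An empty prefix targets the real /opt/containers and /home/ksh3 paths.
scaffold_service() {
  app="$1"; prefix="${2:-}"
  mkdir -p "$prefix/opt/containers/$app/conf" \
           "$prefix/opt/containers/$app/secrets" \
           "$prefix/home/ksh3/$app"
  chmod 0700 "$prefix/opt/containers/$app/secrets"   # keep secrets private
  touch "$prefix/opt/containers/$app/compose.yml"    # placeholder, never truncates
}
```

For example, `scaffold_service qdrant /tmp/layout-test` builds the tree under a scratch directory, while `scaffold_service qdrant` (run with sufficient privileges) would create it in place.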

Transparent NVMe Usage

High-I/O data lives on NVMe but is exposed through consistent paths using /etc/fstab bind mounts.

# /etc/fstab
/mnt/nvme/postgresql/data   /home/postgres/data   none bind 0 0

Backup tools only need to scan /home to capture all service data.

UID/GID Strategy

Principle

One dedicated UID/GID per service on the host, matching the container’s effective user.

Reserved range: 2001-2999 (container apps)
Infrastructure (2001-2099): Caddy, Registry, Monitoring
Data (2101-2199): Trino, MinIO, Postgres (official image UID 999)
AI/LLM (2301-2399): vLLM, llama.cpp
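The ranges above can be encoded in a small lookup so that provisioning scripts reject a UID that falls outside the plan. This is a sketch; the function name and the category labels are made up for illustration.

```shell
# Sketch: map a UID to the category it belongs to in the reserved 2001-2999 range.
uid_category() {
  uid="$1"
  if   [ "$uid" -ge 2001 ] && [ "$uid" -le 2099 ]; then echo infrastructure
  elif [ "$uid" -ge 2101 ] && [ "$uid" -le 2199 ]; then echo data
  elif [ "$uid" -ge 2301 ] && [ "$uid" -le 2399 ]; then echo ai-llm
  elif [ "$uid" -ge 2001 ] && [ "$uid" -le 2999 ]; then echo reserved-unassigned
  else
    echo outside-range
    return 1                      # not a container-app UID under this scheme
  fi
}

uid_category 2101   # prints "data"
```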

Example

# Create Trino user
groupadd -g 2101 trino && useradd -r -u 2101 -g 2101 -m -d /home/trino -s /usr/sbin/nologin trino
install -d -o 2101 -g 2101 -m 0750 /home/trino/{data,logs,scratch}

Podman with Explicit User Mapping

services:
  trino:
    image: trinodb/trino:443
    user: "2101:2101"
    volumes:
      - /opt/trino/etc:/etc/trino:ro    # Config from /opt, read-only
      - /home/trino/data:/var/trino/data # Data on /home
      - /home/trino/logs:/var/log/trino
      - /home/trino/scratch:/var/trino/scratch
    tmpfs:
      - /tmp:rw,nosuid,nodev,relatime,size=8g
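One point worth checking when this runs rootless: in Podman's default rootless user namespace, a container UID is not the same number on the host. Container UID 0 maps to the invoking user, and container UID n (n >= 1) maps into that user's /etc/subuid range. The sketch below computes the resulting host UID; the invoking UID 1000 and the subuid start 100000 are assumed example values, not taken from the build.

```shell
# Sketch: host UID that owns files written by a container UID under the
# default rootless mapping (container 0 = invoking user, 1..N = subuid range).
rootless_host_uid() {
  container_uid="$1"; user_uid="$2"; subuid_start="$3"
  if [ "$container_uid" -eq 0 ]; then
    echo "$user_uid"
  else
    echo $(( subuid_start + container_uid - 1 ))
  fi
}

# Hypothetical values: invoking user UID 1000, /etc/subuid entry "ksh3:100000:65536".
rootless_host_uid 2101 1000 100000   # prints 102100
```

If host-side ownership needs to be exactly 2101, two options that may fit are `podman unshare chown` on the data directories, or running the service as the dedicated host user with a `--userns=keep-id:uid=2101,gid=2101` mapping. Which one applies depends on the Podman version and on how the service is launched.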

CPU Lane Design

The 9175F has 16 high-clock cores. Pinning plus quotas stabilize latency and prevent runaway jobs.

Lane Allocation

Pool	Core range	Purpose
Resident (apps)	CPU 0-7	DB, IDE, LLM, monitoring
Batch (jobs)	CPU 8-15	ELT, analytics, backups
Lightweight	CPU 6-7	Exporters

Resident Services

Service	cpuset	cpus	Memory limit	Notes
Monitoring lane	`1`	0.5-1.0	256MB-2GB	Prometheus/Loki on SATA
PostgreSQL	`2-3` (burst `2-4`)	2.0 (3.0)	32-64GB
PgBouncer	`2`	0.5	512MB	Connection pooling
VSCode Server	`3-4`	3.0-4.0	24-48GB	Latency-sensitive
Ollama	`5-6`	1.0-2.0	16GB + GPU	Short requests
Trino	`4-15`	6-12.0	128GB (256GB)	One heavy query at a time
Dagster	`4-15`	1.0-2.0	2-4GB	Workers do the real work

Oneshot Jobs

Job	cpuset	cpus	Memory limit
DataFusion Worker	`4-15`	4-8.0	64-256GB
dbt run	`4-15`	6-12.0	8-32GB
PG Load (120GB class)	`2-3`	-	PG: 64GB
Iceberg maintenance	`4-15`	2-6.0	8-32GB
VACUUM / REINDEX	`2-3`	2.0	2-4GB
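Since a cpuset that names a nonexistent CPU makes the container fail to start, it can help to validate cpuset strings before launching. The sketch below assumes logical CPUs are numbered 0 to N-1 (0-15 on a 16-core host without SMT); the function name is illustrative.

```shell
# Sketch: succeed only if every CPU in a cpuset string such as "2-4,7"
# exists on a host with NCPU logical CPUs (numbered 0..NCPU-1).
cpuset_in_range() {
  spec="$1"; ncpu="$2"
  old_ifs="$IFS"; IFS=','
  for part in $spec; do
    IFS="$old_ifs"
    lo="${part%-*}"; hi="${part#*-}"    # "7" gives lo=hi=7, "2-4" gives lo=2 hi=4
    if [ "$lo" -gt "$hi" ] || [ "$hi" -ge "$ncpu" ]; then
      return 1
    fi
  done
  IFS="$old_ifs"
  return 0
}

# Usage idea: cpuset_in_range "8-15" "$(nproc)" || echo "cpuset exceeds host CPUs"
```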

Launch Parameter Example

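# PostgreSQL (normal)
# --cpu-shares is Podman's relative CPU weight flag (default 1024); 9216 is 9x the default.
# postgres:18 images expect the volume at /var/lib/postgresql; data lives on the NVMe tier.
podman run -d --name pg \
  --cpuset-cpus=2-3 --cpus=2.0 --cpu-shares=9216 \
  --memory=48g -v /home/ksh3/postgres/data:/var/lib/postgresql postgres:18

# PostgreSQL (heavy day)
podman update --cpuset-cpus=2-4 --cpus=3.0 --memory=64g pg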
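To confirm that a quota took effect, the `--cpus` value can be translated into the `cpu.max` line that cgroup v2 should show for the container. The sketch assumes the default 100000 microsecond period; the function name is illustrative.

```shell
# Sketch: cgroup v2 cpu.max value ("quota period", in microseconds)
# corresponding to a --cpus setting with the default period.
cpus_to_cpu_max() {
  awk -v cpus="$1" 'BEGIN { printf "%d 100000\n", cpus * 100000 }'
}

cpus_to_cpu_max 2.0   # prints "200000 100000"
cpus_to_cpu_max 0.5   # prints "50000 100000"
```

The expected value can then be compared with the container's `cpu.max` file under /sys/fs/cgroup (the exact path depends on the cgroup manager), or with `podman inspect --format '{{.HostConfig.CpusetCpus}}' pg` for the pinning side.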

Service Deployment

Linux Server (Backend, Data Engine)

+-- PostgreSQL 18 + pgvector (JIT enabled)
+-- Dagster (daemon + sensor + user-code gRPC)
+-- NATS 2.11 (JetStream)        <- co-located with Desktop PC
+-- multi-bert-inference (Rust + ONNX Runtime)
+-- vLLM / llama.cpp / LM Studio
+-- Vector 0.45 (Rust)
+-- Trino + Nessie (Phase 2, Lakehouse)
+-- Podman rootless

Mac Studio (Frontend, UI Hub)

+-- agent-gateway (Go/Gin)
+-- VSCode.app (Remote-SSH -> Linux)
+-- Grafana UI
+-- Dagit UI
+-- Web Browser (Trino Web UI, etc.)

Storage Server (24/7 Observability)

+-- Prometheus
+-- Grafana
+-- Loki
+-- Vector (aggregator)
+-- MinIO

Operational Rules

Backups

NAS: Daily tar.gz, 14 generations
- Targets: /home/ksh3/, /srv/iceberg/, /srv/nessie/, /srv/media/, PG backups
- Excludes: */{tmp,cache,spill,exchange}
R2: Artifacts synced via rclone sync as needed
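The daily archive and the 14-generation rule could be scripted along these lines. This is a sketch only: the destination directory, the `backup-YYYYMMDD.tar.gz` naming scheme, and both function names are assumptions, since the article does not show the actual job.

```shell
# Sketch: create one dated archive, skipping scratch directories.
make_backup() {
  dest="$1"; shift
  tar -czf "$dest/backup-$(date +%Y%m%d).tar.gz" \
      --exclude='*/tmp' --exclude='*/cache' \
      --exclude='*/spill' --exclude='*/exchange' \
      "$@"
}

# Sketch: keep only the newest $keep archives (default 14).
rotate_backups() {
  dest="$1"; keep="${2:-14}"
  # The YYYYMMDD stamp makes names sort chronologically.
  ls -1 "$dest"/backup-*.tar.gz 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do rm -f -- "$old"; done
}
```

A daily run might call `make_backup /mnt/nas/backups /home/ksh3 /srv/iceberg /srv/nessie /srv/media` followed by `rotate_backups /mnt/nas/backups 14`, where `/mnt/nas/backups` is a placeholder for the real NAS mount.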

Cleanup

find /home/ksh3/*/cache -type f -mtime +14 -delete
find /home/ksh3/*/tmp   -type f -mtime +7  -delete

Monitoring

Prometheus node_exporter textfile collector tracks /home/ksh3/*/ usage
pg_stat_activity, wait_event, PSI, Trino query memory and spill are dashboard targets
Structured logs with correlation_id enable full-path tracing
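For the textfile collector item, a script that emits per-service usage in Prometheus text format might look like this. The metric name `service_dir_bytes` is hypothetical, and `du -sb` assumes GNU coreutils as shipped with Ubuntu.

```shell
# Sketch: print disk usage of each service directory under BASE
# in the Prometheus text exposition format.
emit_usage_metrics() {
  base="$1"
  echo '# HELP service_dir_bytes Disk usage of each service directory in bytes.'
  echo '# TYPE service_dir_bytes gauge'
  for dir in "$base"/*/; do
    [ -d "$dir" ] || continue
    name="$(basename "$dir")"
    bytes="$(du -sb "$dir" | cut -f1)"
    echo "service_dir_bytes{service=\"$name\"} $bytes"
  done
}

# Usage idea (the output directory is whatever --collector.textfile.directory points at):
#   emit_usage_metrics /home/ksh3 > "$OUT_DIR/usage.prom.tmp" && mv "$OUT_DIR/usage.prom.tmp" "$OUT_DIR/usage.prom"
```

Writing to a temporary file and renaming it avoids node_exporter reading a half-written file.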

OS Bootstrap

Ubuntu Server 24.04.3 LTS (minimized) with the following packages installed upfront.

sudo apt install -y --no-install-recommends \
  build-essential curl wget git unzip vim less man-db \
  net-tools iputils-ping nmap iperf3 mtr \
  htop btop nvtop iftop logrotate rclone \
  fzf ripgrep bat \
  openssh-server dropbear-initramfs libfido2-1 libu2f-udev fido2-tools \
  nvidia-headless-580 nvidia-utils-580 \
  podman podman-compose \
  dbus-user-session \
  locales fonts-noto-cjk fonts-noto-cjk-extra

This covers GPU drivers, the container runtime, remote access, Japanese fonts, and CLI tools in one pass. dbus-user-session is required for rootless Podman and systemd user services.

Caveats

Putting all of /home on NVMe means NVMe failure takes out all service data. Backup generation management is the lifeline
Rootless pods use userns / slirp4netns, so binding host ports below 1024 requires additional configuration
PostgreSQL 18 with JIT enabled runs with shm_size: 4gb. The container default for /dev/shm is only 64MB, and parallel queries that run out of shared memory fail mid-execution
CPU pinning must be managed consistently through either YAML or CLI — mixing both leads to lost settings after restart
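For the low-port caveat, the usual knob on Linux is the `net.ipv4.ip_unprivileged_port_start` sysctl. The sketch below checks whether a given port is bindable without privileges; the function name and the sysctl.d file name in the comments are assumptions.

```shell
# Sketch: can an unprivileged (rootless) process bind PORT?
# With no second argument the live floor is read from /proc (Linux only).
port_allowed() {
  port="$1"
  floor="${2:-$(cat /proc/sys/net/ipv4/ip_unprivileged_port_start)}"
  [ "$port" -ge "$floor" ]
}

# To lower the floor persistently (run as root; the file name is an assumption):
#   echo 'net.ipv4.ip_unprivileged_port_start=80' > /etc/sysctl.d/90-rootless-ports.conf
#   sysctl --system
```

Lowering the floor applies to every unprivileged process on the host, so publishing on a high port and forwarding from the router or a reverse proxy is an alternative worth weighing.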

Final Configuration

For the production runtime built on this foundation, see:

Go + NATS + Dagster AI Orchestration Platform — devstack containers, 3-host topology, NATS event design
