Optimizing Data Infrastructure I/O: NVMe/SATA Tiering and UI Consolidation
A storage allocation policy based on NVMe/SATA I/O characteristics, combined with a strategy to consolidate management UIs onto a separate host.
Introduction
When running a modern data analytics platform for personal use or for a small team, storage I/O design is often the biggest hurdle. NVMe SSDs have revolutionized burst read performance, but continuous write loads from logs and metrics can still degrade the latency of critical data queries.
In this article, I share the storage allocation policy I adopted to improve system stability, along with a UI consolidation strategy that moves the management UIs to a separate host.
Motivation: Why Separate Storage?
Initially, I centralized all data on high-speed NVMe SSDs. However, several issues became apparent during operation:
- Continuous Write Fluctuations: Logs (systemd, Loki) and metrics (Prometheus) consume I/O bandwidth constantly. While NVMe is fast, frequent `fsync` calls and steady synchronous writes can deplete the drive's internal buffers, leading to unpredictable latency in query responses.
- I/O Contention Risks: When analytics workloads (Trino spills, dbt runs) write large amounts of transient data to disk, contention with logging operations can degrade overall system responsiveness.
- UI Daemon Overhead: Management interfaces like Dagit and Grafana generate their own administrative logs. Running these on the same node as the core data engine is inefficient from a resource management perspective.
To resolve these, I redefined storage device roles (NVMe vs. SATA) and host responsibilities (Linux vs. Mac).
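
To check whether `fsync`-heavy writers really produce the latency spikes described above, a small probe can time each write-plus-`fsync` on a given device. The following is a minimal Python sketch, not the tooling used for this setup; the function names `fsync_latencies` and `summarize`, the 4 KiB payload, and the write count are all illustrative assumptions.

```python
import os
import statistics
import tempfile
import time


def fsync_latencies(directory, writes=200, payload=b"x" * 4096):
    """Append small records, fsync after each one, and return per-write latency in ms."""
    latencies = []
    # Create the probe file inside the directory under test so it lands on that device.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # ask the kernel to flush this file's data to the device
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def summarize(latencies):
    """Return median, approximate p99, and max so tail latency is visible."""
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return {
        "median_ms": statistics.median(ordered),
        "p99_ms": ordered[p99_index],
        "max_ms": ordered[-1],
    }
```

Running `summarize(fsync_latencies(path))` against a directory on each drive, once while the logging stack is active and once while it is idle, gives a rough comparison of tail latency. A dedicated benchmark tool such as fio will produce more rigorous numbers.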
Redefining Storage Allocation Policy
Data is tiered into two categories based on I/O characteristics:
NVMe: Optimized for High Throughput and Burst Reads
Everything requiring fast random access and low latency is placed on NVMe.
- Primary Postgres Data: Major database files, including WAL, are placed here to maximize query performance.
- Analytics Spill/Cache: Trino spill-to-disk (used when query memory runs out), Iceberg staging, and dbt targets, all of which generate massive transient I/O.
- LLM Models: Large model files for Ollama or vLLM (GBs to tens of GBs) must be on NVMe, as loading speed directly impacts UX.
- Vector Search and Intermediate Data: This includes Qdrant indices and Dagster intermediate data caches.
SATA: Optimized for Steady Sequential Writes
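Continuous write workloads and less performance-critical files are offloaded to SATA SSDs. These writes are steady and mostly sequential, so SATA bandwidth is enough for them, and keeping them on a separate device stops them from competing with queries on NVMe.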
- System Logs and Time-Series Data: In addition to `/var/log`, high-rate writes from Prometheus TSDB and Loki chunks/indices have been moved here.
- Container Metadata: Podman container layers and compose definitions.
- Backups: Standard practice dictates placing backups on a physical device separate from the primary data.
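
Whether two directories really sit on separate storage can be checked from the device ID that `os.stat` reports. The sketch below is an illustration under assumptions: the helper names are hypothetical, and `st_dev` identifies a filesystem rather than a physical drive, so two partitions on the same disk would also report different IDs. A tool such as `lsblk` is still needed to confirm the physical layout.

```python
import os


def device_id(path):
    """Return the ID of the device holding the filesystem that contains path."""
    return os.stat(path).st_dev


def on_separate_devices(path_a, path_b):
    """True when the two paths live on different filesystems."""
    return device_id(path_a) != device_id(path_b)
```

Calling `on_separate_devices(primary_dir, backup_dir)` with the Postgres data directory and the backup directory should return `True` if the backup rule above is being followed.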
UI Daemon Consolidation Strategy
The other major shift is the role-based split: “Linux as the Data Engine, Mac as the UI Hub.”
Management UIs (Dagit, Grafana, Lightdash, Trino Web UI) were moved off the Linux node to run on the primary workstation (Mac) via Podman or Docker.
Benefits of This Architecture
- Reduced I/O Load on Linux: Fine-grained logging and config file access from UI containers are offloaded to the Mac's SSD.
- Simplified Linux Environment: Running fewer rootless Podman containers on the Linux side lowers management overhead and the risk of I/O contention.
- Transparent Access: Since the Mac-based UIs connect to the Linux backends via gRPC, HTTP, or SQL, the user experience remains identical.
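
Because every UI now depends on a network path to the Linux node, a quick TCP reachability check from the Mac is useful before starting the UI containers. This is a minimal sketch: the host name `linux-node.local` is a placeholder, and the ports are the common defaults for each service, which may not match this setup.

```python
import socket

# Hypothetical endpoints: the host name is a placeholder and the ports are
# common defaults, not values taken from the setup described in this article.
BACKENDS = {
    "trino": ("linux-node.local", 8080),
    "postgres": ("linux-node.local", 5432),
    "prometheus": ("linux-node.local", 9090),
    "loki": ("linux-node.local", 3100),
}


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_backends(backends, timeout=2.0):
    """Map each backend name to whether its TCP port accepted a connection."""
    return {
        name: is_reachable(host, port, timeout)
        for name, (host, port) in backends.items()
    }
```

`check_backends(BACKENDS)` returns a dictionary of booleans. A `False` entry only says the port did not accept a TCP connection; it does not distinguish a stopped service from a firewall rule.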
Conclusion and Future Outlook
This strategy has allowed the Linux node to focus purely on being a high-performance compute and data engine. Physical tiering based on I/O characteristics is highly effective, especially in resource-constrained environments.
Moving forward, I plan to evaluate the overhead and stability differences between Podman on Mac and Docker Desktop as the runtime for these management UI containers.
Allocation Policy Summary
| Target | Data |
|---|---|
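| NVMe | Postgres (data and WAL), Trino Spill, Iceberg Staging, dbt Targets, LLM Models, Qdrant Indices, Dagster Cache |
| SATA | /var/log, Prometheus TSDB, Loki, Backups, Podman Container Layers and Compose Definitions |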
| Mac Host | All Management UIs (Dagit, Grafana, etc.) |
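
The allocation described in this article can also be kept as a small lookup so that scripts can ask where a given kind of data belongs. The sketch below is illustrative only: the category keys and the function names `tier_for` and `find_conflicts` are made up for this example.

```python
# Tier assignments mirroring the allocation described in this article.
# The category keys are illustrative names, not identifiers from any tool.
POLICY = {
    "nvme": {
        "postgres", "trino_spill", "iceberg_staging", "dbt_targets",
        "llm_models", "qdrant_indices", "dagster_cache",
    },
    "sata": {"logs", "prometheus", "loki", "backup", "podman_metadata"},
    "mac_host": {"management_ui"},
}


def tier_for(category, policy=POLICY):
    """Return the tier a data category is assigned to; raise KeyError if unassigned."""
    for tier, categories in policy.items():
        if category in categories:
            return tier
    raise KeyError(f"no tier assigned for {category!r}")


def find_conflicts(policy):
    """Return categories listed under more than one tier (should be empty)."""
    seen = {}
    conflicts = set()
    for tier, categories in policy.items():
        for category in categories:
            if category in seen and seen[category] != tier:
                conflicts.add(category)
            seen[category] = tier
    return conflicts
```

For example, `tier_for("loki")` returns `"sata"`, and `find_conflicts(POLICY)` returns an empty set as long as no category is assigned to two tiers.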
