      Infrastructure

      Server hardware, network topology, container orchestration, and monitoring stack documentation.

      These articles use AI-generated summaries of Obsidian notes originally kept as technical memos.
      English translations are produced with AI assistance.

      Dagster + NATS JetStream Event Pipeline: Implementation Deep Dive

      Implementation details of the event-consumption side, where Dagster sensors pull-subscribe from NATS …

      Homelab Infrastructure Redesign -- PostgreSQL Storage/Compute Separation and Devstack Overhaul

      Migrating PostgreSQL from an on-demand GPU box to a 24/7 Mac mini in a 3-node homelab. Covers the …

      Integrating MLflow into devstack — Separating Dagster and Experiment Tracking Responsibilities

      Implementation record of adding MLflow Tracking Server and MinIO to the agent-gateway devstack, …

      Redesigning a 3-Host Homelab: From Promtail Removal to devstack Splitting and Config Consolidation

      A record of migrating from Promtail to Vector, splitting devstack by host, and consolidating Go …

      Running PostgreSQL 18 with pgvector on Rootless Quadlet

      A full operational layout for running PostgreSQL 18 on rootless Podman + Quadlet with LLVM JIT and …

      AMD EPYC 9175F (Turin) Workstation Configuration: HPCT WCE51-GP

      A technical review of the HPCT WCE51-GP workstation built around the AMD EPYC 9175F (Turin), …

      Optimizing Data Infrastructure I/O: NVMe/SATA Tiering and UI Consolidation

      A storage allocation policy based on NVMe/SATA I/O characteristics, combined with a strategy to …

      Optimizing QuteBrowser Configuration for Internal Infrastructure

      A dedicated QuteBrowser configuration for efficiently accessing internal infrastructure (.home.arpa) …

      Using a CRS304 for Local 10GbE and Staged Egress Control

      Using a CRS304 as both the RouterOS router and 10GbE local switch, with VLAN segmentation, …

      State-Driven Syslog Monitoring with MikroTik RouterOS Netwatch

      Implementing Netwatch-based Syslog server liveness monitoring on MikroTik RouterOS with automatic …

      Building a Container Platform with Rootless Podman and Quadlet: UID Mapping, Permission Design, and macOS DNS Resolution

      Complete record of building a container platform with Podman Quadlet on Ubuntu 24.04 (Podman 4.9.x), …

      Building the Storage Server's Always-On Monitoring Stack: Prometheus, Loki, Promtail, and Quadlet

      Complete record of building a Mac mini (Ubuntu 24.04) as an always-on monitoring node running …

      How I Stabilized smartctl-exporter and Standardized Exporter Operations Across Rootful and Rootless Scopes

      Establishing a consistent rootful/rootless placement rule for monitoring exporters while stabilizing …

      Designing and Building a Local Dev Platform on EPYC 9175F and Podman

      Complete design and build record for a local dev platform on three machines: EPYC 9175F + RTX PRO …

      Running Resident Services with Quadlet on Minimal Ubuntu

      A practical guide to managing resident container services with Quadlet on a minimized Ubuntu 24.04 install …

      How I Structured a Reusable Memory Pool for ELT on a Single EPYC Server

      Designing a 512GB memory pool that rotates step by step across In-Memory, Arrow IPC, and Parquet …

      Rebuilding a Compute Server in 20-30 Minutes with tar.zst and rclone

      A backup and recovery design for a rootless Podman compute server using ext4/XFS, tar.zst, and …

      Containerizing a Local LLM Stack: Docker Compose for vLLM, llama.cpp, and a Rust Proxy

      A secure Docker Compose configuration for running vLLM, llama.cpp, Qdrant, and PostgreSQL under …

      How I Split rclone and rsync When Moving Hugging Face Models from Cold to Hot Storage

      A transfer procedure that uses rclone for blobs and rsync for snapshots/refs when moving Hugging …


      © 2017-2026 loFT LLC