Conclusion

MLflow Tracking Server and MinIO (S3-compatible object storage) were added to the devstack running the Dagster + NATS event pipeline. The design principle is a separation of responsibilities: “Dagster provides the orchestration bird’s-eye view, MLflow provides the detailed view of individual ML experiments,” with both linked by correlation_id.

The final changes span 8 files and 3 new services (MinIO, minio-init, MLflow). The system became operational after working through three real-world issues: a port conflict with the macOS AirPlay Receiver, psycopg2 missing from the official MLflow image, and the need to add a database to an existing PostgreSQL volume.


Why MLflow Was Needed

Dagster was already embedded in the devstack as the orchestrator: sensors pull pipeline events from NATS JetStream, jobs execute, and results are persisted to PostgreSQL. Dagster is sufficient for a bird’s-eye view of “what happened,” but there was no mechanism to track what happened inside individual ML experiments: hyperparameters, metric trends, and artifacts.

Two observation layers are explicitly separated:

  • Dagster: Bird’s-eye view (what happened)
  • MLflow: Detailed view (what happened inside)

correlation_id links both layers:

  • Dagster asset metadata records experiment_id, run_id, tracker_url, summary
  • MLflow run tags include correlation_id
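The two-way link can be pictured as one small record that produces both payloads. The following is a minimal sketch, not the project's actual code: the RunLink class, its method names, and the tracker_url format (MLflow's "#/experiments/<id>/runs/<id>" UI route) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunLink:
    """Identifiers tying one Dagster materialization to one MLflow run (hypothetical helper)."""

    correlation_id: str
    experiment_id: str
    run_id: str
    tracker_base_url: str  # host-facing MLflow URL; an assumption of this sketch

    def mlflow_tags(self) -> dict[str, str]:
        # Tag attached to the MLflow run so it can be found from the Dagster side.
        return {"correlation_id": self.correlation_id}

    def dagster_metadata(self, summary: str) -> dict[str, str]:
        # Plain-dict form of the four asset metadata fields listed above.
        tracker_url = (
            f"{self.tracker_base_url.rstrip('/')}"
            f"/#/experiments/{self.experiment_id}/runs/{self.run_id}"
        )
        return {
            "experiment_id": self.experiment_id,
            "run_id": self.run_id,
            "tracker_url": tracker_url,
            "summary": summary,
        }
```

Keeping both payloads on one object makes it harder for the Dagster metadata and the MLflow tags to drift apart.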

The experiment tracking foundation needed to be in place ahead of Phase 2 fine-tuning pipelines and investment prediction model generation.


Existing Infrastructure

The overall devstack configuration is as follows.

Core services:

Service | Role
NATS (JetStream) | Messaging
PostgreSQL 18 (pgvector + JIT, 4GB SHM) | Data store
Dagster, 3 containers (webserver + daemon + user-code gRPC) | Orchestration
Vector | Telemetry collection
FastAPI Reranker (ColBERT) | Reranking

Lakehouse profile (optional): Nessie, Trino, dbt-fusion

3-host configuration:

  • Storage server
  • Desktop Mac — NATS co-located with gateway
  • Compute server — Dagster and PostgreSQL

The README listed MinIO (S3-compatible) for binary artifact storage in the data retention strategy, but no MinIO service existed in podman-compose.yml yet. Since MLflow requires an S3-compatible artifact store, MinIO was added simultaneously.


Change Plan

  1. Add MLflow Tracking Server to podman-compose.yml. Backend store reuses the existing PostgreSQL with a new mlflow database; artifact store uses MinIO’s s3://mlflow/
  2. Add CREATE DATABASE mlflow; to init.sql
  3. Add dagster-mlflow resource to the Dagster side, making experiment_name and mlflow_tracking_uri configurable
  4. Add MLFLOW_TRACKING_URI and MLFLOW_S3_ENDPOINT_URL to environment variables
  5. Update README.md

MLflow is dedicated to experiment tracking; orchestration remains with Dagster.


dagster-mlflow API Investigation

The existing Dagster installation is pinned to 1.12.14, so compatibility with dagster-mlflow was verified before adding it.

On PyPI, dagster-mlflow follows Dagster’s version scheme — dagster 1.12.X corresponds to dagster-mlflow 0.28.X. For dagster 1.12.14, dagster-mlflow==0.28.14 is used. Dependencies: dagster==1.12.14, mlflow, pandas<3.0.0, protobuf!=5.29.0.
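The version correspondence can be written as a small helper. This is a sketch only: the function name is hypothetical, and the minor-version offset of 16 is extrapolated from the single 1.12 → 0.28 pairing stated above, so other releases should be checked on PyPI before relying on it.

```python
def dagster_library_version(core_version: str, minor_offset: int = 16) -> str:
    """Map a Dagster core version 1.N.X to the integration-library version 0.(N+offset).X.

    The default offset is an assumption inferred from 1.12.X -> 0.28.X.
    """
    major, minor, patch = core_version.split(".")
    if major != "1":
        raise ValueError(f"only 1.x core versions are handled, got {core_version!r}")
    return f"0.{int(minor) + minor_offset}.{patch}"


# Pin string for pyproject.toml, derived from the pinned core version.
print(f"dagster-mlflow=={dagster_library_version('1.12.14')}")
```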

The source code was read directly after installing in a temporary venv:

  • mlflow_tracking is a ResourceDefinition created with the old-style @resource decorator, not the ConfigurableResource pattern used by the existing PostgresResource and EmbeddingResource
  • Ops access it via required_resource_keys={"mlflow"} and context.resources.mlflow
  • The end_mlflow_on_run_finished hook must be applied to jobs or MLflow runs will hang. This is mandatory
  • The MlflowMeta metaclass proxies all mlflow.* methods, so log_params(), log_metric(), log_artifact() can be called directly on the resource object
  • S3 credentials can be passed via env config, but this is unnecessary if already set as container environment variables

The old-style resource pattern differs somewhat from the existing codebase, but this is unavoidable given dagster-mlflow’s package design.
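The proxying behaviour noted for MlflowMeta can be illustrated with a toy metaclass. This is not dagster-mlflow's source: it is a minimal sketch of the general technique, using the standard-library math module as a stand-in for mlflow, and the names ModuleProxyMeta and MathResource are hypothetical.

```python
import math


class ModuleProxyMeta(type):
    """Copy a module's public attributes onto every class built with this metaclass."""

    def __new__(mcls, name, bases, attrs):
        cls = super().__new__(mcls, name, bases, attrs)
        existing = set(dir(cls))  # anything the class already defines is left alone
        for attr in dir(math):
            if attr.startswith("_") or attr in existing:
                continue
            value = getattr(math, attr)
            # Wrap callables as static methods so they are never bound to the instance.
            setattr(cls, attr, staticmethod(value) if callable(value) else value)
        return cls


class MathResource(metaclass=ModuleProxyMeta):
    def sqrt(self, x: float) -> float:
        # Defined on the class, so it takes precedence over the proxied math.sqrt.
        return round(math.sqrt(x), 3)


resource = MathResource()
print(resource.floor(2.7))  # proxied straight through to math.floor
print(resource.sqrt(2.0))   # the class's own override
```

The same idea is what lets an op call log_params() or log_metric() on the resource object as if it were the mlflow module itself.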


Implementation

Following the plan, 8 files were changed or newly created.

podman-compose.yml: 3 Services Added

MinIO (docker.io/minio/minio:latest, ports 9000/9001, minio-data volume) was added as S3-compatible object storage. minio-init is a one-shot container that creates the s3://mlflow/ bucket using the mc command.

MLflow Tracking Server (ghcr.io/mlflow/mlflow:v2.21.3, port 5000):

  --backend-store-uri=postgresql://postgres:postgres@postgres:5432/mlflow
  --default-artifact-root=s3://mlflow/
  

These flags connect the server to PostgreSQL as the backend store and to MinIO as the artifact store. Its depends_on entry waits for postgres to reach service_healthy and for minio-init to reach service_completed_successfully.
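If the credentials ever stop being the plain devstack defaults, the backend store URI needs percent-encoding. The sketch below assembles the same server arguments programmatically; the helper names are hypothetical, and the --host and --port flags are an assumption about how the container is launched rather than something taken from the compose file.

```python
from urllib.parse import quote_plus


def backend_store_uri(user: str, password: str, host: str, port: int, database: str) -> str:
    # Percent-encode credentials so characters such as '@' or '/' cannot break the URI.
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


def mlflow_server_args(backend_uri: str, artifact_root: str, port: int = 5000) -> list[str]:
    # Argument vector for `mlflow server`; --host/--port values are assumed here.
    return [
        "mlflow",
        "server",
        f"--backend-store-uri={backend_uri}",
        f"--default-artifact-root={artifact_root}",
        "--host=0.0.0.0",
        f"--port={port}",
    ]


uri = backend_store_uri("postgres", "postgres", "postgres", 5432, "mlflow")
print(" ".join(mlflow_server_args(uri, "s3://mlflow/")))
```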

The following environment variables were added to Dagster’s dagster-user-code and dagster-daemon containers:

  MLFLOW_TRACKING_URI: "http://mlflow:5000"
  MLFLOW_S3_ENDPOINT_URL: "http://minio:9000"
  AWS_ACCESS_KEY_ID: minioadmin
  AWS_SECRET_ACCESS_KEY: minioadmin
  

init.sql

  CREATE DATABASE mlflow;
  

dagster/pyproject.toml

Added dagster-mlflow==0.28.14 and boto3. boto3 is required for MLflow’s S3 artifact store access.

dagster/project/resources/mlflow.py (new)

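  import os

  from dagster_mlflow import mlflow_tracking

  # mlflow_tracking is an old-style resource, so it is configured via .configured()
  mlflow_resource = mlflow_tracking.configured({
      "experiment_name": os.getenv("MLFLOW_EXPERIMENT_NAME", "agent-gateway"),
      "mlflow_tracking_uri": os.getenv("MLFLOW_TRACKING_URI", "http://mlflow:5000"),
      "extra_tags": {"project": "agent-gateway"},
  })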
  

dagster/project/defs.py

Added "mlflow": mlflow_resource to the Definitions resources.

.envrc

Added MLFLOW_TRACKING_URI and MLFLOW_S3_ENDPOINT_URL.


Troubleshooting

Port 5000 Conflict

Running podman compose up -d mlflow failed; the mlflow container could not start.

  Error response from daemon: "listen tcp :5000: bind: address already in use"
  

Checking with lsof -i :5000 revealed macOS ControlCenter (AirPlay Receiver) was occupying port 5000.
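The same check can be scripted as a preflight step before bringing the stack up. This is a minimal sketch using only the standard library; the function name and the service-to-port map are illustrative, with the port numbers taken from the services described above.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Host ports the new services want to publish (illustrative map).
    wanted = {"mlflow": 5000, "minio-api": 9000, "minio-console": 9001}
    for service, port in wanted.items():
        state = "IN USE" if port_in_use(port) else "free"
        print(f"{service:14s} {port:5d} {state}")
```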

The MLflow host port mapping was changed to "${MLFLOW_PORT:-5050}:5000". The container continues to run on port 5000 internally, so Dagster containers’ MLFLOW_TRACKING_URI: http://mlflow:5000 requires no change. Only host-side access uses port 5050.

The internal MLFLOW_TRACKING_URI was briefly changed to 5050 as well. That was a mistake, because inter-container communication uses container ports, and it was quickly reverted to 5000. In .envrc, MLFLOW_PORT="5050" was added and MLFLOW_TRACKING_URI was set to "http://${COMPUTE_HOST}:${MLFLOW_PORT}" for external access.
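The host-versus-container distinction can be captured in one function. The sketch below is illustrative only: the function name is hypothetical, and the "localhost" fallback for COMPUTE_HOST is an assumption, while the 5050 fallback mirrors the compose default ${MLFLOW_PORT:-5050}.

```python
import os
from typing import Mapping, Optional


def resolve_tracking_uri(inside_container: bool, env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the MLflow URL appropriate for where the caller is running."""
    env = os.environ if env is None else env
    if inside_container:
        # Service name plus container port; the host port mapping does not apply here.
        return "http://mlflow:5000"
    host = env.get("COMPUTE_HOST") or "localhost"  # fallback is an assumption
    # Like ${MLFLOW_PORT:-5050}: an unset or empty variable falls back to 5050.
    port = env.get("MLFLOW_PORT") or "5050"
    return f"http://{host}:{port}"
```

Code running in the Dagster containers would always take the first branch; only tools run from the host need the second.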

Official Image Missing psycopg2

After the port fix, MLflow crashed immediately on restart.

  ModuleNotFoundError: No module named 'psycopg2'
  

The official ghcr.io/mlflow/mlflow:v2.21.3 image does not include psycopg2, so the PostgreSQL driver must be installed manually. SQLite works out of the box only because its driver ships with Python.

A new devstack/mlflow/Dockerfile was created:

  FROM ghcr.io/mlflow/mlflow:v2.21.3
  RUN pip install --no-cache-dir psycopg2-binary boto3
  

The mlflow service in podman-compose.yml was changed from a direct image: reference to build: context: ./devstack/mlflow, tagged locally as localhost/agent-gateway/mlflow:2.21.3.

Adding a Database to an Existing Volume

After rebuilding with the custom image, a PostgreSQL-side error appeared.

  FATAL: database "mlflow" does not exist
  

CREATE DATABASE mlflow; had already been added to init.sql, but the PostgreSQL image runs the scripts in docker-entrypoint-initdb.d only when the data directory is empty, that is, on first startup. Because the postgres-data volume had existed for three weeks, nothing added to init.sql would take effect.

The database was created manually:

  podman exec agent-gateway-postgres-1 psql -U postgres -c "CREATE DATABASE mlflow;"
  

The addition to init.sql is preserved for cases where the volume is destroyed and recreated, or for new environment setup.
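Because PostgreSQL has no CREATE DATABASE IF NOT EXISTS, a repeatable version of the manual step has to check pg_database first. The following is a sketch under assumptions: the helper names are hypothetical, it shells out to the same podman exec and psql invocation used above, and the run parameter exists only so the logic can be exercised without a live container.

```python
import subprocess


def psql_argv(container: str, sql: str) -> list[str]:
    # -tA prints bare values (no header, no alignment) so the output is easy to compare.
    return ["podman", "exec", container, "psql", "-U", "postgres", "-tAc", sql]


def ensure_database(container: str, name: str, run=subprocess.run) -> bool:
    """Create the database if it is missing; return True when it had to be created."""
    if not name.isidentifier():
        # The name is interpolated into SQL, so reject anything but a simple identifier.
        raise ValueError(f"unsafe database name: {name!r}")
    check = run(
        psql_argv(container, f"SELECT 1 FROM pg_database WHERE datname = '{name}'"),
        capture_output=True, text=True, check=True,
    )
    if check.stdout.strip() == "1":
        return False
    run(psql_argv(container, f'CREATE DATABASE "{name}"'),
        capture_output=True, text=True, check=True)
    return True
```

Calling ensure_database("agent-gateway-postgres-1", "mlflow") would then be safe to repeat on both fresh and long-lived volumes.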


Result

After MLflow started and Alembic migrations ran automatically, gunicorn began operating with 4 workers.

  [2026-03-12 02:07:58 +0000] [24] [INFO] Starting gunicorn 23.0.0
  [2026-03-12 02:07:58 +0000] [24] [INFO] Listening at: http://0.0.0.0:5000 (24)
  

Connectivity was verified via the experiments/search API, confirming the Default experiment was returned. The artifact_location was s3://mlflow/0, correctly referencing MinIO.
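A connectivity check of this kind can be done with the standard library alone. The sketch below is an assumption about how such a check might look, not the command actually used: the function names are hypothetical, and it targets MLflow's REST endpoint /api/2.0/mlflow/experiments/search with a POST body containing max_results.

```python
import json
import urllib.request


def build_search_request(base_url: str, max_results: int = 10) -> urllib.request.Request:
    # experiments/search expects a JSON body; max_results caps the page size.
    body = json.dumps({"max_results": max_results}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/2.0/mlflow/experiments/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def artifact_locations(payload: dict) -> dict[str, str]:
    # Map experiment name -> artifact_location from the parsed response.
    return {e["name"]: e["artifact_location"] for e in payload.get("experiments", [])}


def check_tracking_server(base_url: str) -> dict[str, str]:
    # Performs the real HTTP call; raises urllib.error.URLError if the server is unreachable.
    with urllib.request.urlopen(build_search_request(base_url), timeout=5) as resp:
        return artifact_locations(json.load(resp))
```

From the host, check_tracking_server("http://localhost:5050") would be the call to try, assuming the port mapping described in the troubleshooting section.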

Changed Files

File | Change
podman-compose.yml | Added 3 services: MinIO, minio-init, mlflow + Dagster env vars
devstack/postgres/init.sql | CREATE DATABASE mlflow
devstack/dagster/pyproject.toml | Added dagster-mlflow==0.28.14, boto3
devstack/dagster/project/resources/mlflow.py | New: mlflow_tracking configured resource
devstack/dagster/project/defs.py | mlflow resource registration
devstack/mlflow/Dockerfile | New: psycopg2-binary + boto3
.envrc | MLFLOW_PORT, MLFLOW_TRACKING_URI, MLFLOW_S3_ENDPOINT_URL
README.md | Service list, environment variables, topology diagram

Design Notes

  • dagster-mlflow is an old-style resource (@resource decorator), not ConfigurableResource. The @end_mlflow_on_run_finished hook must always be applied to jobs
  • Inter-container communication uses internal port 5000; host access uses port 5050 (macOS AirPlay Receiver workaround)
  • When a PostgreSQL volume already exists, new database creation in init.sql must be applied manually