Rebuilding a Compute Server in 20-30 Minutes with tar.zst and rclone

A backup and recovery design for a rootless Podman compute server using ext4/XFS, tar.zst, and rclone instead of ZFS.

These articles use AI-generated summaries of Obsidian notes originally kept as technical memos.

English translations are produced with AI assistance.

Introduction

Once a compute server starts carrying persistent services under rootless Podman, the real risk is not day-to-day operation. It is recovery. I wanted a rebuild path that stays obvious even after an OS reinstall or an NVMe reset, so I wrote down a backup and restore policy that gets the full environment back without depending on more moving parts than necessary.

The goal here was not to add a sophisticated storage stack. It was to make recovery readable. Instead of ZFS, I kept the design centered on ext4/XFS + tar.zst + rclone and explicitly separated what must be preserved from what can be recreated.

Background and Motivation

My compute server hosts persistent workloads such as Postgres, Trino, and Dagster, but the real environment is larger than /mnt/data. A rootless Podman setup also depends on the user environment, user-systemd, subuid, subgid, and linger. If those assumptions are not restored correctly, data may still exist while the runtime fails to come back cleanly.

That is why this note fixes the recovery boundary first. I wanted to know exactly which files are backup material, which settings are part of the runtime contract, and which pieces should be regenerated instead of archived blindly.

This also aligns with a transfer workflow where I split responsibilities between rclone and rsync: large blob trees move well with rclone copy, while snapshots and refs that depend on symlink fidelity are better preserved with rsync -aH. The same mindset applies here.

Goal

The target outcome is simple: after reinstalling the OS or wiping NVMe storage, I want to reconstruct the rootless Podman environment and its data quickly and predictably.

Keep the recovery path simple with ext4/XFS + rclone + tar.zst
Preserve the prerequisites that matter for rootless Podman, including UID/GID, subuid/subgid, and linger
Separate backup targets from non-targets so the system stays understandable under failure

Directory Layout

I started by defining the system from a recovery perspective.

Path	Role
`/mnt/data`	Persistent application data for Podman apps such as Postgres, Trino, Dagster, etc.
`/opt/{app}`	Configuration, compose files, and systemd definitions
`/usr/local`	Local configuration for Trino, dbt, Dagster, and related tools
`/etc`	System configuration such as networking, systemd, and Podman settings
`/home/ksh3`	Rootless Podman user environment and user-systemd settings
`/srv/backup`	Backup staging area for compressed archives and package metadata

The important point is that I am not treating this as data-only recovery. /home/ksh3 and /etc are part of the runtime contract. Leaving them out would preserve files while still losing the conditions that let rootless Podman start correctly.

Backup Policy

System Configuration Backup (Daily)

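The daily backup captures the baseline required to rebuild the host after a reinstall. The targets are /usr/local, /etc, /home/ksh3, and /srv/backup/pkglist.txt. /etc/apt/sources.list.d/ is also listed explicitly to make the APT sources visible as a target, although it is already covered by /etc.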

    sudo dpkg --get-selections | awk '!/deinstall|purge/ {print $1}' \
        > /srv/backup/pkglist.txt

    # --exclude is positional in GNU tar, so it must come before the paths it filters.
    # sudo is used so that root-only files under /etc are readable.
    sudo tar --use-compress-program="zstd -T0 -19" \
        -cf /srv/backup/system-$(date +%Y-%m-%d_%H-%M).tar.zst \
        --exclude='/home/ksh3/.cache' --exclude='/home/ksh3/.local/share/Trash' \
        /etc /usr/local /home/ksh3 \
        /srv/backup/pkglist.txt /etc/apt/sources.list.d

I include pkglist.txt so package recovery is anchored to a concrete inventory rather than memory. I exclude cache and Trash because they are easy to regenerate and only make the archive noisier and larger.
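One detail of the package inventory is worth illustrating: the awk filter matches its pattern against the whole line, so a package whose name happens to contain "purge" or "deinstall" would be dropped too. The sketch below compares only the selection-state column instead. `installed_packages` is a hypothetical helper name, and the replay command in the comments assumes the rebuilt host runs the same release with the same APT sources.

```shell
#!/bin/sh
# Read `dpkg --get-selections` output on stdin and print the package names
# whose state (field 2) is neither "deinstall" nor "purge".
# Comparing the field avoids dropping packages whose *name* contains
# one of those words.
installed_packages() {
    awk '$2 != "deinstall" && $2 != "purge" { print $1 }'
}

# Backup side:
#   dpkg --get-selections | installed_packages > /srv/backup/pkglist.txt
# Restore side (GNU xargs reads the arguments from the file):
#   xargs -a /srv/backup/pkglist.txt sudo apt-get install -y
```

The restore-side line is the step that makes the inventory useful: without replaying it, the package list is only documentation.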

The zstd -T0 -19 choice is deliberate. This archive is configuration-heavy and relatively bounded in size, so I can afford stronger compression while still using parallel CPU threads.
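Because the cache and Trash exclusions are easy to get wrong silently, it can help to check the finished archive rather than trust the command line. The following is a minimal sketch: `listing_excludes` is a hypothetical helper that reads a member listing on stdin, so it works with whatever produces the listing.

```shell
#!/bin/sh
# Succeed only if no archive member lies at or under the given path.
# The listing is read from stdin, one member name per line.
listing_excludes() {
    prefix=${1#/}    # tar stores member names without the leading slash
    awk -v p="$prefix" '
        BEGIN { found = 0 }
        $0 == p || index($0, p "/") == 1 { found = 1 }
        END { exit found }
    '
}

# Intended use (archive name is a placeholder):
#   tar -I zstd -tf /srv/backup/system-YYYY-MM-DD_HH-MM.tar.zst \
#       | listing_excludes /home/ksh3/.cache || echo "cache leaked into archive" >&2
```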

Data Backup (Weekly or Manual)

The persistent workload data under /mnt/data is backed up separately, either weekly or on demand.

    # sudo lets tar read files owned by the container sub-UIDs under rootless Podman
    sudo tar --use-compress-program="zstd -T0 -5" \
        -cf /srv/backup/mnt-data-$(date +%Y-%m-%d).tar.zst -C /mnt data

Here I lower compression to -5 because the main concern is turnaround time. For large data trees, perfect compression is less valuable than producing a usable archive quickly.

Keeping the data archive separate from the system archive also makes the restore sequence cleaner. I can recover the OS-level configuration first and then bring the application data back in a controlled order.

Transfer Target (storage-server)

Once the archives are created, I send them to the storage server.

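    # rclone copy takes one source and one destination, so filter with --include
    # instead of letting the shell expand a glob into several arguments
    rclone copy /srv/backup storage:/srv/backups/compute/ --include "*.tar.zst" --progress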

I intentionally use plain rclone copy here because I am transferring archive files, not live directory trees with symlink semantics that must be preserved.
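Since the archives are the only restore source, one optional addition is a checksum manifest written before the transfer and verified after download. This is a sketch under assumptions, not part of the original policy: `write_checksums`, `verify_checksums`, and the `SHA256SUMS` file name are illustrative choices, and the code relies on GNU coreutils `sha256sum`.

```shell
#!/bin/sh
# Write a SHA256SUMS manifest next to the archives in <dir>.
# Fails if <dir> contains no *.tar.zst file.
write_checksums() {
    ( cd "$1" && sha256sum -- *.tar.zst > SHA256SUMS )
}

# Re-check every archive listed in <dir>/SHA256SUMS.
# Returns non-zero if any file is missing or does not match.
verify_checksums() {
    ( cd "$1" && sha256sum --check --quiet SHA256SUMS )
}

# Backup side:   write_checksums /srv/backup
# Restore side:  verify_checksums /tmp/restore
```

The manifest has to travel with the archives, so it would need its own transfer step, for example a separate `rclone copy` of that single file to the same destination.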

Logs and metrics are excluded because they are already centralized in Loki and Prometheus. If observability data already lives elsewhere, keeping it out of the host backup makes the recovery set easier to reason about.

Recovery Procedure (After Reinstall)

Minimal Package Setup

Right after reinstalling the OS, I only need the tooling required to fetch archives, unpack them, and start Podman again.

    sudo apt update
    sudo apt install -y zstd rclone podman podman-compose

That establishes the minimum base for decompression, transfer, and container startup.
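A quick preflight can confirm that the base tooling really is present before any archive is touched. The sketch below uses only the POSIX `command -v` builtin; `missing_commands` is a hypothetical helper name.

```shell
#!/bin/sh
# Print each named command that is not found on PATH, one per line.
# Returns non-zero if at least one is missing.
missing_commands() {
    rc=0
    for c in "$@"; do
        if ! command -v "$c" >/dev/null 2>&1; then
            printf '%s\n' "$c"
            rc=1
        fi
    done
    return "$rc"
}

# Intended use:
#   missing_commands zstd rclone podman podman-compose \
#       || echo "install the tools listed above first" >&2
```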

Retrieve the Backups

Next I pull the backup set from the storage server and extract both archives.

    rclone copy storage:/srv/backups/compute/latest/ /tmp/restore/
    sudo tar -I zstd -xf /tmp/restore/system-YYYY-MM-DD_HH-MM.tar.zst -C /
    # The data archive was created with -C /mnt, so its members start at data/
    sudo tar -I zstd -xf /tmp/restore/mnt-data-YYYY-MM-DD.tar.zst -C /mnt

The latest path is useful as an operational entry point, although it still needs an explicit retention and aliasing policy. Even so, having a stable “current restore source” already removes ambiguity during recovery.
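Because the archive names embed zero-padded dates, the newest file for a given prefix is simply the last one in sorted order. The sketch below uses that property to resolve the YYYY-MM-DD placeholders; `newest_archive` is a hypothetical helper, and it assumes every archive follows the `<prefix>-<date>.tar.zst` naming used in the backup commands.

```shell
#!/bin/sh
# Print the newest archive matching <dir>/<prefix>-*.tar.zst.
# Relies on the date-stamped names sorting chronologically.
# Returns non-zero (and prints nothing) if no archive matches.
newest_archive() {
    dir=$1
    prefix=$2
    newest=
    for f in "$dir/$prefix"-*.tar.zst; do
        [ -e "$f" ] || continue    # the glob matched nothing
        newest=$f                  # globs expand in sorted order
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Intended use:
#   sudo tar -I zstd -xf "$(newest_archive /tmp/restore system)" -C /
#   sudo tar -I zstd -xf "$(newest_archive /tmp/restore mnt-data)" -C /mnt
```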

Restore User and Permissions

This is the most critical section for a rootless Podman rebuild. User identity, ownership, namespace settings, and linger all need to line up.

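    # The restored /etc may already define ksh3, so only create the user if it is missing
    id -u ksh3 >/dev/null 2>&1 || sudo useradd -m -u 1000 -s /bin/bash ksh3
    sudo chown -R ksh3:ksh3 /home/ksh3 /mnt/data /opt /usr/local
    echo "ksh3:100000:65536" | sudo tee /etc/subuid /etc/subgid
    sudo loginctl enable-linger ksh3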

I keep the UID fixed at 1000 because persistent data under rootless Podman becomes much harder to reuse cleanly if the identity shifts. This is one of those details that looks minor until it becomes the reason recovery drags on.
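These prerequisites can be checked mechanically before any container is started. The following is a minimal sketch: `has_subid_range` is a hypothetical helper, and the commented checks assume the `ksh3` user and UID 1000 from the commands above.

```shell
#!/bin/sh
# Succeed if <file> (for example /etc/subuid) contains at least one
# "user:start:count" line for <user> with a positive start and count.
has_subid_range() {
    user=$1
    file=$2
    awk -F: -v u="$user" '
        $1 == u && $2 > 0 && $3 > 0 { ok = 1 }
        END { exit !ok }
    ' "$file"
}

# Intended use after the restore:
#   [ "$(id -u ksh3)" = 1000 ]          || echo "UID drifted" >&2
#   has_subid_range ksh3 /etc/subuid    || echo "subuid range missing" >&2
#   has_subid_range ksh3 /etc/subgid    || echo "subgid range missing" >&2
#   loginctl show-user ksh3 --property=Linger    # expected to report Linger=yes
```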

Rebuild the Podman Environment

Once the prerequisites are back in place, I can restore the runtime itself.

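    sudo -u ksh3 podman system migrate
    # systemctl --user needs the user's runtime directory, which sudo does not set
    sudo -u ksh3 XDG_RUNTIME_DIR=/run/user/1000 systemctl --user daemon-reload
    # enable does not expand globs itself; let the shell expand them in the unit
    # directory (assumed here to be ~/.config/systemd/user)
    sudo -u ksh3 XDG_RUNTIME_DIR=/run/user/1000 sh -c \
        'cd /home/ksh3/.config/systemd/user && systemctl --user enable --now pod-*.service'
    # Or:
    cd /opt/containers/compose
    # The subshell keeps one failed stack from leaving the loop in the wrong directory
    for d in */; do (cd "$d" && podman-compose up -d); done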

I kept both recovery paths because they are useful in different situations. Sometimes I want user-systemd to reassert the intended steady state. Other times I just want to bring containers back directly with podman-compose up -d. In either case, the earlier UID/GID and namespace restore steps are what make this stage reliable.

Additional Notes

I also wrote down the small but easy-to-forget rules that matter during recovery.

Item	Detail
Fixed UID/GID	Rootless Podman persistent data requires the same UID (`1000`)
`/etc/subuid` / `/etc/subgid`	Required for the rootless namespace and must be backed up and restored
`loginctl enable-linger`	Re-enables automatic user-systemd startup
Networking	No backup required because Podman recreates it during rebuild
SSH keys	Excluded for security and regenerated separately

Explicitly stating that networking is not part of the backup is useful. Trying to preserve everything tends to make recovery design brittle. If a component is safe to recreate, I would rather document that decision and keep the backup smaller and cleaner.

SSH keys follow the same logic. It is more secure to keep them outside this host-recovery bundle and regenerate or restore them through a separate security process.

Result

The final policy is straightforward.

Backup targets: /usr/local, /etc, /home/ksh3, /srv/backup/pkglist.txt, /etc/apt/sources.list.d, and /mnt/data
Format: tar.zst + rclone
Sensitive key material stays excluded
Recovery requires matching UID/GID, restored subuid/subgid, and enabled linger
After restoration, the environment comes back through /opt/containers/compose with podman-compose up or user-systemd activation

With that in place, a full rebuild should fit into roughly 20 to 30 minutes. The more important win is that the recovery path is explicit instead of relying on memory.

Future Work

There are still a few things worth tightening.

Define how storage:/srv/backups/compute/latest/ is maintained so retention and archive selection are deterministic
Connect this backup policy more explicitly to the separate rclone / rsync transport rules for symlink-heavy assets and large model trees
Add a post-restore smoke test so the Podman services can be validated automatically
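The smoke test in the last item could start very small. The sketch below only reports units that are not active; the `CHECK` variable, the `inactive_units` name, and the unit names in the usage comment are all assumptions for illustration, not names taken from the actual host.

```shell
#!/bin/sh
# Command used to test a single unit. Override CHECK to dry-run the
# function without a running user systemd instance.
CHECK=${CHECK:-"systemctl --user is-active --quiet"}

# Print every unit for which $CHECK fails, one per line.
# Returns non-zero if at least one unit is not active.
inactive_units() {
    rc=0
    for unit in "$@"; do
        # $CHECK is deliberately unquoted so it splits into command and flags
        if ! $CHECK "$unit"; then
            printf '%s\n' "$unit"
            rc=1
        fi
    done
    return "$rc"
}

# Intended use as ksh3 (unit names are hypothetical):
#   inactive_units pod-postgres.service pod-trino.service pod-dagster.service
```

A check like this only proves that the units started. Validating that Postgres, Trino, and Dagster actually answer requests would be a separate, service-specific step.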

As a baseline recovery design, though, this is already in a good place. The next step is less about adding complexity and more about making the policy operationally tighter.
