How I Split rclone and rsync When Moving Hugging Face Models from Cold to Hot Storage

Introduction

When I move a Hugging Face model from a cold archive path to a hot runtime path, a plain full-directory copy is not the cleanest approach. The repository layout mixes large file objects with a symlink-based revision structure. A single tool tends to treat both as the same kind of data, even though they need different handling.

This note captures the split I settled on for a GLM-5-GGUF repository: rclone for blobs, and rsync for snapshots and refs. In my storage strategy, active data lives on a hot tier and archival data on slower cold storage. This transfer pattern is how I promote a model into the hot tier while keeping the Hugging Face directory structure usable.

Background and Motivation

A local Hugging Face hub directory is more structured than it first appears. Most of the bytes live under blobs, but resolving a revision to actual files depends on the trees under snapshots and refs.

If I treat all of that as the same kind of data, I either lose performance on the large object copy or risk mangling the symlink structure that makes the repository usable. The simplest answer was to split the transfer by data type: move file objects with a tool that is good at parallel copying, and move the reference tree with a tool that preserves filesystem structure.
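To make that layout concrete, here is a minimal sketch that builds a tiny fake repository with the same three parts. The function name make_mock_hub_repo, the file names, and the hash-like names are all made up for illustration; real blob names are content hashes and real repositories hold far more files. The shape is the point: bytes in blobs, relative symlinks in snapshots, and a commit hash in refs/main. It also works as a throwaway sandbox for rehearsing the copy steps in this note.

```sh
# Hypothetical helper: build a tiny fake hub repo with the blobs/snapshots/refs shape.
# All hashes and file names below are made up for illustration.
make_mock_hub_repo() (
  repo=$1                                          # e.g. /tmp/hub/models--example--tiny
  rev=0123456789abcdef0123456789abcdef01234567     # fake commit hash
  mkdir -p "$repo/blobs" "$repo/refs" "$repo/snapshots/$rev/IQ4_NL" || exit 1

  # blobs/: the actual bytes, stored under (fake) hash names
  printf 'weights-part-1\n' > "$repo/blobs/aaaa1111"
  printf 'config\n' > "$repo/blobs/bbbb2222"

  # snapshots/<rev>/: relative symlinks pointing back into blobs/
  ln -s ../../blobs/bbbb2222 "$repo/snapshots/$rev/config.json"
  ln -s ../../../blobs/aaaa1111 "$repo/snapshots/$rev/IQ4_NL/model-00001.gguf"

  # refs/main: maps the branch name to a commit hash (no trailing newline)
  printf '%s' "$rev" > "$repo/refs/main"
)
```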

Prerequisites

I start by fixing the model-specific paths.

  BASE="models--unsloth--GLM-5-GGUF"
  SRC_BASE="/srv/archive/cold/hf/hub/$BASE"
  DST_BASE="/srv/archive/hot/hf/hub/$BASE"
  mkdir -p "$DST_BASE"

With this layout, /srv/archive/cold/hf/hub/ is the archive side and /srv/archive/hot/hf/hub/ is the active side. For another model, only BASE needs to change.
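Two small preflight helpers can make this step harder to get wrong. This is a sketch under stated assumptions: the names hf_repo_dir and check_src_layout are hypothetical. hf_repo_dir follows the hub cache naming convention, where a model repo id such as unsloth/GLM-5-GGUF is stored as models--unsloth--GLM-5-GGUF. check_src_layout simply refuses to continue if the cold-side directory lacks any of blobs, snapshots, or refs.

```sh
# Hypothetical helper: map a model repo id to its hub cache directory name.
# "unsloth/GLM-5-GGUF" -> "models--unsloth--GLM-5-GGUF" ("/" becomes "--").
hf_repo_dir() {
  printf 'models--%s\n' "$(printf '%s' "$1" | sed 's#/#--#g')"
}

# Hypothetical helper: fail early if a repo directory is missing a hub subdirectory.
check_src_layout() (
  for d in blobs snapshots refs; do
    if [ ! -d "$1/$d" ]; then
      echo "missing: $1/$d" >&2
      exit 1
    fi
  done
)
```

In this note's setup that would be BASE=$(hf_repo_dir unsloth/GLM-5-GGUF) followed by check_src_layout "$SRC_BASE" before any copy starts.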

Copying blobs in Parallel with rclone

The first step is to move the actual file objects under blobs.

  rclone copy "$SRC_BASE/blobs" "$DST_BASE/blobs" \
    --exclude "*.incomplete" \
    --transfers 16 --checkers 32 \
    --local-no-check-updated \
    -P

This is where rclone makes sense. blobs is just a large object store of regular files, so parallel transfer is the main concern. I start with --transfers 16 --checkers 32; on 10GbE with SSDs on both ends, there is room to push that to --transfers 32 --checkers 64.

I also exclude .incomplete files from the start. If the destination is meant to be a clean hot-side copy, unfinished objects should not come along for the ride.
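Since the exclude rule drops unfinished objects silently, it helps to know beforehand whether the cold side has any. In the hub cache, a *.incomplete file under blobs is a download that never finished, so anything it held will simply be missing on the hot side. A minimal sketch, with list_incomplete_blobs as a hypothetical helper name: it prints those files and returns non-zero if there are any.

```sh
# Hypothetical pre-copy check: list *.incomplete objects under <repo>/blobs.
# Exit status 0 means none were found; 1 means the listed files will be skipped.
list_incomplete_blobs() (
  found=$(find "$1/blobs" -type f -name '*.incomplete' -print)
  [ -z "$found" ] && exit 0
  printf '%s\n' "$found"
  exit 1
)
```

Run against "$SRC_BASE" before the rclone step, a non-zero status is a prompt to finish or redo that download rather than promote a partial model.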

Copying snapshots While Keeping Symlinks Intact

Next, I copy snapshots.

  mkdir -p "$DST_BASE/snapshots"
  rsync -aH --info=progress2 \
    --exclude="*.incomplete" \
    "$SRC_BASE/snapshots/" "$DST_BASE/snapshots/"

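Here I switch to rsync -aH because the goal is no longer raw throughput. The point is to preserve the symlink-based structure as-is: -a implies -l, which copies symlinks as symlinks, and -H preserves hard links. rclone's local backend, by contrast, skips symlinks by default; -L follows them and -l stores them as .rclonelink files, and neither reproduces a plain symlink tree. In a Hugging Face hub layout, the files under snapshots are symlinks into blobs, so keeping those links intact matters more than forcing everything through the same copy tool.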

--info=progress2 is also useful here because snapshot trees can still take time, and I want a single progress view while the copy is running.
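One thing rsync -a will faithfully preserve is the link target text, so the links only "move" to the hot tier if they are relative. The hub cache normally creates relative links such as ../../blobs/<hash>, which then resolve against the hot-side blobs. An absolute link into /srv/archive/cold would still resolve after the copy, so a broken-link check would not catch it, yet the hot copy would be reading from cold storage. A minimal sketch, assuming a find that supports -lname (GNU and BSD find both do); find_absolute_links is a hypothetical name.

```sh
# Hypothetical check: list symlinks whose target is an absolute path.
# Such links keep pointing at the original location after a copy.
# Exit status 0 means every symlink under the tree is relative.
find_absolute_links() (
  found=$(find "$1" -type l -lname '/*' -print)
  [ -z "$found" ] && exit 0
  printf '%s\n' "$found"
  exit 1
)
```

Running it on "$SRC_BASE/snapshots" before the copy is cheap, since it only reads link targets and never touches blob contents.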

Copying refs for Repository Consistency

refs is small, but I still move it explicitly.

  mkdir -p "$DST_BASE/refs"
  rsync -aH --info=progress2 \
    "$SRC_BASE/refs/" "$DST_BASE/refs/"

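This directory does not dominate transfer time, but it is part of the repository state. refs/main holds the commit hash that the branch name main points to, and that hash is the directory name under snapshots. Without it, a lookup by branch name on the hot side has nothing to resolve against, even if the large files and snapshot tree are already there.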
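A quick way to confirm the refs copy did its job is to follow a ref by hand. The sketch below uses a hypothetical helper, check_ref: it reads refs/<name> (default main), prints the commit hash, and fails if the matching snapshots/<hash> directory is missing on the side being checked.

```sh
# Hypothetical check: resolve refs/<ref> (default "main") and make sure the
# snapshot directory it names exists. Prints the commit hash on success.
check_ref() (
  base=$1
  ref=${2:-main}
  [ -f "$base/refs/$ref" ] || { echo "no ref: $ref" >&2; exit 1; }
  rev=$(cat "$base/refs/$ref")
  [ -d "$base/snapshots/$rev" ] || {
    echo "ref $ref -> $rev, but snapshots/$rev is missing" >&2
    exit 1
  }
  printf '%s\n' "$rev"
)
```

On the hot side this would be check_ref "$DST_BASE"; after a partial revision copy it also tells you whether main still points at a snapshot that was actually promoted.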

Running a Minimal Integrity Check

After the copy, I run one quick validation step.

  find "$DST_BASE/snapshots" -type l ! -exec test -e {} \; -print | head

If the command prints nothing, no broken symlinks were found under snapshots (head only caps the output at the first ten hits). It is not a full audit: it confirms that every link resolves, not that the blob contents are correct. Still, it is a practical first check, and it directly validates the part of the transfer that is easiest to break when mixing tools or copy modes.

For me, this one line is what turns the procedure from “copied some directories” into “copied the repository structure and confirmed it still resolves.”
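When the check runs inside a script rather than at a prompt, it is more useful as an exit status than as output to eyeball. This is a minimal sketch of the same one-line test wrapped in a hypothetical function, check_broken_links: it prints every broken symlink, reports the count on stderr, and exits non-zero if any exist.

```sh
# Hypothetical scriptable form of the broken-symlink check:
# a symlink is "broken" if test -e (which follows links) fails for it.
check_broken_links() (
  broken=$(find "$1" -type l ! -exec test -e {} \; -print)
  if [ -n "$broken" ]; then
    printf '%s\n' "$broken"
    echo "$(printf '%s\n' "$broken" | wc -l | tr -d ' ') broken symlink(s) under $1" >&2
    exit 1
  fi
)
```

Something like check_broken_links "$DST_BASE/snapshots" || exit 1 then stops a larger promotion script before it reports success.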

Copying Only a Specific Revision

Sometimes I do not want the whole snapshot set. In that case, I can narrow the copy to one revision and one quantization subtree.

  REV="acc91597d28b7ebd3a8c20fd5331ceaf07a4ece1"
  mkdir -p "$DST_BASE/snapshots/$REV"
  rsync -aH --info=progress2 \
    "$SRC_BASE/snapshots/$REV/IQ4_NL/" "$DST_BASE/snapshots/$REV/IQ4_NL/"

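This is useful when I only need a specific variant such as IQ4_NL on the hot side. The snapshot symlinks only resolve if their blobs are present, so a real partial promotion also means limiting the blob copy to the objects that subtree points to; copying all of blobs first would already move nearly all of the bytes. For larger repositories, that kind of narrowing is often the difference between a short operational copy and an unnecessary full promotion.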
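To narrow the blob copy to match, the snapshot subtree itself can say which blobs it needs: each symlink's target ends in the blob name. A minimal sketch, with blobs_for_subtree as a hypothetical helper: it reads the raw link targets with readlink, keeps only the last path component, and deduplicates. The resulting list is meant for rclone's --files-from, which takes paths relative to the source root; the list file name in the comment is made up.

```sh
# Hypothetical helper: print the blob names that the symlinks under one
# snapshot subtree point to, one per line, sorted and deduplicated.
# Only the last component of each raw link target (the blob name) is kept.
blobs_for_subtree() (
  find "$1" -type l -exec readlink {} \; |
    while IFS= read -r target; do
      printf '%s\n' "${target##*/}"
    done |
    LC_ALL=C sort -u
)

# Example usage (the list file name is hypothetical):
#   blobs_for_subtree "$SRC_BASE/snapshots/$REV/IQ4_NL" > iq4_nl-blobs.txt
#   rclone copy "$SRC_BASE/blobs" "$DST_BASE/blobs" \
#     --files-from iq4_nl-blobs.txt \
#     --transfers 16 --checkers 32 --local-no-check-updated -P
```

Running this before the rsync of the subtree means the links land on the hot side with their targets already in place, so the broken-symlink check applies unchanged.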

The tradeoff is operational clarity: once partial copies are allowed, I need a clear rule for which revisions and quantizations are expected to live on the hot tier.

Reusing the Pattern for Other Models

The same structure can be reused for another model such as DeepSeek-V3.2-Speciale by changing the BASE= line.

That is the part worth keeping as a template. The stable idea is not model-specific naming; it is the division of labor:

Use rclone for object-heavy blobs
Use rsync for symlink and reference trees under snapshots and refs

As long as that split stays intact, the Hugging Face repository layout remains usable after the move.
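Since only BASE changes between models, the whole procedure fits in one function. This is a sketch of that template, not a tested production script: promote_model is a hypothetical name, the COLD_HUB, HOT_HUB, TRANSFERS, and CHECKERS environment variables are assumptions introduced here, and their defaults are the paths and settings used in this note. It runs the same three copies and the same broken-symlink check, stopping at the first failure.

```sh
# Hypothetical template: promote one hub repo directory from cold to hot.
# COLD_HUB / HOT_HUB / TRANSFERS / CHECKERS are assumed knobs with defaults
# taken from this note.
promote_model() (
  base=$1
  cold=${COLD_HUB:-/srv/archive/cold/hf/hub}
  hot=${HOT_HUB:-/srv/archive/hot/hf/hub}
  src="$cold/$base"
  dst="$hot/$base"

  [ -d "$src/blobs" ] || { echo "not a hub repo: $src" >&2; exit 1; }
  mkdir -p "$dst/snapshots" "$dst/refs" || exit 1

  # 1. blobs: parallel object copy, skipping unfinished downloads
  rclone copy "$src/blobs" "$dst/blobs" \
    --exclude "*.incomplete" \
    --transfers "${TRANSFERS:-16}" --checkers "${CHECKERS:-32}" \
    --local-no-check-updated -P || exit 1

  # 2. snapshots: symlink-preserving copy
  rsync -aH --info=progress2 --exclude="*.incomplete" \
    "$src/snapshots/" "$dst/snapshots/" || exit 1

  # 3. refs: small, but part of the repository state
  rsync -aH --info=progress2 "$src/refs/" "$dst/refs/" || exit 1

  # 4. minimal integrity check: no broken symlinks under snapshots
  broken=$(find "$dst/snapshots" -type l ! -exec test -e {} \; -print)
  if [ -n "$broken" ]; then
    printf '%s\n' "$broken" >&2
    exit 1
  fi
  echo "promoted $base"
)
```

The body runs in a subshell, so a failed step ends that one promotion without killing the calling shell, and several models can be promoted in a loop with each result checked separately.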

Results

The procedure clarifies a few things:

blobs should be treated as parallel file transfer work
snapshots and refs should be treated as structure-preservation work
.incomplete files should be excluded from the hot-side copy
a broken-symlink check should be part of the workflow
partial promotion by revision or quantization is possible when needed

That gives me a transfer method that is fast enough for the large files and conservative enough for the repository structure.

Future Work

The next improvement is not another copy command. It is tightening the operational guardrails around this one.

Add post-transfer file-count or size verification for blobs
Record model-size-based presets for --transfers and --checkers
Define a separate policy for which revisions belong on hot storage
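For the first item, a name-plus-size comparison is a reasonable starting point. This is a minimal sketch, not the procedure I run today: blob_manifest and verify_blobs are hypothetical names. The manifest lists "<blob name> <bytes>" for every completed blob (ignoring *.incomplete), sorted, and the verifier diffs the two sides. It checks file count and sizes, not content hashes, and it only makes sense after a full blobs copy, because a partial promotion differs from the cold side by design.

```sh
# Hypothetical: print "<blob name> <size in bytes>" for completed blobs, sorted.
blob_manifest() {
  find "$1" -type f ! -name '*.incomplete' -exec sh -c '
    for f do
      printf "%s %s\n" "${f##*/}" "$(wc -c < "$f" | tr -d " ")"
    done
  ' sh {} + | LC_ALL=C sort
}

# Hypothetical: compare two blobs directories by name and size.
# Exit 0 and a summary line on a match; exit 1 and a diff on stderr otherwise.
verify_blobs() (
  src_list=$(mktemp) && dst_list=$(mktemp) || exit 2
  trap 'rm -f "$src_list" "$dst_list"' EXIT
  blob_manifest "$1" > "$src_list"
  blob_manifest "$2" > "$dst_list"
  if d=$(diff "$src_list" "$dst_list"); then
    echo "blobs match: $(wc -l < "$src_list" | tr -d ' ') file(s)"
  else
    printf '%s\n' "$d" >&2
    exit 1
  fi
)
```

After a full promotion this would be verify_blobs "$SRC_BASE/blobs" "$DST_BASE/blobs".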

Once cold and hot storage have distinct roles, the procedure should define not only how I copy a model, but also what deserves promotion in the first place.