Overview

In a real Rust project, code-tree (ctree) proved valuable for verifying a refactoring from a monolithic to a modular architecture. This article summarizes the structural changes it captured through diff revisions (rev files) and explains how the mechanism works.

Structural Changes Captured by ctree

Diff Before and After Refactoring

The compact diff recorded after the refactoring (rev 0002) made the following structural changes visible:

  r:0002|l:rust|t:2025-03-01T12:34:56Z|ctx:.ctree.toml|b:0001|bt:2025-02-28T14:22:10Z
  cf:3  // changed_files count

  @@f:src/main.rs|sa:0|sd:20|da:0|dd:2
  -S 12f4e8c9  // Symbol deletion
  -S 8a3d2b1f
  ... (total 20 symbol deletions)

  @@f:src/axum_server.rs|sa:8|sd:0|da:1|dd:0
  +S a7c5d1e2  // HTTP handling symbols added
  +D 2e8f4c91  // Dependency added

  @@f:src/inference.rs|sa:12|sd:0|da:2|dd:0
  +S b3f2e7a4  // Inference logic added
  +D c1a9d6b2

  @@f:src/grpc.rs|sa:0|sd:5|da:2|dd:1
  -S 6e2f4a7d
  +D 3c8b5f21
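
To make the layout of the excerpt concrete, here is a minimal parsing sketch. It assumes that sa/sd/da/dd stand for symbols added/deleted and dependencies added/deleted (inferred from the annotations in the excerpt, not from a format specification), and the names FileDelta and parse_rev are hypothetical helpers, not part of ctree.

```python
from dataclasses import dataclass, field


@dataclass
class FileDelta:
    path: str
    counts: dict                                  # e.g. {"sa": 0, "sd": 20, "da": 0, "dd": 2}
    records: list = field(default_factory=list)   # (op, hash) pairs such as ("-S", "12f4e8c9")


def parse_rev(text: str) -> list:
    """Parse '@@f:' section headers and '+S/-S/+D/-D' records from a rev excerpt."""
    deltas = []
    for raw in text.splitlines():
        line = raw.split("//")[0].strip()         # drop trailing '// ...' annotations
        if line.startswith("@@f:"):
            path, *pairs = line[len("@@f:"):].split("|")
            counts = {k: int(v) for k, v in (p.split(":", 1) for p in pairs)}
            deltas.append(FileDelta(path, counts))
        elif deltas and line[:3] in ("+S ", "-S ", "+D ", "-D "):
            op, digest = line.split()
            deltas[-1].records.append((op, digest))
        # any other line (rev header, 'cf:' count, elisions) is ignored in this sketch
    return deltas


sample = """\
@@f:src/main.rs|sa:0|sd:20|da:0|dd:2
-S 12f4e8c9  // Symbol deletion
-S 8a3d2b1f

@@f:src/axum_server.rs|sa:8|sd:0|da:1|dd:0
+S a7c5d1e2  // HTTP handling symbols added
+D 2e8f4c91  // Dependency added
"""

for delta in parse_rev(sample):
    print(delta.path, delta.counts, delta.records)
```

Because each section header carries its own counts, a scanner like this can report where the change volume is concentrated without resolving a single hash.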
  

Concrete Structural Changes

File               | Before                   | After                            | Meaning
src/main.rs        | Bloated 20+ symbols      | Logging + server startup focused | Responsibility separation
src/axum_server.rs | None                     | HTTP handling 8 symbols          | HTTP layer independence
src/inference.rs   | Inference 5 symbols      | Inference + vector ops 12 symbols | Feature consolidation
src/grpc.rs        | gRPC + inference mixed 5 | gRPC calls only                  | Dependency clarification

Adoption Benefits

1. Dependency Isolation

The dependency deletion markers (-D) in the rev file made explicit that src/inference.rs is now completely separated from the HTTP and gRPC context.

This enabled:

  • Inference logic testable via standalone Rust unit tests
  • HTTP handler changes no longer impact inference engine design
  • gRPC server can reference the same inference engine (eliminating code duplication)

2. Single Source of Truth

The consolidation of duplicated vector operations (normalize_rows, cosine_similarity, maxsim, etc.) into the common module src/inference.rs can be verified through ctree’s hashes.

  ctree_get_text(hashes=["b3f2e7a4", "c1a9d6b2"])
  

This command immediately retrieves the signatures and implementations of vector operations consolidated in inference.rs.
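
For readers unfamiliar with these operations, the sketch below shows what functions with these names commonly compute. The project's versions are written in Rust and are not reproduced in this article, so the semantics here are assumptions: normalize_rows as per-row L2 normalization, and maxsim as the sum over query vectors of the best cosine similarity against any document vector.

```python
import math


def normalize_rows(rows):
    """L2-normalize each row; all-zero rows are returned unchanged to avoid dividing by zero."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row] if norm > 0.0 else list(row))
    return out


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0.0 and nb > 0.0 else 0.0


def maxsim(query_rows, doc_rows):
    """Sum over query vectors of the best cosine similarity against any document vector."""
    return sum(max(cosine_similarity(q, d) for d in doc_rows) for q in query_rows)


print(normalize_rows([[3.0, 4.0], [0.0, 0.0]]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
print(maxsim([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

Keeping one definition of each operation means that a decision such as how to treat zero-length rows is made once, rather than separately in the HTTP and gRPC paths.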

3. Boundary Clarification

File-level responsibilities (Scope) are defined as “strong”, enabling strict inter-module boundary management.

Module             | Strong Scope Responsibility
src/main.rs        | Application startup, logging initialization
src/axum_server.rs | HTTP handling, request/response conversion
src/inference.rs   | Inference, vector normalization, distance calculation
src/grpc.rs        | gRPC service definition, inference engine invocation
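
One way such boundaries could be enforced mechanically is to check dependency edges against a list of forbidden crossings. The sketch below is an assumption-laden illustration: the FORBIDDEN table encodes only the rule stated earlier (inference must not reach into the HTTP or gRPC layers), the edge list is hypothetical, and boundary_violations is not a ctree function.

```python
# Forbidden edges, following the separation described above: the inference
# module must not reach into the HTTP or gRPC layers.
FORBIDDEN = {
    "src/inference.rs": {"src/axum_server.rs", "src/grpc.rs"},
}


def boundary_violations(edges):
    """Return (source, target) dependency edges that cross a forbidden boundary."""
    return [(src, dst) for src, dst in edges if dst in FORBIDDEN.get(src, set())]


# Hypothetical dependency edges, e.g. collected from +D records.
edges = [
    ("src/grpc.rs", "src/inference.rs"),         # allowed: gRPC invokes the engine
    ("src/inference.rs", "src/axum_server.rs"),  # violation: inference reaching into HTTP
]
print(boundary_violations(edges))
```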

Optimization Strategy and Feedback

Hybrid Format Design

The combination of rev files (binary/text mix) and JSONL (for queries) balances machine efficiency with human readability.

Rev Files: Function as delta logs, enabling rapid change point scanning

  @@f:src/main.rs|sa:0|sd:20|da:0|dd:2
  

JSONL: Function as indexes, enabling easy searching with standard tools like jq

  jq -c 'select(.path == "src/inference.rs" and .kind == "function")' symbols_rust.jsonl
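
The same index can also be filtered from a script. The following sketch uses only the path and kind fields shown in the query above; the name field and the sample records are hypothetical stand-ins for the real contents of symbols_rust.jsonl.

```python
import io
import json


def select_symbols(lines, path, kind):
    """Yield parsed JSONL records whose 'path' and 'kind' fields match."""
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # tolerate blank lines between records
        record = json.loads(line)
        if record.get("path") == path and record.get("kind") == kind:
            yield record


# Stand-in for symbols_rust.jsonl; the 'name' field is hypothetical.
sample = io.StringIO(
    '{"path": "src/inference.rs", "kind": "function", "name": "normalize_rows"}\n'
    '{"path": "src/inference.rs", "kind": "struct", "name": "Engine"}\n'
    '{"path": "src/grpc.rs", "kind": "function", "name": "serve"}\n'
)
for record in select_symbols(sample, "src/inference.rs", "function"):
    print(record["name"])
```

Since every line is an independent JSON object, the file can be streamed record by record and never has to be loaded whole.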
  

“Telescope” Design

The design of recording only hashes and retrieving detailed text on demand (ctree_get_text) works well with LLM agents: it conserves the context window while still letting the agent drill into the details it needs.

Usage Scenario Example:

  LLM: "What does src/grpc.rs depend on in src/inference.rs?"
  → ctree_get_depends(path="src/grpc.rs", dep="src/inference.rs")
  → Returns minimal text only
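
The hash-first, text-on-demand idea can be illustrated with a small in-memory store. This is a sketch under stated assumptions, not ctree's implementation: TextStore is a hypothetical class, the 8-character hash is a truncated SHA-256 chosen only to resemble the hashes in the rev excerpt, and the Rust signatures are invented placeholders.

```python
import hashlib


class TextStore:
    """Keeps full symbol text out of band; callers hold only short hashes."""

    def __init__(self):
        self._texts = {}

    def put(self, text: str) -> str:
        # Truncated SHA-256 as a stand-in for whatever hash ctree actually uses.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]
        self._texts[digest] = text
        return digest

    def get_text(self, hashes):
        """Resolve only the requested hashes; unknown hashes are omitted."""
        return {h: self._texts[h] for h in hashes if h in self._texts}


store = TextStore()
h1 = store.put("pub fn normalize_rows(m: &mut Matrix) { /* ... */ }")
h2 = store.put("pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 { /* ... */ }")

index = [h1, h2]                 # what the agent sees first: hashes only
detail = store.get_text([h1])    # drill into one symbol on demand
print(index, detail)
```

The agent's context holds a few bytes per symbol until it asks for more, which is where the context-window saving comes from.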
  

Volume Control for Monorepos (--sw / --ww)

Scaling to monorepo size depends on a profile setting that raises annotation density for the modules in focus (--sw: strong width) and lowers it everywhere else (--ww: weak width).

  ctree generate --sw 20 --ww 5
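
The exact meaning of the two width values is not specified in this article. Purely as an illustration, the sketch below assumes each width is a per-file cap on the number of annotations emitted; annotation_budget and trim are hypothetical helpers, and the focused file set is made up.

```python
def annotation_budget(files, focused, strong_width=20, weak_width=5):
    """Assign each file a per-file annotation cap: strong for focused files, weak otherwise."""
    return {f: (strong_width if f in focused else weak_width) for f in files}


def trim(annotations, budget):
    """Keep at most budget[file] annotations per file, preserving order."""
    return {f: items[: budget[f]] for f, items in annotations.items()}


files = ["src/inference.rs", "src/main.rs"]
budget = annotation_budget(files, focused={"src/inference.rs"})

# 30 placeholder annotations per file, so both caps actually take effect.
annotations = {f: [f"{f}#sym{i}" for i in range(30)] for f in files}
trimmed = trim(annotations, budget)
print({f: len(items) for f, items in trimmed.items()})
```

Under this reading, total output grows with the number of focused modules rather than with the size of the repository.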
  

Serena Integration Strategy

Combining ctree’s fast symbol indexing with Serena’s semantic understanding can be expected to reduce the number of file read operations.

Three-Tier Query Strategy

Tier 1 (ctree): Fast hash-based discovery

  Questions: "What changed?" "What depends on this?"
  → ctree_get_revs() immediately reveals change points
  → ctree_get_depends() searches dependency graph
  

Tier 2 (Serena): Semantic query for detailed understanding

  Questions: "What does this function do?" "Which variables are referenced?"
  → find_symbol() retrieves structure
  → find_referencing_symbols() traces callers
  

Tier 3 (Read): Minimal file reads for final context confirmation

  Questions: "I need to verify the overall context"
  → Read only absolutely necessary files
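
The escalation order of the three tiers can be sketched as a dispatcher that tries the cheapest tier first and stops at the first answer. Everything here is a stand-in: the handlers are stubs rather than real ctree or Serena calls, and the keyword matching is a deliberately crude placeholder for whatever routing an agent would actually use.

```python
def answer(question, tiers):
    """Try tiers in order (cheapest first) and stop at the first one that returns a result."""
    trace = []
    for name, handler in tiers:
        trace.append(name)
        result = handler(question)
        if result is not None:
            return result, trace
    return None, trace


# Stub handlers standing in for ctree, Serena, and a plain file read.
def tier1_ctree(q):
    return "rev 0002: src/main.rs sd:20" if "changed" in q else None


def tier2_serena(q):
    return "normalize_rows: 2 call sites" if "function" in q else None


def tier3_read(q):
    return "<file contents>"


tiers = [("ctree", tier1_ctree), ("serena", tier2_serena), ("read", tier3_read)]
print(answer("What changed?", tiers))
print(answer("What does this function do?", tiers))
print(answer("I need to verify the overall context", tiers))
```

The returned trace shows how far each question had to escalate, which gives a rough measure of how many file reads the first two tiers avoided.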
  

Practical Usage Examples

Scenario 1: Bug Investigation

  Agent: "NaN is being returned in src/grpc.rs. Identify the cause"

  Tier 1: ctree_get_revs() → "normalize_rows() changed in src/inference.rs"
  Tier 2: find_referencing_symbols("normalize_rows", "src/grpc.rs") → 2 call sites
  Tier 3: read() the relevant sections, identify the bug
  

Scenario 2: Refactoring Planning

  Agent: "I want to clarify responsibilities in src/main.rs"

  Tier 1: ctree_get_depends(path="src/main.rs") → enumerate 4 dependencies
  Tier 2: find_symbol(depth=1) for each dependency → determine boundaries
  Tier 3: narrow down files needing modification
  

Verification Status

ctree accurately captures structural changes before and after refactoring.

  • Symbol Change Tracking: +S/-S records in the rev files matched the actual code changes in this project with 100% accuracy
  • Dependency Update: +D/-D records accurately reflect the dependency graph changes caused by module separation
  • Continuity Assurance: Cumulative diffs across rev files make it possible to track the evolution of the entire project
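
The idea behind cumulative diffs can be sketched as a fold over successive revs: replaying +S/-S and +D/-D records in order yields the set of symbols and dependencies alive at the latest rev. The replay function and the contents of rev_0001 are hypothetical; only the rev_0002 hashes are taken from the excerpt earlier in this article.

```python
def replay(revs):
    """Fold (op, hash) records from successive revs into live symbol and dependency sets."""
    symbols, deps = set(), set()
    for records in revs:                      # oldest rev first
        for op, digest in records:
            target = symbols if op.endswith("S") else deps
            if op.startswith("+"):
                target.add(digest)
            else:
                target.discard(digest)        # removing an unknown hash is a no-op
    return symbols, deps


# rev_0001 is invented for illustration; "0000aaaa" is a placeholder hash.
rev_0001 = [("+S", "12f4e8c9"), ("+S", "8a3d2b1f"), ("+D", "0000aaaa")]
rev_0002 = [("-S", "12f4e8c9"), ("-S", "8a3d2b1f"), ("+S", "a7c5d1e2"), ("+D", "2e8f4c91")]

symbols, deps = replay([rev_0001, rev_0002])
print(sorted(symbols), sorted(deps))
```

Replaying a prefix of the rev list gives the project state at any earlier point, which is what makes the rev files usable as a history rather than a single snapshot.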

Summary

code-tree (ctree) is a tool for visualizing structural changes in large-scale refactoring projects and for conveying dependency changes to LLMs in a shallow, compact form that supports overall understanding. Combined with Serena in particular, it enables pinpointed symbol reads and fewer file reads, which makes code comprehension and context management more efficient for LLM agents.