Verifying ctree Refactoring Effectiveness — Project Structure Optimization

Overview

In a real Rust project, code-tree (ctree) proved useful for verifying a refactoring from a monolithic to a modular architecture. This article summarizes the structural changes captured in its diff revisions (rev files) and explains how the mechanism works.

Structural Changes Captured by ctree

Diff Before and After Refactoring

The compact diff after refactoring (rev 0002) clearly visualized the following structural changes:

  r:0002|l:rust|t:2025-03-01T12:34:56Z|ctx:.ctree.toml|b:0001|bt:2025-02-28T14:22:10Z
  cf:4  // changed_files count

  @@f:src/main.rs|sa:0|sd:20|da:0|dd:2
  -S 12f4e8c9  // Symbol deletion
  -S 8a3d2b1f
  ... (total 20 symbol deletions)

  @@f:src/axum_server.rs|sa:8|sd:0|da:1|dd:0
  +S a7c5d1e2  // HTTP handling symbols added
  +D 2e8f4c91  // Dependency added

  @@f:src/inference.rs|sa:12|sd:0|da:2|dd:0
  +S b3f2e7a4  // Inference logic added
  +D c1a9d6b2

  @@f:src/grpc.rs|sa:0|sd:5|da:2|dd:1
  -S 6e2f4a7d
  +D 3c8b5f21
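
The rev format above is compact enough to parse in a few lines. The following is a minimal Python sketch, not ctree's own parser. It assumes that the first line is a `|`-separated header of `key:value` pairs, that each `@@f:` line opens a file section, and that `sa`/`sd`/`da`/`dd` count symbols added/deleted and dependencies added/deleted (an interpretation inferred from the inline comments in the excerpt). The names `FileDelta` and `parse_rev` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FileDelta:
    path: str
    counters: dict[str, int]  # e.g. {"sa": 8, "sd": 0, "da": 1, "dd": 0}
    records: list[tuple[str, str]] = field(default_factory=list)  # e.g. ("+S", "a7c5d1e2")


def strip_comment(line: str) -> str:
    """Drop a trailing '// ...' annotation and surrounding whitespace."""
    return line.split("//", 1)[0].strip()


def parse_rev(text: str) -> tuple[dict[str, str], list[FileDelta]]:
    """Parse a rev excerpt into (header fields, per-file deltas)."""
    header: dict[str, str] = {}
    files: list[FileDelta] = []
    for raw in text.splitlines():
        line = strip_comment(raw)
        if not line or line.startswith("..."):
            continue  # skip blank lines and elision markers
        if line.startswith("@@f:"):
            path_part, *rest = line[2:].split("|")
            counters = {k: int(v) for k, v in (item.split(":", 1) for item in rest)}
            files.append(FileDelta(path_part.split(":", 1)[1], counters))
        elif line[0] in "+-" and files:
            marker, hash_ = line.split()
            files[-1].records.append((marker, hash_))
        else:
            # Header-style line: split only on the first ':' so timestamps survive.
            for item in line.split("|"):
                key, value = item.split(":", 1)
                header[key] = value
    return header, files


SAMPLE = """\
r:0002|l:rust|t:2025-03-01T12:34:56Z|ctx:.ctree.toml|b:0001|bt:2025-02-28T14:22:10Z

@@f:src/main.rs|sa:0|sd:20|da:0|dd:2
-S 12f4e8c9  // Symbol deletion
-S 8a3d2b1f
... (total 20 symbol deletions)

@@f:src/axum_server.rs|sa:8|sd:0|da:1|dd:0
+S a7c5d1e2  // HTTP handling symbols added
+D 2e8f4c91  // Dependency added
"""

header, files = parse_rev(SAMPLE)
```

With the counters in a dictionary, questions such as "which files only lost symbols?" become simple filters over `files`.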

Concrete Structural Changes

| File | Before | After | Meaning |
|---|---|---|---|
| `src/main.rs` | Bloated, 20+ symbols | Focused on logging and server startup | Responsibility separation |
| `src/axum_server.rs` | None | HTTP handling, 8 symbols | HTTP layer independence |
| `src/inference.rs` | Inference, 5 symbols | Inference + vector ops, 12 symbols | Feature consolidation |
| `src/grpc.rs` | gRPC and inference mixed, 5 symbols | gRPC calls only | Dependency clarification |

Adoption Benefits

1. Dependency Isolation

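The rev file makes the separation of inference.rs from the HTTP/gRPC context explicit: src/main.rs and src/grpc.rs record dependency deletions (dd:2 and dd:1, listed as -D markers), while src/inference.rs only gains dependencies (da:2, dd:0).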

This enabled:

Inference logic testable via standalone Rust unit tests
HTTP handler changes no longer impact inference engine design
gRPC server can reference the same inference engine (eliminating code duplication)
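
Once dependencies are available as data, this kind of isolation can be checked mechanically. The sketch below is an illustration under stated assumptions, not a ctree feature: the dependency map `DEPS` is a hypothetical graph written by hand for this example, and `boundary_violations` is a made-up helper name.

```python
# Hypothetical dependency map (module -> modules it depends on).
# These edges are illustrative only and were not taken from real ctree output.
DEPS: dict[str, set[str]] = {
    "src/main.rs": {"src/axum_server.rs", "src/grpc.rs"},
    "src/axum_server.rs": {"src/inference.rs"},
    "src/grpc.rs": {"src/inference.rs"},
    "src/inference.rs": set(),
}

# Rule: the inference module must not depend on either transport layer.
FORBIDDEN: dict[str, set[str]] = {
    "src/inference.rs": {"src/axum_server.rs", "src/grpc.rs"},
}


def boundary_violations(
    deps: dict[str, set[str]], forbidden: dict[str, set[str]]
) -> list[tuple[str, str]]:
    """Return (source, target) pairs where a module depends on a forbidden target."""
    return sorted(
        (src, dst)
        for src, banned in forbidden.items()
        for dst in deps.get(src, set()) & banned
    )


violations = boundary_violations(DEPS, FORBIDDEN)

# A graph in which inference reaches back into the gRPC layer is flagged.
leaky = {**DEPS, "src/inference.rs": {"src/grpc.rs"}}
leaky_violations = boundary_violations(leaky, FORBIDDEN)
```

A check like this could run in CI so that a reintroduced dependency from the inference module to a transport layer is caught early.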

2. Single Source of Truth

The consolidation of duplicated vector operations (normalize_rows, cosine_similarity, maxsim, etc.) into the common module src/inference.rs can be verified through ctree’s hashes.

  ctree_get_text(hashes=["b3f2e7a4", "c1a9d6b2"])

This call retrieves the signatures and implementations of the vector operations consolidated in inference.rs.

3. Boundary Clarification

File-level responsibilities (Scope) are defined as “strong”, enabling strict inter-module boundary management.

| Module | Strong Scope Responsibility |
|---|---|
| `src/main.rs` | Application startup, logging initialization |
| `src/axum_server.rs` | HTTP handling, request/response conversion |
| `src/inference.rs` | Inference, vector normalization, distance calculation |
| `src/grpc.rs` | gRPC service definition, inference engine invocation |

Optimization Strategy and Feedback

Hybrid Format Design

Combining rev files (binary/text mix) with JSONL (for queries) balances machine efficiency with human readability.

Rev Files: Function as delta logs, enabling rapid change point scanning

  @@f:src/main.rs|sa:0|sd:20|da:0|dd:2

JSONL: Function as indexes, enabling easy searching with standard tools like jq

  jq -c 'select(.path == "src/inference.rs" and .kind == "function")' symbols_rust.jsonl
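
The same filter is easy to express in a script when jq is not available. This is a minimal sketch that assumes each JSONL line is an object with `path` and `kind` fields, as the jq filter implies. The `name` field and the sample records are invented for illustration and do not come from a real symbols_rust.jsonl.

```python
import json
from typing import Iterable, Iterator


def select_symbols(lines: Iterable[str], path: str, kind: str) -> Iterator[dict]:
    """Yield the JSONL records whose 'path' and 'kind' fields both match."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if record.get("path") == path and record.get("kind") == kind:
            yield record


# Made-up sample records; the "name" field is an assumption about the schema.
SAMPLE_JSONL = [
    '{"path": "src/inference.rs", "kind": "function", "name": "normalize_rows"}',
    '{"path": "src/inference.rs", "kind": "struct", "name": "ExampleStruct"}',
    '{"path": "src/grpc.rs", "kind": "function", "name": "example_handler"}',
]

matches = list(select_symbols(SAMPLE_JSONL, "src/inference.rs", "function"))
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly, so large indexes are streamed rather than loaded whole.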

“Telescope” Design

Recording only hashes and retrieving the detailed text on demand (ctree_get_text) works well with LLM agents: it conserves the context window while still allowing the agent to drill into the information it needs.

Usage Scenario Example:

  LLM: "What does src/grpc.rs depend on in src/inference.rs?"
  → ctree_get_depends(path="src/grpc.rs", dep="src/inference.rs")
  → Returns minimal text only
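
The hash-first, text-later idea can be sketched as a small lazy lookup. This is an illustration of the pattern only, not of how ctree_get_text is implemented: `TelescopeIndex`, its `fetch_text` callback, and the placeholder store are all hypothetical.

```python
from typing import Callable


class TelescopeIndex:
    """Holds only hashes up front; fetches the text behind a hash on first request."""

    def __init__(self, fetch_text: Callable[[str], str]) -> None:
        self._fetch_text = fetch_text
        self._cache: dict[str, str] = {}
        self.fetch_count = 0  # number of times the backing store was actually hit

    def get_text(self, hashes: list[str]) -> dict[str, str]:
        result: dict[str, str] = {}
        for h in hashes:
            if h not in self._cache:
                self._cache[h] = self._fetch_text(h)
                self.fetch_count += 1
            result[h] = self._cache[h]
        return result


# Placeholder store standing in for wherever the full text lives.
FAKE_STORE = {
    "b3f2e7a4": "<text for b3f2e7a4>",
    "c1a9d6b2": "<text for c1a9d6b2>",
}

index = TelescopeIndex(FAKE_STORE.__getitem__)
first = index.get_text(["b3f2e7a4"])               # one fetch
second = index.get_text(["b3f2e7a4", "c1a9d6b2"])  # only the new hash is fetched
```

The agent pays for text only when it asks for a specific hash, which is what keeps the diff itself small enough to sit in the context window.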

Volume Control for Monorepos (--sw / --ww)

Scaling to a monorepo depends on a profile setting that increases annotation density for the modules in focus (--sw: strong width) and reduces it everywhere else (--ww: weak width).

  ctree generate --sw 20 --ww 5

Serena Integration Strategy

Combining ctree’s fast symbol indexing with Serena’s semantic understanding can be expected to reduce the number of file read operations.

Three-Tier Query Strategy

Tier 1 (ctree): Fast hash-based discovery

  Questions: "What changed?" "What depends on this?"
  → ctree_get_revs() immediately reveals change points
  → ctree_get_depends() searches dependency graph

Tier 2 (Serena): Semantic query for detailed understanding

  Questions: "What does this function do?" "Which variables are referenced?"
  → find_symbol() retrieves structure
  → find_referencing_symbols() traces callers

Tier 3 (Read): Minimal file reads for final context confirmation

  Questions: "I need to verify the overall context"
  → Read only absolutely necessary files
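
The three tiers form a funnel: each one narrows what the next has to look at. The sketch below shows that control flow with stand-in callables. It does not call ctree or Serena; the function names, the call-site line numbers, and the returned placeholder strings are all invented for illustration.

```python
from typing import Callable

CallSite = tuple[str, int]  # (path, line number)


def tiered_lookup(
    changed_files: Callable[[], list[str]],       # Tier 1 stand-in: what changed?
    call_sites: Callable[[str], list[CallSite]],  # Tier 2 stand-in: who references it?
    read_span: Callable[[str, int], str],         # Tier 3 stand-in: read just that spot
) -> list[str]:
    """Run the tiers in order so that only narrowed-down locations are read."""
    snippets: list[str] = []
    for path in changed_files():
        for site_path, line in call_sites(path):
            snippets.append(read_span(site_path, line))
    return snippets


reads: list[CallSite] = []  # records every Tier 3 read that actually happens


def fake_changed_files() -> list[str]:
    return ["src/inference.rs"]


def fake_call_sites(path: str) -> list[CallSite]:
    # Made-up line numbers, for illustration only.
    if path == "src/inference.rs":
        return [("src/grpc.rs", 10), ("src/grpc.rs", 42)]
    return []


def fake_read_span(path: str, line: int) -> str:
    reads.append((path, line))
    return f"<source near {path}:{line}>"


snippets = tiered_lookup(fake_changed_files, fake_call_sites, fake_read_span)
```

In this toy run only two spans are read, both chosen by the earlier tiers, instead of whole files being opened speculatively.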

Practical Usage Examples

Scenario 1: Bug Investigation

  Agent: "NaN is being returned in src/grpc.rs. Identify the cause"

  Tier 1: ctree_get_revs() → "normalize_rows() changed in src/inference.rs"
  Tier 2: find_referencing_symbols("normalize_rows", "src/inference.rs") → 2 call sites in src/grpc.rs
  Tier 3: read() the relevant sections, identify the bug

Scenario 2: Refactoring Planning

  Agent: "I want to clarify responsibilities in src/main.rs"

  Tier 1: ctree_get_depends(path="src/main.rs") → enumerate 4 dependencies
  Tier 2: find_symbol(depth=1) for each dependency → determine boundaries
  Tier 3: narrow down files needing modification

Verification Status

In this project, ctree accurately captured the structural changes before and after the refactoring.

Symbol Change Tracking: The +S/-S records in the rev files matched every actual symbol change in the code
Dependency Update: The +D/-D records accurately reflected the dependency graph changes caused by the module separation
Continuity Assurance: Cumulative diffs across rev files make it possible to track the evolution of the entire project
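
Cumulative tracking amounts to replaying the +S/-S records of each rev in order. The following is a minimal sketch of that replay. It assumes that records are (marker, hash) pairs and that a hash uniquely identifies a symbol. The content of rev 0001 is invented here, because only rev 0002 is shown in this article, and `apply_rev` is a hypothetical helper.

```python
def apply_rev(symbols: set[str], records: list[tuple[str, str]]) -> set[str]:
    """Return the symbol set after applying one rev's +S/-S records."""
    updated = set(symbols)
    for marker, hash_ in records:
        if marker == "+S":
            updated.add(hash_)
        elif marker == "-S":
            updated.discard(hash_)
        # +D/-D records are ignored here; they would update a dependency set instead.
    return updated


# Hypothetical baseline rev that introduces two symbols.
REV_0001 = [("+S", "12f4e8c9"), ("+S", "8a3d2b1f")]

# A subset of the records shown for rev 0002.
REV_0002 = [
    ("-S", "12f4e8c9"),
    ("-S", "8a3d2b1f"),
    ("+S", "a7c5d1e2"),
    ("+D", "2e8f4c91"),
    ("+S", "b3f2e7a4"),
]

state: set[str] = set()
for rev in (REV_0001, REV_0002):
    state = apply_rev(state, rev)
```

Comparing the replayed set against a fresh index of the current tree is one way to confirm that no rev was lost or applied out of order.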

Summary

code-tree (ctree) is a tool for visualizing structural changes in large-scale refactoring projects and for giving LLMs a shallow, low-cost view of dependency changes that supports overall understanding. Combined with Serena in particular, it allows pinpointed symbol reads and fewer file reads, which makes code comprehension and context management more efficient for LLM agents.
