code-tree Specification, Design Intent, and Expected Effects — LLM Context Optimization Tool

Overview

code-tree is a Rust-based tool that traverses codebases to generate “structured summary contexts”. The generated data is stored in the .ctree/ directory and organized in LLM-friendly formats including tree structures, symbols, dependencies, and diff information.

It is available as a CLI tool (ctree) and as an MCP server (ctree-mcp), so it can be integrated directly with editors and agents.

Design Intent

Initial Design: Starting from Programming Languages

code-tree was originally designed as a static analysis tool for extracting symbols from programming languages like Rust, Go, and Python. However, real-world operation revealed critical gaps:

Document Format Handling: In Hugo-based blogs and documentation projects, Markdown content makes up most of the project, so excluding it from the context leaves the project only partially understood.
Template Structure: Changes to HTML templates (Hugo/Jinja) were not reflected in the context.

This experience led to extending code-tree to support Markdown and HTML templates. See “Building code-tree HTML Template and Markdown Scanner” for details.

The Need for Context Compression

For large-scale projects, the naive approach of reading all source code each time is impractical.

Token Cost Growth: For large repositories, information must be structured in a minimal, reproducible form before it is passed to an LLM
Token Waste: Instead of sending full scan results every time, use diff revisions (snapshots/rev/*.txt) and hash references
Operational Consistency: Provide a unified tool set so that editors and agents can share a reliable workflow centered on ctree_check

Controlling Noise and Variance

In team development environments, output reproducibility is critical.

Fixed generation parameters such as sw (strong width) and ww (weak width) suppress output variance
Hash-based references save context window space while still allowing drill-down to the information that is needed

Primary Specifications

Supported Languages

Language support is modularized, so languages can be added incrementally as needed. Implementation currently proceeds in order of the languages actually used in development.

Planned language support: Go, Rust, Python, TypeScript, JavaScript, C#, Dart, Lua, Awk, Shell (sh/bash/zsh), Kotlin, Swift, Markdown

Because each language is its own module, a new one is added by supplying its pattern definitions and scan logic.

Scope Classification

Files are classified into 3 tiers with priority ordering (strong > weak > symbol_only):

strong: Generates detailed summaries. Covers function definitions, class definitions, and module structures, i.e. symbols central to the overall project architecture
weak: Generates simplified summaries. Covers utility functions, helper methods, and config files, i.e. reference-level symbols
symbol_only: Extracts only existence information and primary symbols. Covers dependency packages, test code, and generated code
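
The priority ordering can be sketched as a first-match lookup. This is an illustration only, not code-tree's actual implementation: the `glob_to_regex` and `classify` helpers are hypothetical, the glob semantics (`**/` spans directories, `*` stays within one path segment) are an assumption, and the patterns are borrowed from the .ctree.toml example in the Configuration Management section.

```python
import re

# Scopes in priority order; the first scope with a matching pattern wins.
SCOPES = [
    ("strong", ["src/main.rs", "src/lib.rs", "src/*/mod.rs"]),
    ("weak", ["src/**/*.rs", "tests/**/*.rs"]),
    ("symbol_only", ["vendor/**/*"]),
]


def glob_to_regex(pattern):
    """Translate a simple glob into a regex (assumed semantics, not ctree's own)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")  # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")  # anything within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def classify(path, scopes=SCOPES):
    """Return the highest-priority scope whose patterns match, or None."""
    for scope, patterns in scopes:
        if any(glob_to_regex(p).match(path) for p in patterns):
            return scope
    return None


for path in ["src/main.rs", "src/scan/rust.rs", "vendor/lib/x.go", "README.md"]:
    print(path, "->", classify(path))
```

Note that src/main.rs also matches the weak pattern src/**/*.rs; it is reported as strong only because strong is checked first.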

Output Structure (`.ctree/`)

File	Purpose
`tree_annotated_<lang>.txt`	Directory tree with annotations
`symbols_<lang>.jsonl`	Extracted symbols list (JSON Lines format)
`depends_<lang>.jsonl`	File and module dependencies
`text_store_<lang>.jsonl`	Text store for hash references
`strong_summary_<lang>.txt`	Text summary of strong scope only
`weak_summary_<lang>.txt`	Text summary including weak scope
`ctree_ctx_<lang>.txt`	Final context optimized for LLM input

Diff Management and History

Compact diffs are stored in .ctree/snapshots/rev/ as sequentially numbered files (0001.txt and so on).

Each symbol is hashed from path|kind|name and tracked continuously across revisions
Each revision records symbol additions (+S), symbol deletions (-S), dependency additions (+D), and dependency deletions (-D)
Change points are searchable in O(k log n) without merge joins (k: changes, n: total symbols)
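
As a rough sketch of the idea, the snippet below hashes a `path|kind|name` key and derives the +S and -S sets between two snapshots. The hash function (SHA-1 truncated to 12 hex characters), the helper names, and the sample symbols are all assumptions made for illustration; the article does not state which hash code-tree uses or how rev files are laid out internally.

```python
import hashlib
from bisect import bisect_left


def symbol_hash(path, kind, name):
    """Hash the 'path|kind|name' key (SHA-1 and the 12-char length are assumptions)."""
    key = f"{path}|{kind}|{name}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


def diff_symbols(old, new):
    """Return (added, removed) hash lists between two snapshots given as sets."""
    return sorted(new - old), sorted(old - new)


def contains(sorted_hashes, h):
    """Binary search: O(log n) per lookup, so k changed symbols cost O(k log n)."""
    i = bisect_left(sorted_hashes, h)
    return i < len(sorted_hashes) and sorted_hashes[i] == h


# Hypothetical snapshots: 'parse' was removed and 'render' was added.
old = {symbol_hash("src/lib.rs", "fn", "scan"), symbol_hash("src/lib.rs", "fn", "parse")}
new = {symbol_hash("src/lib.rs", "fn", "scan"), symbol_hash("src/lib.rs", "fn", "render")}

added, removed = diff_symbols(old, new)
for h in added:
    print("+S", h)
for h in removed:
    print("-S", h)
```

Because the key contains only the path, kind, and name, the hash stays stable when a symbol's body changes, which is what makes it usable as a long-lived reference.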

Configuration Management

Supports configuration via .ctree.toml. Priority order is: “CLI arguments > .ctree.toml > defaults”.

  [ctree]
  strong = ["src/main.rs", "src/lib.rs", "src/*/mod.rs"]
  weak = ["src/**/*.rs", "tests/**/*.rs"]
  symbol_only = ["vendor/**/*"]

  [output]
  template = "markdown"  # Unified Markdown format

Output is unified in Markdown format, enabling direct viewing and editing in Obsidian and other tools.
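
The priority order "CLI arguments > .ctree.toml > defaults" amounts to a layered merge. The sketch below shows one way to express it; the `resolve_config` helper, the default values, and the idea that `sw` could also be set in the file are assumptions for illustration, not documented behavior.

```python
def resolve_config(defaults, file_cfg, cli_args):
    """Merge settings so that CLI arguments > .ctree.toml > defaults."""
    merged = dict(defaults)
    merged.update(file_cfg)
    # Only CLI options that were actually passed (non-None) override the file.
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged


defaults = {"sw": 10, "ww": 3, "template": "markdown"}  # hypothetical defaults
file_cfg = {"sw": 20}                                   # as if read from .ctree.toml
cli_args = {"sw": None, "ww": 5}                        # as if only --ww 5 was passed

config = resolve_config(defaults, file_cfg, cli_args)
print(config)
```

Here `sw` comes from the file, `ww` from the command line, and `template` falls back to the default.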

MCP Tool Suite

Tool	Function
`ctree_init`	Create configuration template
`ctree_generate`	Generate/update context
`ctree_reset`	Reset history and regenerate
`ctree_check`	Combined generation, diff checking, and text retrieval
`ctree_get_baseline`	Retrieve baseline
`ctree_get_revs`	Fetch diff history
`ctree_get_text`	On-demand retrieval of specific text by hash

Expected Adoption Benefits

Token Consumption Optimization (Design Expectation)

By passing only minimum indexed information, we expect improvements in both cost and processing speed.

Initial context: Compress entire project skeleton with .ctree/ctree_ctx_<lang>.txt
Detailed information: Retrieve only required symbols on-demand via hash reference
Diff information: Pass only changes since previous snapshot

Verification Status: In diff checks run from new sessions, the intent of recent work has been understood accurately from the diffs alone. Detailed quantitative verification of the token cost reduction is planned as future work.

Reduced Synchronization Cost

Enables “follow latest only” workflows, eliminating the need to resend context per conversation turn.

Immediately understand “what changed” from rev file diffs
Specify related symbol hashes to dynamically fetch detailed information

Improved Visualization and Review Efficiency

Structural changes become easier to understand, with investigation starting points becoming clear.

Visualize project structure with tree_annotated_<lang>.txt
Index all symbols in symbols_<lang>.jsonl for fast grep/jq searching
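
For cases where grep or jq is not at hand, the same kind of search can be sketched in a few lines of Python. The record fields used here (`path`, `kind`, `name`) are an assumption based on the path|kind|name hash key; the actual schema of symbols_<lang>.jsonl is not given in this article, and `find_symbols` is a hypothetical helper.

```python
import json


def find_symbols(lines, kind=None, name_contains=None):
    """Filter JSON Lines records; the 'kind' and 'name' fields are assumed."""
    matches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if kind is not None and record.get("kind") != kind:
            continue
        if name_contains is not None and name_contains not in record.get("name", ""):
            continue
        matches.append(record)
    return matches


# Hypothetical sample records standing in for a real symbols file.
sample = [
    '{"path": "src/lib.rs", "kind": "fn", "name": "scan_tree"}',
    '{"path": "src/lib.rs", "kind": "struct", "name": "Scanner"}',
    '{"path": "src/diff.rs", "kind": "fn", "name": "scan_diff"}',
]

hits = find_symbols(sample, kind="fn", name_contains="scan")
for record in hits:
    print(record["path"], record["name"])
```

With a real file, an open file handle can be passed in place of `sample`, since the function only iterates over lines.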

Recommended Operational Flow

Step 1: Initialization

  ctree_init --config .ctree.toml

Initialize project-specific configuration (strong/weak scope definitions).

Step 2: Baseline Generation

  ctree generate --config .ctree.toml

Generate the initial context in the .ctree/ directory and record this state as revision 0000.

Step 3: Daily Operations

  ctree_check --config .ctree.toml

ctree_check performs these operations in one call:

Detect source code updates
Regenerate .ctree/
Generate new rev file (0001.txt, etc.)
Output final context for LLM
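
The "generate new rev file" step implies picking the next sequential file name. A minimal sketch of that bookkeeping follows, assuming that rev files are named with four digits and that the highest existing number determines the next one; `next_rev_path` is a hypothetical helper, and ctree's real logic may differ.

```python
import re
from pathlib import Path


def next_rev_path(rev_dir):
    """Return the path of the next NNNN.txt file after the highest existing one."""
    numbers = [
        int(p.stem)
        for p in Path(rev_dir).glob("*.txt")
        if re.fullmatch(r"\d{4}", p.stem)  # ignore files that are not NNNN.txt
    ]
    # With no rev files yet, the baseline counts as 0000 and the first rev is 0001.
    next_number = max(numbers, default=0) + 1
    return Path(rev_dir) / f"{next_number:04d}.txt"
```

For example, a directory containing 0001.txt and 0002.txt would yield 0003.txt as the next path.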

Step 4: LLM Integration

Provide the following as baseline context to the LLM:

  # Project Structure
  <Contents of ctree_ctx_<lang>.txt>

  # Recent Changes (rev 0010)
  <Contents of snapshots/rev/0010.txt>
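
Assembling this baseline context can be sketched as a small helper that reads the two files and joins them under the headings shown above. `build_context` is a hypothetical function, and the four-digit rev file naming is inferred from the examples in this article.

```python
from pathlib import Path


def build_context(ctree_dir, lang, rev):
    """Concatenate the project skeleton and one rev file into a prompt string."""
    root = Path(ctree_dir)
    structure = (root / f"ctree_ctx_{lang}.txt").read_text(encoding="utf-8")
    changes = (root / "snapshots" / "rev" / f"{rev:04d}.txt").read_text(encoding="utf-8")
    return (
        "# Project Structure\n"
        f"{structure.rstrip()}\n\n"
        f"# Recent Changes (rev {rev:04d})\n"
        f"{changes.rstrip()}\n"
    )
```

A call such as `build_context(".ctree", lang, 10)` would then produce the text to paste or send as the opening context, where `lang` is whatever language suffix the generated files use.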

When detailed information is needed, specify hashes:

  MCP call: ctree_get_text(hashes=["abc123", "def456"])
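
On the wire, an MCP tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request body for the call above. The `hashes` argument name is taken from this article; the helper name and request id are illustrative, and in practice an MCP client or agent constructs and sends this message for you.

```python
import json


def tools_call_request(request_id, tool, arguments):
    """Build an MCP 'tools/call' JSON-RPC 2.0 request body."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


request = tools_call_request(1, "ctree_get_text", {"hashes": ["abc123", "def456"]})
print(json.dumps(request, indent=2))
```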

Design Philosophy

Rather than always producing “heavy full text” output, the design emphasizes “lightweight index + on-demand retrieval”, maximizing the effectiveness of LLM integration in large-scale development.

“Telescope” Design

The design of recording only hashes and fetching detailed text on demand offers excellent synergy with LLM agents.

Low Magnification (Overview): Examine project structure via tree_annotated_<lang>.txt
Medium Magnification (Module Unit): Integrate symbols_<lang>.jsonl with Serena, enabling drill-down from symbol search to structural understanding
High Magnification (Code Details): Reference specific symbol implementation via text_store_<lang>.jsonl
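
The high-magnification step, resolving a hash to its text, can be sketched as a dictionary lookup over the text store. The `hash` and `text` field names and both helper functions are assumptions for illustration; the real layout of text_store_<lang>.jsonl is not specified here.

```python
import json


def load_text_store(lines):
    """Index text-store records by hash ('hash' and 'text' fields are assumed)."""
    store = {}
    for line in lines:
        if line.strip():
            record = json.loads(line)
            store[record["hash"]] = record["text"]
    return store


def get_text(store, hashes):
    """Return the text for each requested hash, skipping unknown hashes."""
    return {h: store[h] for h in hashes if h in store}


# Hypothetical records standing in for a real text store file.
sample = [
    '{"hash": "abc123", "text": "fn scan_tree(root: &Path) -> Tree { ... }"}',
    '{"hash": "def456", "text": "struct Scanner { ... }"}',
]

store = load_text_store(sample)
details = get_text(store, ["abc123", "zzz999"])
print(details)
```

Only the requested entries enter the context window, which is the point of keeping the overview and the full text in separate files.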

Volume Control for Monorepos

In projects structured with DDD and Clean Architecture (e.g., features/{module}/{domain,infra,presentation}), volume control works by raising annotation density for the active module (--sw: strong width) and lowering it for its dependencies (--ww: weak width).

Configuration Example:

  # Active module (detailed summary)
  strong = ["features/payment/**/*.rs"]

  # Dependencies (simplified)
  weak = ["features/*/domain/**/*.rs", "features/*/infra/**/*.rs"]

  # Tests and generated code (symbols only)
  symbol_only = ["tests/**/*.rs"]

This enables dynamic control of context volume even at monorepo scale:

  ctree generate --config .ctree.toml --sw 20 --ww 5

The active module (e.g., payment) includes detailed dependencies and implementations, while other modules’ domain/infra layers provide only structural overviews, reducing token costs while maintaining necessary information.

Current Implementation Status

Language Support: Modularized design enables incremental language implementation as needed
Diff Management: Diff tracking with the rev file format is complete. In new sessions, the intent of recent work has been understood accurately from the diffs
Token Cost Reduction: While the design is theoretically sound, quantitative verification is planned for future work

Summary

code-tree is a tool that improves LLM agent effectiveness from the “context compression” perspective. It enables lightweight, reproducible context management even for large projects, simultaneously reducing token costs and improving development efficiency.
