code-tree Specification, Design Intent, and Expected Effects — LLM Context Optimization Tool
code-tree architecture and tool specifications, context compression and token cost reduction implementation strategy, operational flow through MCP integration
Overview
code-tree is a Rust-based tool that traverses codebases to generate “structured summary contexts”. The generated data is stored in the .ctree/ directory and organized in LLM-friendly formats including tree structures, symbols, dependencies, and diff information.
It is available as a CLI tool (ctree) and MCP server (ctree-mcp), enabling direct integration with editors and agents.
Design Intent
Initial Design: Starting from Programming Languages
code-tree was originally designed as a static analysis tool for extracting symbols from programming languages like Rust, Go, and Python. However, real-world operation revealed critical gaps:
- Document Format Handling: In Hugo-based blogs or documentation projects, Markdown content comprises the majority of the project. Excluding it from context results in incomplete project understanding
- Template Structure: Changes to HTML templates (Hugo/Jinja) weren’t reflected in the context
This experience led to extending code-tree to support Markdown and HTML templates. See “Building code-tree HTML Template and Markdown Scanner” for details.
The Need for Context Compression
For large-scale projects, the naive approach of reading all source code each time is impractical.
- Token Cost Growth: Large repositories must have information structured in minimal, reproducible form before being passed to LLMs
- Token Waste: Instead of sending full scan results each time, leverage diff revisions (snapshots/rev/*.txt) and hash references for efficiency
- Operational Consistency: Provide a unified tool ecosystem that enables reliable ctree_check-centric workflows across editors and agents
Controlling Noise and Variance
In team development environments, output reproducibility is critical.
- Fixed generation parameters like sw (strong width) and ww (weak width) suppress output variance
- Hash-based references allow context window savings while drilling down to required information
Primary Specifications
Supported Languages
Language support is modularized, so languages can be added incrementally as they are needed. Implementation currently starts with the languages actually used in development.
Planned language support: Go, Rust, Python, TypeScript, JavaScript, C#, Dart, Lua, Awk, Shell (sh/bash/zsh), Kotlin, Swift, Markdown
Extending to a new language means adding its pattern definitions and scan logic as a separate module.
Scope Classification
Files are classified into 3 tiers with priority ordering (strong > weak > symbol_only):
- strong: Generates detailed summaries. Covers function definitions, class definitions, module structures—symbols central to overall project architecture
- weak: Generates simplified summaries. Covers utility functions, helper methods, config files—reference-level symbols
- symbol_only: Extracts existence information and primary symbols only. Covers dependency packages, test code, generated code
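The priority ordering above can be illustrated with a small sketch. It is not code-tree's actual implementation (the tool is written in Rust); the `classify` function and the use of Python's `fnmatch` are assumptions made for illustration only. Note that `fnmatch` lets `*` match across `/`, which is looser than typical `**` glob semantics.

```python
from fnmatch import fnmatchcase

# Tiers in priority order: the first tier with a matching pattern wins.
TIERS = ("strong", "weak", "symbol_only")


def classify(path, patterns):
    """Return the highest-priority tier whose patterns match `path`, or None.

    `patterns` maps a tier name to a list of glob patterns (hypothetical
    in-memory form of the scope settings).
    """
    for tier in TIERS:
        for pattern in patterns.get(tier, []):
            if fnmatchcase(path, pattern):
                return tier
    return None


patterns = {
    "strong": ["src/main.rs", "src/lib.rs", "src/*/mod.rs"],
    "weak": ["src/**/*.rs", "tests/**/*.rs"],
    "symbol_only": ["vendor/**/*"],
}

# src/parser/mod.rs matches both a strong and a weak pattern; strong wins.
print(classify("src/parser/mod.rs", patterns))
```

A file that matches several tiers is assigned to the highest one, which is what makes a broad weak pattern safe to combine with narrow strong patterns.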
Output Structure (.ctree/)
| File | Purpose |
|---|---|
| tree_annotated_<lang>.txt | Directory tree with annotations |
| symbols_<lang>.jsonl | Extracted symbols list (JSON Lines format) |
| depends_<lang>.jsonl | File and module dependencies |
| text_store_<lang>.jsonl | Text store for hash references |
| strong_summary_<lang>.txt | Text summary of strong scope only |
| weak_summary_<lang>.txt | Text summary including weak scope |
| ctree_ctx_<lang>.txt | Final context optimized for LLM input |
Diff Management and History
Compact diffs are stored in .ctree/snapshots/rev/ as numbered files such as 0001.txt.
- Symbols are hashed based on path|kind|name and continuously tracked
- Each revision records symbol additions (+S), symbol deletions (-S), dependency additions (+D), and dependency deletions (-D)
- Change points are searchable in O(k log n) without merge joins (k: changes, n: total symbols)
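The symbol tracking described above could look roughly like the sketch below. The hash algorithm (SHA-256), the 12-character truncation, and the exact layout of the "+S"/"-S" lines are assumptions; the source only states that the hash key is path|kind|name. The binary search shows one way to look up each of k changes in a sorted list of n symbols, giving O(k log n) overall.

```python
import hashlib
from bisect import bisect_left


def symbol_hash(path, kind, name):
    """Hash a symbol by its path|kind|name key (algorithm and length assumed)."""
    key = f"{path}|{kind}|{name}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def diff_symbols(old_hashes, new_hashes):
    """Return rev-style lines: '+S <hash>' for additions, '-S <hash>' for deletions."""
    old_set, new_set = set(old_hashes), set(new_hashes)
    added = sorted(new_set - old_set)
    removed = sorted(old_set - new_set)
    return [f"+S {h}" for h in added] + [f"-S {h}" for h in removed]


def contains(sorted_hashes, h):
    """Binary search for one hash in a sorted list: O(log n) per lookup."""
    i = bisect_left(sorted_hashes, h)
    return i < len(sorted_hashes) and sorted_hashes[i] == h


main_fn = symbol_hash("src/main.rs", "fn", "main")
run_fn = symbol_hash("src/lib.rs", "fn", "run")
config = symbol_hash("src/lib.rs", "struct", "Config")

# Between two snapshots, `main` was removed and `Config` was added.
for line in diff_symbols([main_fn, run_fn], [run_fn, config]):
    print(line)
```

Because the key is built only from path, kind, and name, a symbol keeps the same hash across revisions as long as it is not moved or renamed.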
Configuration Management
Supports configuration via .ctree.toml. Priority order is: “CLI arguments > .ctree.toml > defaults”.
[ctree]
strong = ["src/main.rs", "src/lib.rs", "src/*/mod.rs"]
weak = ["src/**/*.rs", "tests/**/*.rs"]
symbol_only = ["vendor/**/*"]
[output]
template = "markdown" # Unified Markdown format
Output is unified in Markdown format, enabling direct viewing and editing in Obsidian and other tools.
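The priority order stated in this section (CLI arguments > .ctree.toml > defaults) amounts to a layered merge. The sketch below illustrates the idea; the function name, the key names, and the default values are hypothetical and are not taken from code-tree itself.

```python
def resolve_config(cli_args, file_config, defaults):
    """Merge settings so that CLI arguments override the file, which overrides defaults.

    A value of None means "not specified at this layer" and never overrides
    a lower-priority value.
    """
    merged = dict(defaults)
    for layer in (file_config, cli_args):  # lowest to highest priority
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged


# Hypothetical values: the real defaults are not documented in this article.
defaults = {"sw": 10, "ww": 3, "template": "markdown"}
file_config = {"ww": 5, "template": "markdown"}  # as if read from .ctree.toml
cli_args = {"sw": 20, "ww": None}                # only --sw was passed

print(resolve_config(cli_args, file_config, defaults))
```

Here sw comes from the command line, ww from the file, and template from the file or defaults, since no higher layer overrides it.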
MCP Tool Suite
| Tool | Function |
|---|---|
| ctree_init | Create configuration template |
| ctree_generate | Generate/update context |
| ctree_reset | Reset history and regenerate |
| ctree_check | Combined generation, diff checking, and text retrieval |
| ctree_get_baseline | Retrieve baseline |
| ctree_get_revs | Fetch diff history |
| ctree_get_text | On-demand retrieval of specific text by hash |
Expected Adoption Benefits
Token Consumption Optimization (Design Expectation)
By passing only minimum indexed information, we expect improvements in both cost and processing speed.
- Initial context: Compress entire project skeleton with .ctree/ctree_ctx_<lang>.txt
- Detailed information: Retrieve only required symbols on-demand via hash reference
- Diff information: Pass only changes since previous snapshot
Verification Status: When a diff check is run in a new session, the work intent has been understood accurately from the diffs. Quantitative verification of the token cost reduction is planned as future work.
Reduced Synchronization Cost
Enables “follow latest only” workflows, eliminating the need to resend context per conversation turn.
- Immediately understand “what changed” from rev file diffs
- Specify related symbol hashes to dynamically fetch detailed information
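As a rough illustration of reading "what changed" from a rev file, the sketch below groups lines by their marker. The line layout (marker, a space, then free-form detail) is an assumption; the source only names the four markers +S, -S, +D, and -D.

```python
MARKERS = ("+S", "-S", "+D", "-D")


def summarize_rev(rev_text):
    """Group rev file lines by marker; lines without a known marker are ignored."""
    groups = {marker: [] for marker in MARKERS}
    for line in rev_text.splitlines():
        marker, _, detail = line.strip().partition(" ")
        if marker in groups:
            groups[marker].append(detail.strip())
    return groups


# Hypothetical rev file content, for illustration only.
rev_text = """+S abc123 src/lib.rs fn run
-S def456
+D src/main.rs -> src/lib.rs
"""

summary = summarize_rev(rev_text)
print({marker: len(items) for marker, items in summary.items()})
```

The hashes collected under +S are the natural candidates to pass on when detailed text is needed.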
Improved Visualization and Review Efficiency
Structural changes become easier to understand, with investigation starting points becoming clear.
- Visualize project structure with tree_annotated_<lang>.txt
- Index all symbols in symbols_<lang>.jsonl for fast grep/jq searching
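The same kind of search can be scripted when grep or jq is not at hand. This is a minimal sketch that assumes each JSON Lines record carries path, kind, and name fields (inferred from the path|kind|name hash key); the real schema of symbols_<lang>.jsonl may differ.

```python
import json


def find_symbols(jsonl_text, kind=None, name_contains=None):
    """Filter JSON Lines symbol records by kind and/or a substring of the name."""
    results = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if kind is not None and record.get("kind") != kind:
            continue
        if name_contains is not None and name_contains not in record.get("name", ""):
            continue
        results.append(record)
    return results


# Hypothetical records, for illustration only.
sample = "\n".join([
    '{"path": "src/main.rs", "kind": "fn", "name": "main"}',
    '{"path": "src/lib.rs", "kind": "struct", "name": "Config"}',
    '{"path": "src/lib.rs", "kind": "fn", "name": "load_config"}',
])

for record in find_symbols(sample, kind="fn"):
    print(record["path"], record["name"])
```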
Recommended Operational Flow
Step 1: Initialization
ctree_init --config .ctree.toml
Initialize project-specific configuration (strong/weak scope definitions).
Step 2: Baseline Generation
ctree generate --config .ctree.toml
Generate initial context into .ctree/ directory. Record this state as revision 0000.
Step 3: Daily Operations
ctree_check --config .ctree.toml
ctree_check performs these operations in one call:
- Detect source code updates
- Regenerate .ctree/
- Generate new rev file (0001.txt, etc.)
- Output final context for LLM
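Two of the steps above, detecting updates and naming the next rev file, can be sketched as follows. How ctree_check actually detects changes is not described in this article; comparing content digests is an assumption, as are the function names. Only the four-digit 0001.txt naming comes from the text.

```python
import hashlib
import os


def content_digest(data):
    """Digest of a file's bytes, used to detect content changes (assumed approach)."""
    return hashlib.sha256(data).hexdigest()


def changed_paths(previous, current):
    """Paths that were added, removed, or modified between two {path: digest} maps."""
    all_paths = set(previous) | set(current)
    return sorted(p for p in all_paths if previous.get(p) != current.get(p))


def next_rev_name(existing):
    """Next zero-padded rev file name, given the file names already in snapshots/rev/."""
    numbers = []
    for name in existing:
        stem, ext = os.path.splitext(name)
        if ext == ".txt" and stem.isdigit():
            numbers.append(int(stem))
    return f"{max(numbers, default=0) + 1:04d}.txt"


previous = {"a.rs": content_digest(b"fn a() {}"), "b.rs": content_digest(b"fn b() {}")}
current = {
    "a.rs": content_digest(b"fn a() {}"),
    "b.rs": content_digest(b"fn b2() {}"),
    "c.rs": content_digest(b""),
}

if changed_paths(previous, current):
    print("write", next_rev_name(["0001.txt", "0002.txt"]))
```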
Step 4: LLM Integration
Provide the following as baseline context to the LLM:
# Project Structure
<Contents of ctree_ctx_<lang>.txt>
# Recent Changes (rev 0010)
<Contents of snapshots/rev/0010.txt>
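The baseline context shown above can be assembled with a few lines of glue code. This is a minimal sketch: the function name is hypothetical, and the caller is assumed to have already read the two files from .ctree/.

```python
def build_baseline_context(ctx_text, rev_id, rev_text):
    """Combine the project skeleton and the latest rev diff into one prompt section."""
    return (
        "# Project Structure\n"
        f"{ctx_text.strip()}\n\n"
        f"# Recent Changes (rev {rev_id:04d})\n"
        f"{rev_text.strip()}\n"
    )


# Hypothetical file contents, for illustration only.
ctx_text = "src/\n  main.rs\n"
rev_text = "+S abc123\n"

print(build_baseline_context(ctx_text, 10, rev_text))
```

Keeping the two sections in a fixed order and format makes the prompt reproducible across sessions, which matches the goal of reducing output variance.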
When detailed information is needed, specify hashes:
MCP call: ctree_get_text(hashes=["abc123", "def456"])
Design Philosophy
Rather than always producing “heavy full text” output, the design emphasizes “lightweight index + on-demand retrieval”, maximizing the effectiveness of LLM integration in large-scale development.
“Telescope” Design
The design of recording only hashes and fetching detailed text on demand offers excellent synergy with LLM agents.
- Low Magnification (Overview): Examine project structure via tree_annotated_<lang>.txt
- Medium Magnification (Module Unit): Integrate symbols_<lang>.jsonl with Serena, enabling drill-down from symbol search to structural understanding
- High Magnification (Code Details): Reference specific symbol implementation via text_store_<lang>.jsonl
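The high-magnification step boils down to a lookup from hash to text. The sketch below is a local stand-in for the idea behind ctree_get_text, not its implementation; the hash and text field names in the text store records are assumptions.

```python
import json


def load_text_store(jsonl_text):
    """Build a {hash: text} index from JSON Lines records (field names assumed)."""
    store = {}
    for line in jsonl_text.splitlines():
        if line.strip():
            record = json.loads(line)
            store[record["hash"]] = record["text"]
    return store


def get_text(store, hashes):
    """Return only the requested entries; unknown hashes are skipped."""
    return {h: store[h] for h in hashes if h in store}


# Hypothetical text store content, for illustration only.
sample = (
    '{"hash": "abc123", "text": "fn main() {}"}\n'
    '{"hash": "def456", "text": "struct Config;"}\n'
)

store = load_text_store(sample)
print(get_text(store, ["abc123"]))
```

Only the entries an agent asks for enter the context window, while the overview files stay small.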
Volume Control for Monorepos
In projects structured with DDD + Clean Architecture (e.g., features/{module}/{domain,infra,presentation}), volume control works by increasing annotation density for the active module (--sw: strong width) while reducing it for dependencies (--ww: weak width).
Configuration Example:
# Active module (detailed summary)
strong = ["features/payment/**/*.rs"]
# Dependencies (simplified)
weak = ["features/*/domain/**/*.rs", "features/*/infra/**/*.rs"]
# Tests and generated code (symbols only)
symbol_only = ["tests/**/*.rs"]
This enables dynamic context volume control even at monorepo scale:
ctree generate --config .ctree.toml --sw 20 --ww 5
The active module (e.g., payment) includes detailed dependencies and implementations, while other modules’ domain/infra layers provide only structural overviews, reducing token costs while maintaining necessary information.
Current Implementation Status
- Language Support: Modularized design enables incremental language implementation as needed
- Diff Management: Diff tracking using the rev file format is complete. In new sessions, work intent has been understood accurately from the diffs
- Token Cost Reduction: While the design is theoretically sound, quantitative verification is planned for future work
Summary
code-tree is a tool that improves LLM agent effectiveness from the “context compression” perspective. It enables lightweight, reproducible context management even for large projects, with the aim of reducing token costs and improving development efficiency at the same time.

