Building an AST-Based Codebase Analyzer for Local LLM Context: ctree Design and pathfinder/Serena Integration

I built ctree, a Rust AST-based on-demand codebase analyzer for the languages I commonly use, to reduce local LLM context consumption. It provides a shallow codebase overview via hash-based snapshots, complementing pathfinder for path accuracy and Serena for symbol-level precision.

These articles use AI-generated summaries of Obsidian notes originally kept as technical memos.

English translations are produced with AI assistance.

Conclusion

When running a coding agent on a local LLM, context consumption efficiency determines practical usability. Symbol analysis tools like Serena are precise, but loading all symbols upfront causes context to explode.

I built ctree, an AST-based on-demand codebase analyzer in Rust, targeting only the languages I commonly use. It manages codebase-wide snapshots with hash-based tracking, following implementation changes and dependency changes in compact form. The implementer can adjust scope arbitrarily based on their current focus.

Get shallow hints from ctree, then raise resolution with Serena’s find_symbol: that is the core design. Combined with pathfinder for file operation accuracy, this both saves context and improves the precision of tool calls.

Given the per-token cost of GPU context processing, software-side performance improvements are highly cost-effective.

Figure: Zed Editor MCP Servers panel showing ctree (5 tools), pathfinder, and serena (22 tools) running as independent MCP servers.

Motivation: Complementing Serena’s Context Cost

Serena provides precise symbol-level analysis, but it is not suited for “first, get a shallow overview of the entire codebase.” Loading all symbols inflates context and pressures the limited context windows of local LLMs.

What was needed was a “shallow overview” layer placed before Serena.

Layer	Tool	Role
File operation accuracy	pathfinder	Recovers from path resolution failures, prevents context bloat
Shallow codebase overview	ctree	Provides implementation and dependency changes compactly via hash-based snapshots
Precise symbol analysis	Serena	Retrieves specific symbol definitions and references via `find_symbol`

ctree was designed alongside pathfinder. pathfinder ensures file path accuracy, ctree provides structural hints about the codebase, and Serena delivers final resolution. All three operate as independent MCP tools.

CLI

$ ctree --help
Generate annotated tree + symbols + compact ctx

Usage: ctree [OPTIONS] [COMMAND]

Commands:
  init
  reset
  check
  clean
  help   Print this message or the help of the given subcommand(s)

Options:
      --mcp                    Run as MCP stdio server
      --config <CONFIG>        Config file path (default: .ctree.toml if exists)
      --root <ROOT>            Scan root path [default: .]
      --syntax <SYNTAX>        [possible values: go, rust, python, typescript,
                                csharp, dart, lua, awk, shell, kotlin, swift,
                                markdown, html]
      --include <INCLUDE>      Include globs
      --exclude <EXCLUDE>      Exclude globs [default: **/tmp/**,**/testdata/**,
                                **/.git/**,**/vendor/**]
      --strong <STRONG>        Strong scope globs
      --weak <WEAK>            Weak scope globs
      --sw <SW>                Max annotation words per strong file line [default: 12]
      --ww <WW>                Max annotation words per weak/symbol-only file line [default: 3]
      --reasoning <REASONING>  Reasoning level [possible values: high, medium, low]
      --template <TEMPLATE>    Output template [possible values: plain, hugo, jinja]
  -h, --help                   Print help

Design: AST-Based On-Demand Analysis

Supported Languages

I implemented tree-sitter-based AST parsers for the languages I regularly use.

Go, Rust, Python, TypeScript/JavaScript, Dart, Kotlin, Lua, Shell, Awk, C#, Swift, Markdown, HTML

This covers not just application code but also shell scripts and documentation. In practice, looking only at the primary language rarely gives a complete picture of a repository.

strong / weak Scope

Rather than analyzing the entire codebase uniformly, files are separated into strong (detailed) and weak (abbreviated) scopes.

[ctree]
watch = "rust"
root = "."
include = ["src/**/*.rs", "tests/**/*.rs"]
exclude = ["**/tmp/**", "**/testdata/**", "**/.git/**", "**/vendor/**", "**/target/**"]
strong = ["src/**"]
weak = ["tests/**"]
sw = 24
ww = 5
reasoning = "medium"

Files in strong get detailed symbol summaries; weak files get lightweight overviews only. When focus shifts, changing this configuration switches the scope.

This separation is the core design decision in ctree. Instead of carrying the entire codebase at full weight, the implementer actively decides where to invest detail.
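
The strong/weak split can be pictured as a small classification step over relative paths. The following is a minimal sketch, not ctree's actual implementation (which is written in Rust): the patterns are copied from the config above, while the helper names `matches` and `classify`, the precedence order (exclude, then strong, then weak), the `unscoped` fallback, and the use of Python's `fnmatch` semantics for `**` are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Patterns copied from the .ctree.toml example; the matching rules are assumed.
STRONG = ["src/**"]
WEAK = ["tests/**"]
EXCLUDE = ["**/tmp/**", "**/testdata/**", "**/.git/**", "**/vendor/**", "**/target/**"]


def matches(path, patterns):
    """Return True if the relative path matches any of the glob patterns."""
    # fnmatchcase lets '*' cross '/' boundaries, so 'src/**' matches any depth.
    # Trying '/' + path as well lets '**/tmp/**' match a top-level tmp/ directory.
    return any(fnmatchcase(path, p) or fnmatchcase("/" + path, p) for p in patterns)


def classify(path):
    """Return 'excluded', 'strong', 'weak', or 'unscoped' (assumed precedence)."""
    if matches(path, EXCLUDE):
        return "excluded"
    if matches(path, STRONG):
        return "strong"
    if matches(path, WEAK):
        return "weak"
    return "unscoped"


for p in ["src/mcp.rs", "tests/cli.rs", "target/debug/build.rs"]:
    print(p, "->", classify(p))
```

Shifting focus then amounts to editing the two pattern lists; nothing else in the pipeline needs to know which files are currently "interesting".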

reasoning Preset

Summary detail level is controlled by a single field: reasoning = "high" | "medium" | "low".

reasoning	Use case
high	Small projects, situations requiring detailed analysis
medium	Normal development work (default)
low	Large monorepos, minimizing token consumption

Hash-Based Snapshots

A deterministic hash is generated from scope settings (strong/weak patterns + reasoning level) to isolate snapshot directories.

.ctree/
  rust/
    e83adb52/          ← hash of scope configuration
      snapshots/
        .baseline.txt
        rev/
          0001.txt
          0002.txt

Previous snapshots are preserved when switching scopes. Switching back restores the earlier snapshots without regeneration. This makes branch switches and focus changes low-cost.
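
To make the isolation idea concrete, here is a minimal sketch of how a scope hash could be derived. The hash function (SHA-256), the canonical JSON encoding, the sorting of patterns, and the 8-character truncation are assumptions for illustration; the text above only states that the hash is deterministic and built from the strong/weak patterns plus the reasoning level. The function names `scope_hash` and `snapshot_dir` are hypothetical.

```python
import hashlib
import json
from pathlib import PurePosixPath


def scope_hash(strong, weak, reasoning):
    """Derive a short, deterministic directory name from the scope settings."""
    # Sorting the patterns means reordering them in the config does not
    # create a new snapshot directory (an assumed design choice).
    canonical = json.dumps(
        {"strong": sorted(strong), "weak": sorted(weak), "reasoning": reasoning},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]


def snapshot_dir(syntax, strong, weak, reasoning):
    """Build a path following the .ctree/<syntax>/<hash>/snapshots layout."""
    return PurePosixPath(".ctree") / syntax / scope_hash(strong, weak, reasoning) / "snapshots"


medium = snapshot_dir("rust", ["src/**"], ["tests/**"], "medium")
low = snapshot_dir("rust", ["src/**"], ["tests/**"], "low")
print(medium)
print(low)
```

Because the two calls differ only in reasoning level, they land in different directories, and returning to the first configuration finds its earlier snapshots still in place.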

MCP Tools

ctree exposes 5 MCP tools.

Tool	Role
`check`	Generate/update snapshots. Returns the latest diff
`get_baseline`	Returns annotated file tree + symbol summaries for the entire codebase
`get_revs`	Returns revision history (what changed)
`get_text`	Resolves hashes to symbol or dependency text
`get_depends`	Explores dependency relationships between symbols

Actual Output

ctree check — Snapshot Generation and Diff Detection

$ ctree check --syntax rust
frequency=always
action=generate
latest_before=0001
latest_after=0001
(no new rev)

When no changes are detected, it returns (no new rev) immediately. When changes are found, a new revision is generated with hashes of changed symbols.

First-time generation output:

$ ctree check --syntax rust
frequency=always
action=generate
latest_before=
latest_after=0001
rev_file=.ctree/rust/{scope_hash}/snapshots/rev/0001.txt
hashes={hash1},{hash2},{hash3},...

{hash1}
source=symbol kind=module name=... scope=strong path=src/.../main.rs line=2
mod ...;
--{hash1}
{hash2}
source=symbol kind=function name=... scope=strong path=src/.../mcp.rs line=164
fn ...(args: ...) -> Result<...> {
--{hash2}

Each hash is a symbol identifier — pass it to get_text to retrieve the full body.
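
For a client that consumes this text directly, the format shown above is simple enough to split by hand. The sketch below assumes only what is visible in the sample: `key=value` header lines, a blank line, then blocks opened by a hash line and closed by `--` plus the same hash. The function name `parse_check_output` is hypothetical, and the hashes, symbol names, and signatures in `SAMPLE` are invented placeholders rather than real ctree output.

```python
def parse_check_output(text):
    """Split `ctree check` output into header fields, hash list, and symbol blocks."""
    header, blocks = {}, {}
    lines = text.splitlines()
    i = 0
    # Header: key=value lines up to the first blank line (or end of input).
    while i < len(lines) and lines[i].strip():
        key, sep, value = lines[i].partition("=")
        if sep:  # skips non key=value lines such as "(no new rev)"
            header[key] = value
        i += 1
    hashes = [h for h in header.get("hashes", "").split(",") if h]
    known = set(hashes)
    current, body = None, []
    for line in lines[i:]:
        if current is None:
            if line in known:  # a bare hash line opens a block
                current, body = line, []
        elif line == "--" + current:  # "--<hash>" closes the block
            meta = dict(f.split("=", 1) for f in body[0].split() if "=" in f) if body else {}
            blocks[current] = {"meta": meta, "signature": "\n".join(body[1:])}
            current = None
        else:
            body.append(line)
    return header, hashes, blocks


# Invented sample in the documented shape; none of these values are real.
SAMPLE = """\
frequency=always
action=generate
latest_before=
latest_after=0001
hashes=a1b2c3d4,e5f6a7b8

a1b2c3d4
source=symbol kind=module name=mcp scope=strong path=src/main.rs line=2
mod mcp;
--a1b2c3d4
e5f6a7b8
source=symbol kind=function name=run scope=strong path=src/mcp.rs line=164
fn run(args: Args) -> Result<()> {
--e5f6a7b8
"""

header, hashes, blocks = parse_check_output(SAMPLE)
print(hashes, blocks["e5f6a7b8"]["meta"]["path"])
```

The metadata line is split on whitespace, so this sketch would need adjusting if a field value can contain spaces.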

get_baseline — Shallow Codebase Overview

What get_baseline returns is an annotated file tree and symbol summaries, not full source code. The baseline also includes an estimated token count. From this overview you identify symbols of interest, then retrieve their definitions via Serena’s find_symbol.

The detailed output format is not disclosed, as ctree is integrated into our in-house LLM pipeline infrastructure.

get_revs — Revision Diffs

Returns compact revision diffs expressing symbol additions/removals and dependency additions/removals. Pass hashes to get_text to retrieve full symbol bodies or dependency details.

The detailed output format is not disclosed.

Three-Layer Integration: pathfinder + ctree + Serena

pathfinder — Recovering from Path Resolution Failures

Even when the LLM mistypes a directory name, pathfinder resolves it to the correct path.

path_resolve:
  check_path: "content/ja/docs/tech/infrastrcture/podman-quadlet-systemd-ubuntu.md"
                                      ^^^^^^^^^^^ typo
  →  resolved: "content/ja/docs/tech/infrastructure/podman-quadlet-systemd-ubuntu.md"
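
The resolve-on-failure idea can be illustrated with a deliberately simpler technique. pathfinder itself uses small embedding models for semantic matching (see its help output); the sketch below swaps that for plain string similarity from Python's standard `difflib`, so it demonstrates only the recovery step, not pathfinder's ranking. The function name `resolve_path`, the cutoff value, and the second entry in the index are assumptions for illustration.

```python
import difflib


def resolve_path(failed, indexed, cutoff=0.8):
    """Return the indexed path most similar to `failed`, or None if nothing is close."""
    # String similarity stands in here for pathfinder's embedding-based search.
    found = difflib.get_close_matches(failed, indexed, n=1, cutoff=cutoff)
    return found[0] if found else None


# A tiny stand-in for the path index; the second entry is hypothetical.
index = [
    "content/ja/docs/tech/infrastructure/podman-quadlet-systemd-ubuntu.md",
    "content/ja/docs/tech/ai/example-note.md",
]

typo = "content/ja/docs/tech/infrastrcture/podman-quadlet-systemd-ubuntu.md"
print(resolve_path(typo, index))
```

A cutoff keeps the resolver from returning an arbitrary file when nothing in the index resembles the failed path, in which case the original error should surface unchanged.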

$ pathfinder --help
pathfinder — semantic path finder & MCP resolution server

USAGE
    pathfinder [OPTIONS]          Interactive semantic directory finder (default).
    pathfinder --mcp [OPTIONS]    Start as an MCP server.

FINDER OPTIONS
    --include-builds    Include build/artifact dirs (target, dist, …).

MCP OPTIONS
    --root <PATH>       Add a project root directory to watch and index.

GENERAL OPTIONS
    -h, --help          Print this help message and exit.
    -V, --version       Print version, model, and PCA config.

MCP TOOLS
  1. path_resolve        Resolve a failed file path to the best match.
  2. tool_retry_with_resolve  Resolve + retry the operation in one call.
  3. roots_list          Return configured root directories.
  4. reindex_paths       Force a full index rebuild.

ENVIRONMENT VARIABLES
    PF_MCP_INFERENCE    Inference mode: "general" (default) or "code".
    Models (both INT8 quantized):
      general → mxbai-edge-colbert (17M, 48-dim)
      code    → lateon-code-edge (17M, 48-dim)

Figure: pathfinder CLI, with the pf command running a semantic search across projects and selecting a result to cd into.

Expected Workflow

1. pathfinder  → resolve LLM path typos (ENOENT → tool_retry_with_resolve)
2. ctree check → update snapshots, get hashes of changed symbols
3. get_baseline → get a shallow overview of the codebase
4. Identify symbols of interest → drill down with Serena's find_symbol
5. get_revs + get_text → get details of changed symbols

The key point of this architecture is that each layer operates as an independent MCP tool, acquiring information progressively. Instead of loading everything upfront, start with a shallow overview and drill down as needed.
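
On the wire, the ctree steps of this workflow are ordinary MCP tool invocations. The sketch below builds the JSON-RPC 2.0 `tools/call` messages a client would send to the ctree server in that order. The message envelope follows the MCP specification; the tool names come from the table above; every argument name (`syntax`, `hashes`) is hypothetical, since the tools' input schemas are not documented here. Serena's find_symbol would be called the same way, on its own server connection.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids, unique per connection


def tool_call(name, arguments):
    """Serialize one MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Progressive order from the workflow; argument names are assumptions.
ctree_requests = [
    tool_call("check", {"syntax": "rust"}),          # update snapshots, get hashes
    tool_call("get_baseline", {}),                   # shallow overview
    tool_call("get_revs", {}),                       # what changed
    tool_call("get_text", {"hashes": ["<hash from check>"]}),  # drill down
]

for request in ctree_requests:
    print(request)
```

Each request is small, and only the last one pulls full symbol text, which is where the context savings come from.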

GPU context processing is expensive. For local LLMs operating within 8K–32K context windows, improving context efficiency on the software side translates directly to cost reduction.

Caveats

ctree is an AST parser limited to languages I commonly use, not a general-purpose static analysis tool
The detailed output formats of get_baseline and get_revs are not disclosed, as ctree is integrated into our in-house LLM pipeline infrastructure
Integration of ML inference (ONNX, ColBERT, etc.) was considered and rejected. The strong/weak scope control already solves the monorepo context explosion problem, so the deterministic design was maintained
Compliant with MCP protocol (2024-11-05). Works with Claude Code, Zed Editor, and other MCP clients
