Conclusion

When running a coding agent on a local LLM with 32B parameters or fewer, tool_call path resolution fails constantly. The model generates paths with typos, wrong extensions, skipped directory levels, and confused filenames. Trying to fix this with better prompts has a ceiling — the limitation is in the model’s parameter count, not the instructions.

I built pathfinder, a Rust MCP server that deterministically recovers from these path resolution failures. It combines lexical scoring with ColBERT (Late Interaction) semantic re-ranking, running INT8-quantized at 16MB, CPU-only, under 10ms latency. It consumes zero GPU VRAM, so it does not compete with the inference server for resources.

Benchmarked against the filesystem MCP server on a ~4,000-file monorepo, path resolution accuracy was comparable, but pathfinder consumed roughly 10,000 fewer context tokens. The filesystem MCP’s list_directory ancestor-checking approach scales O(n) with file count, inflating context consumption as the project grows. pathfinder scores and filters candidates before returning results, keeping context usage tight.

As a side benefit, implementing ModernBERT inference in Rust gave me a semantic search engine that I turned into an interactive CLI for browsing code across multiple projects — domain, infrastructure, and presentation layers — from a single terminal.

[Figure: Zed Editor MCP Servers panel showing ctree, pathfinder, and serena running as independent MCP servers]

Motivation: Why Local LLM Tool Calls Break

Claude and GPT-4 class models rarely struggle with tool_call path resolution. Local models in the 7B–32B range are a different story.

Common failure patterns:

| Failure Pattern | Example |
|---|---|
| Directory name typo | src/componets/Button.tsx → components |
| Wrong extension | config.yaml → actually config.yml |
| Skipped directory level | utils/helper.go → actually internal/pkg/utils/helper.go |
| Similar filename confusion | auth/login.ts vs auth/login.test.ts |

Each failure generates an error response, the LLM retries, the context grows, and the larger context degrades subsequent accuracy. It is a vicious cycle: every failed call makes the next failure more likely.

Prompt-level fixes (“always use exact paths”) cannot overcome a parameter-count limitation. This is a problem that belongs in the tooling layer, not the model layer.


Architecture

pathfinder is a Rust MCP server communicating via stdin/stdout JSON-RPC 2.0 with LLM clients (Claude Code, Zed Editor, etc.).
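
To make the transport concrete, here is a minimal sketch of what a client might write to pathfinder's stdin. It assumes the standard MCP `tools/call` method and newline-delimited messages on stdio; the argument names `check_path` and `intent_text` come from this article, while the helper functions `json_escape` and `path_resolve_request` are hypothetical names invented for the illustration.

```rust
/// Escape the characters that must not appear raw inside a JSON string.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build one JSON-RPC 2.0 `tools/call` request for the `path_resolve` tool.
/// The trailing newline is the message delimiter on the stdio transport.
fn path_resolve_request(id: u64, check_path: &str, intent_text: Option<&str>) -> String {
    // Tool arguments: `check_path` is always sent, `intent_text` only when given.
    let mut args = format!(r#""check_path":"{}""#, json_escape(check_path));
    if let Some(intent) = intent_text {
        args.push_str(&format!(r#","intent_text":"{}""#, json_escape(intent)));
    }

    let mut req = String::new();
    req.push_str(r#"{"jsonrpc":"2.0","id":"#);
    req.push_str(&id.to_string());
    req.push_str(r#","method":"tools/call","params":{"name":"path_resolve","arguments":{"#);
    req.push_str(&args);
    req.push_str("}}}\n");
    req
}
```

Calling `path_resolve_request(1, "utils/helper.go", Some("Go config loader"))` yields a single line that can be written to the server's stdin. A real client would normally use a JSON library rather than hand-built strings; the manual version is only here to keep the sketch dependency-free.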

Three-Phase Path Resolution

When the LLM generates an inaccurate path, pathfinder resolves it through three phases:

  1. Lexical scoring: the scoring logic itself is not disclosed. Under 1ms
  2. Query history correlation — A ring buffer of recent resolutions biases toward the current working context
  3. ColBERT semantic re-ranking — Top candidates are re-ranked using ColBERT MaxSim scores. Around 5-8ms
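
The second phase can be pictured with a small sketch. This is not pathfinder's actual implementation: the `History` type, its capacity, and the bias formula (the share of remembered paths that live in the candidate's parent directory) are all assumptions chosen to illustrate how a ring buffer of recent resolutions could favor the current working context.

```rust
use std::collections::VecDeque;
use std::path::Path;

/// Fixed-capacity ring buffer of recently resolved paths (oldest entries are evicted).
struct History {
    recent: VecDeque<String>,
    capacity: usize,
}

impl History {
    fn new(capacity: usize) -> Self {
        History { recent: VecDeque::with_capacity(capacity), capacity }
    }

    /// Record a successful resolution, evicting the oldest one when the buffer is full.
    fn record(&mut self, resolved: &str) {
        if self.capacity == 0 {
            return;
        }
        if self.recent.len() == self.capacity {
            self.recent.pop_front();
        }
        self.recent.push_back(resolved.to_string());
    }

    /// Hypothetical bias in [0, 1]: the fraction of remembered paths that
    /// share the candidate's parent directory.
    fn bias(&self, candidate: &str) -> f32 {
        if self.recent.is_empty() {
            return 0.0;
        }
        let parent = Path::new(candidate).parent();
        let hits = self
            .recent
            .iter()
            .filter(|p| Path::new(p.as_str()).parent() == parent)
            .count();
        hits as f32 / self.recent.len() as f32
    }
}
```

A resolver could add a weighted `bias` to each candidate's lexical score, so that a file next to the ones the agent has just been editing wins a close call.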

When lexical scoring produces a high-confidence match, neural re-ranking is skipped entirely. Most queries resolve in under 1ms.
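
The early-exit behavior can be sketched as follows. Because the real lexical scorer is not disclosed, this example substitutes a plain Levenshtein similarity as a stand-in; the function names, the confidence threshold, and the choice to pass the top five candidates to the re-ranker are assumptions made for illustration only.

```rust
/// Levenshtein edit distance over chars (a stand-in for the undisclosed lexical scorer).
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != cb);
            cur.push(substitute.min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Similarity in [0, 1]; 1.0 means the two strings are identical.
fn lexical_score(query: &str, candidate: &str) -> f32 {
    let longest = query.chars().count().max(candidate.chars().count());
    if longest == 0 {
        return 1.0;
    }
    1.0 - edit_distance(query, candidate) as f32 / longest as f32
}

/// Pick the best lexical match and call `rerank` only when its score is below `threshold`.
/// Returns the chosen path and whether the re-ranker actually ran.
fn resolve<'a, F>(
    query: &str,
    index: &[&'a str],
    threshold: f32,
    rerank: F,
) -> Option<(&'a str, bool)>
where
    F: Fn(&[&'a str]) -> &'a str,
{
    let mut scored: Vec<(f32, &'a str)> =
        index.iter().map(|p| (lexical_score(query, p), *p)).collect();
    scored.sort_by(|x, y| y.0.total_cmp(&x.0)); // best score first
    let (best_score, best_path) = *scored.first()?;
    if best_score >= threshold {
        return Some((best_path, false)); // fast path: no neural inference
    }
    // Slow path: hand a short list of top candidates to the semantic re-ranker.
    let top: Vec<&'a str> = scored.iter().take(5).map(|(_, p)| *p).collect();
    Some((rerank(&top), true))
}
```

With a threshold of 0.9, a one-character typo such as `src/componets/Button.tsx` resolves on the fast path, while a skipped-level query such as `utils/helper.go` scores too low lexically and falls through to the re-ranker.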

ColBERT Model

Semantic re-ranking uses a ColBERT (Late Interaction) architecture model.

| Item | Value |
|---|---|
| Model | lateon-code-edge (code-aware) / mxbai-edge-colbert (general) |
| Parameters | ~17M |
| Quantization | INT8 (ONNX Runtime) |
| Model size | ~16MB |
| Embedding dimensions | 128 |
| Inference | CPU only (no GPU required) |

Inference runs through ONNX Runtime with HuggingFace Tokenizers for text encoding. No Python dependencies.
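
For readers unfamiliar with late interaction, the MaxSim score is simple to state in code. The sketch below shows the standard ColBERT scoring rule over per-token embeddings; it assumes the embeddings are already computed and L2-normalized (so a dot product equals cosine similarity), and it is not taken from pathfinder's source.

```rust
/// Dot product of two equal-length embedding vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// ColBERT late-interaction score: for each query token, take the similarity of
/// its best-matching document token, then sum those maxima over all query tokens.
fn maxsim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    if doc.is_empty() {
        return 0.0; // nothing to match against
    }
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| dot(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}
```

Because each query token is matched independently, a candidate path only needs one token that lines up with each part of the query, which is what lets the re-ranker reward partial, out-of-order matches that a single pooled embedding would blur together.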

MCP Tools

Primary tools exposed by pathfinder:

  • path_resolve — Takes a failed path and returns the most likely real path. Accepts intent_text (a short purpose description like “Go config loader”) to improve accuracy
  • tool_retry_with_resolve — Resolves the path and automatically retries the original operation (read_file, list_dir, etc.) in a single tool call
  • candidate_list — Returns top candidates as a ranked list for ambiguous cases
  • roots_list / reindex_paths / server_version — Administrative tools

tool_retry_with_resolve is the workhorse in practice. When the LLM gets ENOENT from read_file, it calls this tool once to resolve and retry in a single round trip.
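
The resolve-then-retry idea can be sketched for the `read_file` case. This is an assumption-laden illustration rather than pathfinder's code: `read_with_resolve` is a hypothetical name, the resolver is passed in as a closure, and only a single retry is attempted.

```rust
use std::fs;
use std::io::{self, ErrorKind};

/// Read `path`; on ENOENT (`ErrorKind::NotFound`) ask `resolve` for a corrected
/// path and retry once. Returns the path actually read and the file contents.
fn read_with_resolve<R>(path: &str, resolve: R) -> io::Result<(String, String)>
where
    R: Fn(&str) -> Option<String>,
{
    match fs::read_to_string(path) {
        Ok(text) => Ok((path.to_string(), text)),
        Err(err) if err.kind() == ErrorKind::NotFound => match resolve(path) {
            Some(fixed) => {
                let text = fs::read_to_string(&fixed)?;
                Ok((fixed, text))
            }
            // No plausible candidate: surface the original error unchanged.
            None => Err(err),
        },
        // Permission errors and the like are not path-resolution problems.
        Err(err) => Err(err),
    }
}
```

Returning the corrected path alongside the contents matters for the agent: it lets the model see which file it really read and use the right path in later calls.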

Path Resolution Example

When the LLM typos a directory name:

  path_resolve:
  check_path: "content/ja/docs/tech/infrastrcture/podman-quadlet-systemd-ubuntu.md"
                                    ^^^^^^^^^^^^^ typo
  →  resolved: "content/ja/docs/tech/infrastructure/podman-quadlet-systemd-ubuntu.md"

$ pathfinder --help
pathfinder — semantic path finder & MCP resolution server

USAGE
    pathfinder [OPTIONS]          Interactive semantic directory finder (default).
    pathfinder --mcp [OPTIONS]    Start as an MCP server.

FINDER OPTIONS
    --include-builds    Include build/artifact dirs (target, dist, …).

MCP OPTIONS
    --root <PATH>       Add a project root directory to watch and index.
                        May be specified multiple times.  Defaults to $PWD.

GENERAL OPTIONS
    -h, --help          Print this help message and exit.
    -V, --version       Print version, model, and PCA config to stderr and exit.

MCP TOOLS
  1. path_resolve            Resolve a failed file path to the best match.
  2. tool_retry_with_resolve Resolve + retry the operation in one call.
  3. roots_list              Return configured root directories.
  4. reindex_paths           Force a full index rebuild.

ENVIRONMENT VARIABLES
    PF_MCP_INFERENCE    Inference mode: "general" (default) or "code".
    Models (both INT8 quantized):
      general → mxbai-edge-colbert (17M, 48-dim)
      code    → lateon-code-edge (17M, 48-dim)
  

Alongside the MCP server, I built an interactive CLI using the same ColBERT inference engine. It is designed for browsing code across multiple projects — domain, infrastructure, and presentation layers — from a single terminal.

Controls

Launch pf (code-aware) or pfg (general) from the shell to open a semantic search interface.

  • Vim keybindings for navigating the candidate list
  • Drill down through directory hierarchy to the deepest nesting level, where file listings appear
  • Right arrow (→) opens the selected file in less
  • Selecting a result cds to its directory
  pf          # Code-aware mode (lateon-code-edge)
  pfg         # General mode (mxbai-edge-colbert)
  

Because ModernBERT inference is implemented natively in Rust, search responses feel effectively instant.

[Figure: pathfinder CLI, with the pf command running a semantic search across projects and selecting a result to cd into its directory]

Benchmarks: Comparison with filesystem MCP

Benchmarked on a monorepo application sample with approximately 4,000 files.

Path Resolution Accuracy

pathfinder standalone accuracy test (70 test cases, 12 categories):

| Category | Description | Result |
|---|---|---|
| Correct paths | Returned as-is | 8/8 |
| Directory typos | componets → components | 10/10 |
| Filename typos | Character swaps and omissions | 10/10 |
| Wrong extensions | .yaml → .yml | 7/7 |
| Skipped levels | Missing intermediate directories | 5/5 |
| Intent-based queries | Inference from purpose description | 4/5 |
| Retry operations | Resolve + retry in one call | 3/3 |
| Confusing path pairs | Distinguishing similar names | 6/6 |
| Deep nesting | 8+ directory levels | 4/4 |
| Cross-language queries | “Go config file” etc. | 4/6 |
| Test/config files | Distinguishing test vs production | 4/4 |
| Total | | 67/70 (95.7%) |

Context Token Consumption

Accuracy was comparable to the filesystem MCP server, but there was a ~10,000 token difference in context consumption.

The filesystem MCP checks ancestor directories sequentially via list_directory, so response token count scales with file count. At 4,000 files this gap is already meaningful; at larger scales it will widen further.

pathfinder scores and filters candidates before returning results, so context consumption stays bounded regardless of project size. For local LLMs with limited context windows (8K–32K), this difference directly impacts downstream accuracy.
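
The bounded-response property follows directly from filtering before returning. Here is a minimal sketch of that idea; the `Candidate` type, the score cutoff, and the value of `k` are hypothetical and not pathfinder's actual parameters.

```rust
/// A scored candidate path.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    path: String,
    score: f32,
}

/// Keep only candidates at or above `min_score`, best first, and at most `k` of them.
/// The size of the result depends on `k`, not on how many files were indexed.
fn top_k(mut candidates: Vec<Candidate>, min_score: f32, k: usize) -> Vec<Candidate> {
    candidates.retain(|c| c.score >= min_score);
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    candidates.truncate(k);
    candidates
}
```

Whether the index holds 40 files or 4,000, a call such as `top_k(candidates, 0.5, 3)` returns at most three paths, so the tokens added to the LLM's context stay roughly constant. A directory listing, by contrast, grows with the number of entries it has to enumerate.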


Resource Footprint

| Item | Value |
|---|---|
| Binary | Single Rust binary |
| Model size | ~16MB (INT8) |
| Memory overhead | Under 50MB |
| GPU VRAM consumption | Zero |
| Typical latency | Under 10ms (under 1ms for lexical-only) |

Zero GPU usage means pathfinder keeps working without affecting vLLM or llama.cpp even when the inference server has claimed all available VRAM. This is a critical property in local LLM environments.


Caveats

  • The internal logic of lexical scoring is not disclosed. pathfinder is integrated alongside ctree into our in-house open-source LLM pipeline infrastructure
  • The current test suite has 70 cases, with 3 failures: 2 in cross-language queries (intent_text specifying a programming language) and 1 in intent-based queries
  • Benchmarks were conducted on a ~4,000-file monorepo. Validation at tens of thousands of files is planned
  • ColBERT model selection (code-aware vs general) is controlled by the PF_MCP_INFERENCE environment variable
  • Compliant with MCP protocol (2024-11-05). Works with Claude Code, Zed Editor, and other MCP clients
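
As an illustration of the PF_MCP_INFERENCE switch mentioned above, model selection could look like the sketch below. The variable name, its two values, and the "general" default come from the help text; the `Model` enum, the function names, and the fallback to the general model for unrecognized values are assumptions.

```rust
/// The two INT8 ColBERT variants named in the help text.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Model {
    General, // mxbai-edge-colbert
    Code,    // lateon-code-edge
}

/// Map a PF_MCP_INFERENCE value to a model. Unset means "general" (the documented
/// default); treating unknown values as General is an assumption of this sketch.
fn select_model(value: Option<&str>) -> Model {
    match value.map(str::trim) {
        Some("code") => Model::Code,
        _ => Model::General,
    }
}

/// Read the variable from the process environment.
fn model_from_env() -> Model {
    select_model(std::env::var("PF_MCP_INFERENCE").ok().as_deref())
}
```

Keeping the parsing in `select_model` separate from the environment lookup makes the mapping easy to test without mutating process-wide state.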