
Building an MCP Server to Fix Local LLM Tool Call Failures: pathfinder Design and Benchmarks

How I built pathfinder, a Rust MCP server that uses ColBERT-based semantic search to deterministically fix path resolution failures from local LLMs with 32B parameters or fewer. Covers design motivation, architecture, benchmark results against filesystem MCP, and the interactive CLI.

These articles use AI-generated summaries of Obsidian notes originally kept as technical memos.

English translations are produced with AI assistance.

Conclusion

When running a coding agent on a local LLM with 32B parameters or fewer, tool_call path resolution fails constantly. The model generates paths with typos, wrong extensions, skipped directory levels, and confused filenames. Trying to fix this with better prompts has a ceiling — the limitation is in the model’s parameter count, not the instructions.

I built pathfinder, a Rust MCP server that deterministically recovers from these path resolution failures. It combines lexical scoring with ColBERT (Late Interaction) semantic re-ranking. The re-ranking model is INT8-quantized to about 16MB, runs on CPU only, and responds in under 10ms. It consumes zero GPU VRAM, so it does not compete with the inference server for resources.

Benchmarked against the filesystem MCP server on a ~4,000-file monorepo, path resolution accuracy was comparable, but pathfinder consumed roughly 10,000 fewer context tokens. The filesystem MCP’s list_directory ancestor-checking approach scales O(n) with file count, inflating context consumption as the project grows. pathfinder scores and filters candidates before returning results, keeping context usage tight.

As a side benefit, implementing ModernBERT inference in Rust gave me a semantic search engine that I turned into an interactive CLI for browsing code across multiple projects — domain, infrastructure, and presentation layers — from a single terminal.

Figure: Zed Editor MCP Servers panel, with ctree, pathfinder, and serena running as independent MCP servers.

Motivation: Why Local LLM Tool Calls Break

Claude and GPT-4 class models rarely struggle with tool_call path resolution. Local models in the 7B–32B range are a different story.

Common failure patterns:

Failure Pattern	Example
Directory name typo	`src/componets/Button.tsx` → `components`
Wrong extension	`config.yaml` → actually `config.yml`
Skipped directory level	`utils/helper.go` → actually `internal/pkg/utils/helper.go`
Similar filename confusion	`auth/login.ts` vs `auth/login.test.ts`

Each failure generates an error response, the LLM retries, context grows, and the enlarged context degrades subsequent accuracy. It is a self-reinforcing vicious cycle.

Prompt-level fixes (“always use exact paths”) cannot overcome a parameter-count limitation. This is a problem that belongs in the tooling layer, not the model layer.

Architecture

pathfinder is a Rust MCP server communicating via stdin/stdout JSON-RPC 2.0 with LLM clients (Claude Code, Zed Editor, etc.).
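For orientation, a tool invocation over this transport is one JSON-RPC 2.0 request written to the server's stdin as a single line. The sketch below builds an MCP `tools/call` request for path_resolve using only the standard library. The argument names check_path and intent_text come from this article, but the exact argument schema is an assumption, and `json_escape` and `path_resolve_request` are hypothetical helpers, not part of pathfinder.

```rust
/// Escape a string for embedding inside a JSON string literal.
/// Minimal: handles quotes, backslashes, and control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build one JSON-RPC 2.0 `tools/call` request (no trailing newline).
/// ASSUMPTION: argument names mirror the article; the real schema may differ.
fn path_resolve_request(id: u64, check_path: &str, intent_text: &str) -> String {
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"tools/call\",\"params\":{{\"name\":\"path_resolve\",\"arguments\":{{\"check_path\":\"{}\",\"intent_text\":\"{}\"}}}}}}",
        id,
        json_escape(check_path),
        json_escape(intent_text)
    )
}
```

A client would write the returned string followed by a newline to the server process and read the matching response, identified by the same `id`, from its stdout.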

Three-Phase Path Resolution

When the LLM generates an inaccurate path, pathfinder resolves it through three phases:

1. Lexical scoring: the scoring logic is not disclosed. Takes under 1ms
2. Query history correlation: a ring buffer of recent resolutions biases results toward the current working context
3. ColBERT semantic re-ranking: the top candidates are re-ranked using ColBERT MaxSim scores. Takes around 5-8ms

When lexical scoring produces a high-confidence match, neural re-ranking is skipped entirely. Most queries resolve in under 1ms.
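The control flow described above can be sketched roughly as follows. This is not pathfinder's implementation: lexical scores are taken as input because the scoring logic is not published, and the `History` type, the 0.05 bonus, `skip_threshold`, `top_k`, and the `rerank` callback are all hypothetical names and values chosen for illustration.

```rust
use std::collections::VecDeque;

/// A candidate path with its lexical score in [0, 1].
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    path: String,
    score: f32,
}

/// Fixed-capacity history of recently resolved paths (a simple ring buffer).
struct History {
    recent: VecDeque<String>,
    capacity: usize,
}

impl History {
    fn new(capacity: usize) -> Self {
        Self { recent: VecDeque::with_capacity(capacity), capacity }
    }

    /// Record a resolution, evicting the oldest entry when full.
    fn push(&mut self, path: &str) {
        if self.capacity == 0 {
            return;
        }
        if self.recent.len() == self.capacity {
            self.recent.pop_front();
        }
        self.recent.push_back(path.to_string());
    }

    /// Hypothetical bias: a small bonus when a candidate shares its parent
    /// directory with any recently resolved path.
    fn bonus(&self, path: &str) -> f32 {
        let dir = parent_dir(path);
        if self.recent.iter().any(|p| parent_dir(p) == dir) { 0.05 } else { 0.0 }
    }
}

fn parent_dir(path: &str) -> &str {
    path.rsplit_once('/').map(|(dir, _)| dir).unwrap_or("")
}

/// Resolve from precomputed lexical scores; `rerank` (the stand-in for the
/// neural stage) is called only when no candidate clears `skip_threshold`.
fn resolve<F>(
    mut candidates: Vec<Candidate>,
    history: &History,
    skip_threshold: f32,
    top_k: usize,
    rerank: F,
) -> Option<String>
where
    F: Fn(&[Candidate]) -> Option<usize>,
{
    // Phase 2: bias lexical scores toward the current working context.
    for c in candidates.iter_mut() {
        c.score += history.bonus(&c.path);
    }
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    // Early exit: a confident lexical match skips the re-ranker entirely.
    if candidates.first()?.score >= skip_threshold {
        return Some(candidates[0].path.clone());
    }
    // Phase 3: re-rank only the top few candidates.
    candidates.truncate(top_k);
    let best = rerank(&candidates)?;
    candidates.get(best).map(|c| c.path.clone())
}
```

The point of the early exit is that the expensive stage costs nothing on the common path, which is consistent with most queries resolving in under 1ms.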

ColBERT Model

Semantic re-ranking uses a ColBERT (Late Interaction) architecture model.

Item	Value
Model	lateon-code-edge (code-aware) / mxbai-edge-colbert (general)
Parameters	~17M
Quantization	INT8 (ONNX Runtime)
Model size	~16MB
Embedding dimensions	128
Inference	CPU only (no GPU required)

Inference runs through ONNX Runtime with HuggingFace Tokenizers for text encoding. No Python dependencies.
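For readers unfamiliar with late interaction, the MaxSim score itself is simple: for each query token embedding, take the highest dot product against any document token embedding, then sum those maxima. The sketch below shows that textbook definition on plain `f32` vectors. It assumes the embeddings have already been produced by the model, and it says nothing about how pathfinder stores or quantizes them.

```rust
/// Dot product of two equal-length embedding vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// ColBERT late-interaction score: for each query token, take the best
/// matching document token, then sum those maxima.
/// An empty document scores 0.0 by convention in this sketch.
fn maxsim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    if doc.is_empty() {
        return 0.0;
    }
    query
        .iter()
        .map(|q| doc.iter().map(|d| dot(q, d)).fold(f32::NEG_INFINITY, f32::max))
        .sum()
}
```

Because each document's token embeddings can be computed once at index time, only the query has to be encoded per request, which is what makes this scheme practical on CPU.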

MCP Tools

Primary tools exposed by pathfinder:

path_resolve — Takes a failed path and returns the most likely real path. Accepts intent_text (a short purpose description like “Go config loader”) to improve accuracy
tool_retry_with_resolve — Resolves the path and automatically retries the original operation (read_file, list_dir, etc.) in a single tool call
candidate_list — Returns top candidates as a ranked list for ambiguous cases
roots_list / reindex_paths / server_version — Administrative tools

tool_retry_with_resolve is the workhorse in practice. When the LLM gets ENOENT from read_file, it calls this tool once to resolve and retry in a single round trip.
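The resolve-then-retry idea can be sketched as a wrapper around a file read. This is an illustration of the pattern only: `read_with_resolve` and the `resolver` callback are hypothetical names, and the real tool also covers other operations such as list_dir.

```rust
use std::fs;
use std::io;

/// Read `path`; on "not found", ask `resolver` for a corrected path and
/// retry once. Returns the path actually read together with the contents.
fn read_with_resolve<R>(path: &str, resolver: R) -> io::Result<(String, String)>
where
    R: Fn(&str) -> Option<String>,
{
    match fs::read_to_string(path) {
        Ok(text) => Ok((path.to_string(), text)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            // ENOENT: resolve, then retry exactly once to avoid loops.
            // If the resolver has no answer, surface the original error.
            let fixed = resolver(path).ok_or(e)?;
            let text = fs::read_to_string(&fixed)?;
            Ok((fixed, text))
        }
        // Other errors (permissions, I/O) are not path problems; pass through.
        Err(e) => Err(e),
    }
}
```

Returning the corrected path alongside the contents lets the caller tell the model which path was actually used, so later tool calls can start from the right one.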

Path Resolution Example

When the LLM typos a directory name:

path_resolve:
  check_path: "content/ja/docs/tech/infrastrcture/podman-quadlet-systemd-ubuntu.md"
                                    ^^^^^^^^^^^^^ typo
  → resolved: "content/ja/docs/tech/infrastructure/podman-quadlet-systemd-ubuntu.md"
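To make the typo case concrete, here is a stand-in for the lexical stage. pathfinder's actual scoring logic is not published, so this sketch swaps in classic Levenshtein edit distance over whole path strings and picks the closest indexed path. `edit_distance` and `closest_path` are hypothetical helpers, and a real index would need something faster than a linear scan.

```rust
/// Levenshtein edit distance between two strings (two-row dynamic programming).
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut cur = vec![0; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        cur[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != cb);
            cur[j + 1] = substitute.min(prev[j + 1] + 1).min(cur[j] + 1);
        }
        std::mem::swap(&mut prev, &mut cur);
    }
    prev[b.len()]
}

/// Return the indexed path closest to `query`, or None if the index is empty.
/// NOTE: a stand-in technique, not pathfinder's scorer.
fn closest_path<'a>(query: &str, index: &[&'a str]) -> Option<&'a str> {
    index.iter().copied().min_by_key(|p| edit_distance(query, p))
}
```

Plain edit distance handles the dropped-letter case above, but it does poorly on skipped directory levels, where the missing prefix dominates the distance. That gap is one reason a resolver needs more than a single string metric.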

$ pathfinder --help
pathfinder — semantic path finder & MCP resolution server

USAGE
    pathfinder [OPTIONS]          Interactive semantic directory finder (default).
    pathfinder --mcp [OPTIONS]    Start as an MCP server.

FINDER OPTIONS
    --include-builds    Include build/artifact dirs (target, dist, …).

MCP OPTIONS
    --root <PATH>       Add a project root directory to watch and index.
                        May be specified multiple times.  Defaults to $PWD.

GENERAL OPTIONS
    -h, --help          Print this help message and exit.
    -V, --version       Print version, model, and PCA config to stderr and exit.

MCP TOOLS
  1. path_resolve            Resolve a failed file path to the best match.
  2. tool_retry_with_resolve Resolve + retry the operation in one call.
  3. roots_list              Return configured root directories.
  4. reindex_paths           Force a full index rebuild.

ENVIRONMENT VARIABLES
    PF_MCP_INFERENCE    Inference mode: "general" (default) or "code".
    Models (both INT8 quantized):
      general → mxbai-edge-colbert (17M, 48-dim)
      code    → lateon-code-edge (17M, 48-dim)
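The PF_MCP_INFERENCE switch in the help text above could be read along these lines. This is a sketch under assumptions: the variable name and its two values come from the help output, while the `InferenceMode` enum, the trimming, and the choice to reject unknown values instead of defaulting are illustrative decisions.

```rust
use std::env;

/// Which ColBERT model to load.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InferenceMode {
    General, // mxbai-edge-colbert
    Code,    // lateon-code-edge
}

/// Parse a raw PF_MCP_INFERENCE value; unset or empty falls back to "general".
/// ASSUMPTION: unknown values are rejected rather than silently defaulted.
fn parse_mode(raw: Option<&str>) -> Result<InferenceMode, String> {
    match raw.map(str::trim) {
        None | Some("") | Some("general") => Ok(InferenceMode::General),
        Some("code") => Ok(InferenceMode::Code),
        Some(other) => Err(format!("unknown PF_MCP_INFERENCE value: {other}")),
    }
}

/// Read the mode from the process environment.
fn mode_from_env() -> Result<InferenceMode, String> {
    parse_mode(env::var("PF_MCP_INFERENCE").ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the fallback rules testable without mutating process-wide state.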

CLI: Browsing Projects with Vim-Style Semantic Search

Alongside the MCP server, I built an interactive CLI using the same ColBERT inference engine. It is designed for browsing code across multiple projects — domain, infrastructure, and presentation layers — from a single terminal.

Controls

Launch pf (code-aware) or pfg (general) from the shell to open a semantic search interface.

Vim keybindings for navigating the candidate list
Drill down through the directory hierarchy to the deepest nesting level, where file listings appear
Right arrow (→) opens the selected file in less
Selecting a result cds to its directory

pf          # Code-aware mode (lateon-code-edge)
pfg         # General mode (mxbai-edge-colbert)

Because ModernBERT inference is implemented natively in Rust, search responses feel effectively instant.

Figure: pathfinder CLI, with the pf command running a semantic search across projects and selecting a result to cd into its directory.

Benchmarks: Comparison with filesystem MCP

Benchmarked on a monorepo application sample with approximately 4,000 files.

Path Resolution Accuracy

pathfinder standalone accuracy test (70 test cases, 12 categories):

Category	Description	Result
Correct paths	Returned as-is	8/8
Directory typos	`componets` → `components`	10/10
Filename typos	Character swaps and omissions	10/10
Wrong extensions	`.yaml` → `.yml`	7/7
Skipped levels	Missing intermediate directories	5/5
Intent-based queries	Inference from purpose description	4/5
Retry operations	Resolve + retry in one call	3/3
Confusing path pairs	Distinguishing similar names	6/6
Deep nesting	8+ directory levels	4/4
Cross-language queries	“Go config file” etc.	4/6
Test/config files	Distinguishing test vs production	4/4
Total		67/70 (95.7%)

Context Token Consumption

Accuracy was comparable to the filesystem MCP server, but there was a ~10,000 token difference in context consumption.

The filesystem MCP checks ancestor directories sequentially via list_directory, so response token count scales with file count. At 4,000 files this gap is already meaningful; at larger scales it will widen further.

pathfinder scores and filters candidates before returning results, so context consumption stays bounded regardless of project size. For local LLMs with limited context windows (8K–32K), this difference directly impacts downstream accuracy.
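The bounded-response idea can be illustrated with a small filter. The article does not say how many candidates pathfinder returns or what cutoff it applies, so `k` and `min_score` below are hypothetical parameters. The sketch only shows why the response size stops depending on the number of indexed files.

```rust
/// Keep at most `k` candidates scoring at least `min_score`, best first.
/// The returned list never exceeds `k` entries, however large the index is.
fn top_k(mut scored: Vec<(String, f32)>, k: usize, min_score: f32) -> Vec<(String, f32)> {
    // Drop weak matches first so they never reach the model's context.
    scored.retain(|(_, s)| *s >= min_score);
    // Highest score first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

With a directory listing, the tokens returned grow with the number of entries. With a filter like this, they are capped by `k` times the longest path, regardless of whether the index holds 4,000 files or many more.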

Resource Footprint

Item	Value
Binary	Single Rust binary
Model size	~16MB (INT8)
Memory overhead	Under 50MB
GPU VRAM consumption	Zero
Typical latency	Under 10ms (under 1ms for lexical-only)

Because pathfinder uses no GPU, it runs without affecting vLLM or llama.cpp even when they have claimed all available VRAM. This is a critical property in local LLM environments.

Caveats

The internal logic of lexical scoring is not disclosed. pathfinder is integrated alongside ctree into our in-house open-source LLM pipeline infrastructure
The current test suite has 70 cases. Of the 3 failures, 2 are unresolved failures in cross-language queries (intent_text specifying a programming language)
Benchmarks were conducted on a ~4,000-file monorepo. Validation at tens of thousands of files is planned
ColBERT model selection (code-aware vs general) is controlled by the PF_MCP_INFERENCE environment variable
Compliant with the MCP protocol (revision 2024-11-05). Works with Claude Code, Zed Editor, and other MCP clients
