Building an MCP Server to Fix Local LLM Tool Call Failures: pathfinder Design and Benchmarks
How I built pathfinder, a Rust MCP server that uses ColBERT-based semantic search to deterministically fix path resolution failures from local LLMs with 32B parameters or fewer. Covers design motivation, architecture, benchmark results against filesystem MCP, and the interactive CLI.
Conclusion
When running a coding agent on a local LLM with 32B parameters or fewer, tool_call path resolution fails constantly. The model generates paths with typos, wrong extensions, skipped directory levels, and confused filenames. Trying to fix this with better prompts has a ceiling — the limitation is in the model’s parameter count, not the instructions.
I built pathfinder, a Rust MCP server that deterministically recovers from these path resolution failures. It combines lexical scoring with ColBERT (late interaction) semantic re-ranking. The model is INT8-quantized to about 16MB and runs CPU-only with under 10ms latency. Because it consumes zero GPU VRAM, it does not compete with the inference server for resources.
Benchmarked against the filesystem MCP server on a ~4,000-file monorepo, path resolution accuracy was comparable, but pathfinder consumed roughly 10,000 fewer context tokens. The filesystem MCP’s list_directory ancestor-checking approach scales O(n) with file count, inflating context consumption as the project grows. pathfinder scores and filters candidates before returning results, keeping context usage tight.
As a side benefit, implementing ModernBERT inference in Rust gave me a semantic search engine that I turned into an interactive CLI for browsing code across multiple projects — domain, infrastructure, and presentation layers — from a single terminal.

Motivation: Why Local LLM Tool Calls Break
Claude and GPT-4 class models rarely struggle with tool_call path resolution. Local models in the 7B–32B range are a different story.
Common failure patterns:
| Failure Pattern | Example |
|---|---|
| Directory name typo | src/componets/Button.tsx → components |
| Wrong extension | config.yaml → actually config.yml |
| Skipped directory level | utils/helper.go → actually internal/pkg/utils/helper.go |
| Similar filename confusion | auth/login.ts vs auth/login.test.ts |
Each failure generates an error response, the LLM retries, context grows, and the enlarged context degrades subsequent accuracy. It is a vicious cycle: every failed call makes the next failure more likely.
Prompt-level fixes (“always use exact paths”) cannot overcome a parameter-count limitation. This is a problem that belongs in the tooling layer, not the model layer.
Architecture
pathfinder is a Rust MCP server communicating via stdin/stdout JSON-RPC 2.0 with LLM clients (Claude Code, Zed Editor, etc.).
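To make the transport concrete, here is a minimal sketch of the single JSON line a client might write to pathfinder's stdin to invoke `path_resolve`. The `tools/call` method with `name` and `arguments` follows the MCP specification, and the argument names `check_path` and `intent_text` are the ones used in this article. The helper functions are hypothetical, and the JSON is assembled by hand only to keep the example dependency-free.

```rust
/// Escape a string for use inside a JSON string literal
/// (quotes, backslashes, and control characters only).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build one JSON-RPC 2.0 request (a single line) that calls the
/// `path_resolve` tool. `intent_text` is optional.
fn path_resolve_request(id: u64, check_path: &str, intent_text: Option<&str>) -> String {
    let mut args = format!(r#""check_path":"{}""#, json_escape(check_path));
    if let Some(intent) = intent_text {
        args.push_str(&format!(r#","intent_text":"{}""#, json_escape(intent)));
    }
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"path_resolve","arguments":{{{}}}}}}}"#,
        id, args
    )
}
```

Calling `path_resolve_request(1, "src/componets/Button.tsx", Some("button component"))` yields one line of JSON that can be written to the server's stdin followed by a newline.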
Three-Phase Path Resolution
When the LLM generates an inaccurate path, pathfinder resolves it through three phases:
- Lexical scoring: the scoring logic itself is not disclosed. Takes under 1ms
- Query history correlation: a ring buffer of recent resolutions biases results toward the current working context
- ColBERT semantic re-ranking: top candidates are re-ranked using ColBERT MaxSim scores. Takes around 5–8ms
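For readers unfamiliar with MaxSim, the following sketch shows the standard ColBERT late-interaction score: each query token embedding is matched against its best document token embedding, and those maxima are summed. This is a plain `f32` illustration of the formula, not pathfinder's INT8 inference code, and the function names are hypothetical.

```rust
/// Dot product of two equal-length embedding vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// ColBERT MaxSim: for every query token, take the highest similarity
/// to any document token, then sum those per-token maxima.
fn max_sim(query_tokens: &[Vec<f32>], doc_tokens: &[Vec<f32>]) -> f32 {
    if doc_tokens.is_empty() {
        return 0.0; // nothing to match against
    }
    query_tokens
        .iter()
        .map(|q| {
            doc_tokens
                .iter()
                .map(|d| dot(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}
```

Because each query token picks its own best match, a candidate path can score well even when the matching tokens appear in a different order or at a different directory depth than in the query.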
When lexical scoring produces a high-confidence match, neural re-ranking is skipped entirely. Most queries resolve in under 1ms.
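The control flow of the three phases, including the short-circuit, can be sketched as follows. Everything here is an assumption for illustration: normalized Levenshtein similarity stands in for the undisclosed lexical scorer, the `0.9` threshold, the `0.05` history bonus, and the shortlist size are made-up values, and the ColBERT phase is passed in as a closure.

```rust
use std::collections::VecDeque;

/// Levenshtein edit distance over Unicode scalar values (two-row DP).
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut cur = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        cur[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            cur[j] = (prev[j] + 1).min(cur[j - 1] + 1).min(prev[j - 1] + cost);
        }
        std::mem::swap(&mut prev, &mut cur);
    }
    prev[b.len()]
}

/// Stand-in lexical score in [0, 1]; 1.0 means the strings are identical.
fn lexical_score(query: &str, candidate: &str) -> f32 {
    let longest = query.chars().count().max(candidate.chars().count()).max(1);
    1.0 - edit_distance(query, candidate) as f32 / longest as f32
}

fn parent_dir(path: &str) -> Option<&str> {
    path.rsplit_once('/').map(|(dir, _)| dir)
}

/// Fixed-capacity ring buffer of recently resolved paths.
struct History {
    recent: VecDeque<String>,
    capacity: usize,
}

impl History {
    fn new(capacity: usize) -> Self {
        Self { recent: VecDeque::with_capacity(capacity), capacity }
    }

    fn record(&mut self, path: &str) {
        if self.capacity == 0 {
            return;
        }
        if self.recent.len() == self.capacity {
            self.recent.pop_front(); // evict the oldest entry
        }
        self.recent.push_back(path.to_string());
    }

    /// Small bonus when the candidate shares a directory with a recent hit.
    fn bias(&self, candidate: &str) -> f32 {
        match parent_dir(candidate) {
            Some(dir) if self.recent.iter().any(|r| parent_dir(r) == Some(dir)) => 0.05,
            _ => 0.0,
        }
    }
}

/// Phases 1 and 2 always run; `rerank` (phase 3) is only called when the
/// best lexical + history score falls below the confidence threshold.
fn resolve<F>(query: &str, index: &[&str], history: &History, rerank: F) -> Option<String>
where
    F: Fn(&str, &[String]) -> Option<String>,
{
    const HIGH_CONFIDENCE: f32 = 0.9; // hypothetical threshold
    const TOP_K: usize = 5; // hypothetical shortlist size

    let mut scored: Vec<(f32, &str)> = index
        .iter()
        .map(|c| (lexical_score(query, c) + history.bias(c), *c))
        .collect();
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));

    let (best_score, best_path) = *scored.first()?;
    if best_score >= HIGH_CONFIDENCE {
        return Some(best_path.to_string()); // neural re-ranking skipped
    }
    let shortlist: Vec<String> = scored.iter().take(TOP_K).map(|(_, p)| p.to_string()).collect();
    rerank(query, &shortlist)
}
```

With this stand-in scorer, a one-character typo such as `src/componets/Button.tsx` scores 0.96 against the real path and never reaches the re-ranker, which mirrors why most queries finish in the lexical phase.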
ColBERT Model
Semantic re-ranking uses a ColBERT (Late Interaction) architecture model.
| Item | Value |
|---|---|
| Model | lateon-code-edge (code-aware) / mxbai-edge-colbert (general) |
| Parameters | ~17M |
| Quantization | INT8 (ONNX Runtime) |
| Model size | ~16MB |
| Embedding dimensions | 128 |
| Inference | CPU only (no GPU required) |
Inference runs through ONNX Runtime with HuggingFace Tokenizers for text encoding. No Python dependencies.
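The model-size figure follows from simple arithmetic: at one byte per parameter, ~17M INT8 parameters occupy roughly 16 MiB. The sketch below shows that calculation together with a generic symmetric per-tensor INT8 quantizer. It illustrates the general idea of INT8 quantization only; I am not claiming this is the exact scheme used when the ONNX model was produced, and the function names are hypothetical.

```rust
/// Symmetric per-tensor INT8 quantization: returns (quantized values, scale),
/// where each original weight is approximately `q as f32 * scale`.
fn quantize_i8(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0_f32, |m, w| m.max(w.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Map quantized values back to approximate f32 weights.
fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

/// Approximate weight storage in MiB at one byte per parameter.
fn int8_size_mib(params: u64) -> f64 {
    params as f64 / (1024.0 * 1024.0)
}
```

For 17,000,000 parameters, `int8_size_mib` gives about 16.2, in line with the ~16MB in the table. The same weights stored as `f32` would take four times as much.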
MCP Tools
Primary tools exposed by pathfinder:
- `path_resolve`: Takes a failed path and returns the most likely real path. Accepts `intent_text` (a short purpose description like “Go config loader”) to improve accuracy
- `tool_retry_with_resolve`: Resolves the path and automatically retries the original operation (read_file, list_dir, etc.) in a single tool call
- `candidate_list`: Returns top candidates as a ranked list for ambiguous cases
- `roots_list` / `reindex_paths` / `server_version`: Administrative tools
tool_retry_with_resolve is the workhorse in practice. When the LLM gets ENOENT from read_file, it calls this tool once to resolve and retry in a single round trip.
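The resolve-then-retry idea can be sketched for the `read_file` case as below. This is a simplified illustration rather than pathfinder's implementation: the resolver is passed in as a closure, `read_with_resolve` is a hypothetical name, and only a single retry is attempted.

```rust
use std::fs;
use std::io;

/// Try to read `path`. On NotFound (ENOENT), ask `resolve` for a corrected
/// path and retry once. Returns the path actually read and its contents.
fn read_with_resolve<R>(path: &str, resolve: R) -> io::Result<(String, String)>
where
    R: Fn(&str) -> Option<String>,
{
    match fs::read_to_string(path) {
        Ok(text) => Ok((path.to_string(), text)),
        Err(err) if err.kind() == io::ErrorKind::NotFound => {
            // No candidate found: surface the original ENOENT unchanged.
            let fixed = resolve(path).ok_or(err)?;
            let text = fs::read_to_string(&fixed)?;
            Ok((fixed, text))
        }
        Err(err) => Err(err), // permission errors etc. are not retried
    }
}
```

Returning the corrected path alongside the contents lets the caller tell the LLM which path was actually used, so later tool calls can reference the right file.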
Path Resolution Example
When the LLM misspells a directory name:

    path_resolve:
      check_path: "content/ja/docs/tech/infrastrcture/podman-quadlet-systemd-ubuntu.md"
                                        ^^^^^^^^^^^^^ typo
      → resolved: "content/ja/docs/tech/infrastructure/podman-quadlet-systemd-ubuntu.md"
The built-in help summarizes both modes, the MCP tools, and the environment variables:

    $ pathfinder --help
    pathfinder — semantic path finder & MCP resolution server

    USAGE
      pathfinder [OPTIONS]          Interactive semantic directory finder (default).
      pathfinder --mcp [OPTIONS]    Start as an MCP server.

    FINDER OPTIONS
      --include-builds              Include build/artifact dirs (target, dist, …).

    MCP OPTIONS
      --root <PATH>                 Add a project root directory to watch and index.
                                    May be specified multiple times. Defaults to $PWD.

    GENERAL OPTIONS
      -h, --help                    Print this help message and exit.
      -V, --version                 Print version, model, and PCA config to stderr and exit.

    MCP TOOLS
      1. path_resolve               Resolve a failed file path to the best match.
      2. tool_retry_with_resolve    Resolve + retry the operation in one call.
      3. roots_list                 Return configured root directories.
      4. reindex_paths              Force a full index rebuild.

    ENVIRONMENT VARIABLES
      PF_MCP_INFERENCE              Inference mode: "general" (default) or "code".
                                    Models (both INT8 quantized):
                                      general → mxbai-edge-colbert (17M, 48-dim)
                                      code    → lateon-code-edge   (17M, 48-dim)
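As an illustration of how the `PF_MCP_INFERENCE` switch could be read at startup, here is a small sketch. The variable name, its two values, and the model names come from the help text above. The `InferenceMode` type and the choice to fall back to general mode on unrecognized values are my assumptions, not documented behavior.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum InferenceMode {
    General,
    Code,
}

impl InferenceMode {
    /// Interpret the raw variable value; unset or unrecognized values fall
    /// back to the documented default, "general" (fallback is an assumption).
    fn from_env_value(value: Option<&str>) -> Self {
        match value.map(str::trim) {
            Some("code") => InferenceMode::Code,
            _ => InferenceMode::General,
        }
    }

    /// Model name associated with each mode, as listed in the help output.
    fn model_name(self) -> &'static str {
        match self {
            InferenceMode::General => "mxbai-edge-colbert",
            InferenceMode::Code => "lateon-code-edge",
        }
    }
}

/// Read the mode from the process environment.
fn inference_mode() -> InferenceMode {
    InferenceMode::from_env_value(std::env::var("PF_MCP_INFERENCE").ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the selection logic testable without mutating process-wide state.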
CLI: Browsing Projects with Vim-Style Semantic Search
Alongside the MCP server, I built an interactive CLI using the same ColBERT inference engine. It is designed for browsing code across multiple projects — domain, infrastructure, and presentation layers — from a single terminal.
Controls
Launch pf (code-aware) or pfg (general) from the shell to open a semantic search interface.
- Vim keybindings for navigating the candidate list
- Drill down through directory hierarchy to the deepest nesting level, where file listings appear
- Right arrow (→) opens the selected file in `less`
- Selecting a result `cd`s to its directory
pf # Code-aware mode (lateon-code-edge)
pfg # General mode (mxbai-edge-colbert)
Because ModernBERT inference is implemented natively in Rust, search responses feel effectively instant.

Benchmarks: Comparison with filesystem MCP
Benchmarked on a monorepo application sample with approximately 4,000 files.
Path Resolution Accuracy
pathfinder standalone accuracy test (70 test cases, 12 categories):
| Category | Description | Result |
|---|---|---|
| Correct paths | Returned as-is | 8/8 |
| Directory typos | componets → components | 10/10 |
| Filename typos | Character swaps and omissions | 10/10 |
| Wrong extensions | .yaml → .yml | 7/7 |
| Skipped levels | Missing intermediate directories | 5/5 |
| Intent-based queries | Inference from purpose description | 4/5 |
| Retry operations | Resolve + retry in one call | 3/3 |
| Confusing path pairs | Distinguishing similar names | 6/6 |
| Deep nesting | 8+ directory levels | 4/4 |
| Cross-language queries | “Go config file” etc. | 4/6 |
| Test/config files | Distinguishing test vs production | 4/4 |
| Total | | 67/70 (95.7%) |
Context Token Consumption
Accuracy was comparable to the filesystem MCP server, but pathfinder consumed roughly 10,000 fewer context tokens.
The filesystem MCP checks ancestor directories sequentially via list_directory, so response token count scales with file count. At 4,000 files this gap is already meaningful; at larger scales it will widen further.
pathfinder scores and filters candidates before returning results, so context consumption stays bounded regardless of project size. For local LLMs with limited context windows (8K–32K), this difference directly impacts downstream accuracy.
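The scaling argument can be illustrated with a toy comparison: a response that lists every entry grows linearly with the number of files, while a response limited to the top-k scored candidates stays the same size. This is not a reproduction of the benchmark. The four-characters-per-token estimate is a rough rule of thumb rather than a real tokenizer, and all function names are hypothetical.

```rust
/// Rough token estimate: about 4 characters per token (rule of thumb only).
fn estimate_tokens(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// A response that lists every entry: size grows with the number of files.
fn listing_response(paths: &[String]) -> String {
    paths.join("\n")
}

/// A response containing only the k best-scoring candidates: size is
/// bounded by k, independent of how many files were scored.
fn top_k_response(scored: &[(f32, String)], k: usize) -> String {
    let mut ranked: Vec<&(f32, String)> = scored.iter().collect();
    ranked.sort_by(|a, b| b.0.total_cmp(&a.0));
    ranked
        .iter()
        .take(k)
        .map(|entry| entry.1.as_str())
        .collect::<Vec<_>>()
        .join("\n")
}
```

With fixed-length paths, a tenfold increase in file count multiplies the listing estimate by about ten, while the top-5 response costs the same number of tokens at either size.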
Resource Footprint
| Item | Value |
|---|---|
| Binary | Single Rust binary |
| Model size | ~16MB (INT8) |
| Memory overhead | Under 50MB |
| GPU VRAM consumption | Zero |
| Typical latency | Under 10ms (under 1ms for lexical-only) |
Zero GPU usage means pathfinder operates without affecting vLLM or llama.cpp when they are using maximum VRAM. This is a critical property in local LLM environments.
Caveats
- The internal logic of lexical scoring is not disclosed. pathfinder is integrated alongside ctree into our in-house pipeline infrastructure for open-source LLMs
- The current test suite has 70 cases, with 2 unresolved failures in cross-language queries (intent_text specifying a programming language)
- Benchmarks were conducted on a ~4,000-file monorepo. Validation at tens of thousands of files is planned
- ColBERT model selection (code-aware vs general) is controlled by the `PF_MCP_INFERENCE` environment variable
- Compliant with MCP protocol (2024-11-05). Works with Claude Code, Zed Editor, and other MCP clients
