Three Claude API Features That Cut Agent Token Costs by 85% — And Improve Accuracy
Production Claude agents fail at tool use in three distinct ways: they exhaust their context window before finishing a task, they make sequential inference passes that could run in parallel, or they call tools with subtly incorrect parameters. Anthropic has released three beta API features that address each failure mode independently. Understanding when to apply each — and in which combination — is a core skill for the Claude SA exam.
Feature 1: Tool Search — Deferred Definition Loading
In a multi-server MCP environment, loading every tool definition at session start can consume tens of thousands of tokens. A representative five-server setup combining a code repository, messaging platform, observability stack, infrastructure tooling, and log analysis can load around 55,000 tokens before any conversation content.
Tool search addresses this by allowing tools to be marked with defer_loading: true. These definitions are omitted from the initial context. When the agent needs a tool, it issues a search query and receives only the definitions of matching tools on demand. Published results show this approach reducing definition overhead from approximately 77,000 tokens to 8,700 — a reduction of more than 85% — while improving task accuracy on Opus 4.5 from 79.5% to 88.1%, likely because the freed context window leaves more room for task-relevant information.
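A deferred tools array might look like the following sketch. The defer_loading flag comes from the feature description above; the tool names, descriptions, and the resident search entry point are illustrative assumptions, not Anthropic's documented schema.

```python
# One small tool stays resident so the agent can discover the rest;
# everything else is deferred out of the initial context.
always_loaded = {
    "name": "search_tools",
    "description": "Find tools matching a query",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

deferred = [
    {
        "name": name,
        "description": desc,
        "input_schema": {"type": "object", "properties": {}},
        "defer_loading": True,  # definition omitted until searched for
    }
    for name, desc in [
        ("search_logs", "Query the log analysis backend"),
        ("get_metrics", "Fetch observability metrics"),
        ("create_ticket", "Open an issue in the tracker"),
    ]
]

tools = [always_loaded, *deferred]
```

Only the resident entry point and any matched definitions occupy context; the deferred majority costs nothing until the agent actually asks for it.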
Feature 2: Programmatic Tool Calling — Code-Executed Orchestration
Traditional tool use requires one full model inference pass per tool call. Intermediate results accumulate in context regardless of whether they are still needed. On a workflow that requires 20 tool invocations, this creates 20 sequential inference passes and a context window progressively filled with stale data.
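The traditional loop can be sketched as follows, with call_model and run_tool as stand-ins for the real inference API and tool backend (both are illustrative assumptions, not SDK functions):

```python
# Stand-ins for the model and a tool backend. Each tool result is a
# raw ~10 KB payload, mimicking unfiltered expense data.
def call_model(history):
    return {"tool": "get_expenses", "args": {"call": len(history)}}

def run_tool(call):
    return "x" * 10_000

messages = []
for _ in range(20):                      # 20 invocations -> 20 sequential passes
    tool_call = call_model(messages)     # one full inference pass per call
    result = run_tool(tool_call)
    messages.append({"role": "user",
                     "content": [{"type": "tool_result",
                                  "content": result}]})

total = sum(len(m["content"][0]["content"]) for m in messages)
# total is now 200,000 characters of raw data held in context,
# whether or not later steps still need it
```

Under these assumptions the conversation history carries roughly 200 KB of intermediate data, which is the accumulation problem programmatic tool calling is designed to remove.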
Programmatic tool calling inverts this: Claude writes Python code that runs in a sandboxed execution environment. Tools marked with allowed_callers: ["code_execution_20250825"] become callable as Python functions inside the sandbox. Results are processed within the execution environment — only the final filtered output enters Claude's context. For a budget compliance task involving 2,000+ expense line items across an engineering team, this approach reduces context consumption from roughly 200KB of raw data to approximately 1KB of actionable results, while eliminating 19 sequential inference passes.
# Claude writes orchestration code like this
expenses = await asyncio.gather(
    *[get_expenses(member["id"], "Q3") for member in team]
)
exceeded = [
    {"member": m["name"], "spent": sum(e["amount"] for e in exps)}
    for m, exps in zip(team, expenses)
    if sum(e["amount"] for e in exps) > budgets[m["level"]]
]
return exceeded  # Only this enters Claude's context

Feature 3: Tool Use Examples — Conveying Convention Beyond Schema
JSON Schema is good at defining what a tool accepts structurally, but poor at expressing usage conventions: when optional parameters apply, which field combinations are valid together, how domain-specific values should be formatted (ISO dates, proprietary ID patterns, enum edge cases). Without examples, Claude must infer these conventions from schema descriptions alone — which works for simple tools but breaks down on complex ones.
The input_examples field on tool definitions provides concrete invocation examples that demonstrate format and convention. Published accuracy results show improvement from 72% to 90% on complex parameter handling tasks. This feature is most valuable for tools with many optional parameters, domain-specific formatting requirements, or cases where multiple valid combinations produce different outcomes.
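A tool definition using input_examples might look like the sketch below. The field name comes from the feature description above; the tool, its parameters, and the example values are hypothetical, chosen to show the kind of convention a schema alone cannot express.

```python
# Hypothetical ticketing tool. The input_examples demonstrate two
# conventions the schema cannot state: p0 tickets carry a due_date
# and assignee, while minimal calls omit optional fields entirely.
create_ticket = {
    "name": "create_ticket",
    "description": "Open a ticket in the issue tracker",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "priority": {"type": "string", "enum": ["p0", "p1", "p2"]},
            "due_date": {"type": "string", "description": "ISO 8601 date"},
            "assignee": {"type": "string", "description": "ID like ENG-1042"},
        },
        "required": ["title"],
    },
    "input_examples": [
        {"title": "API 5xx spike", "priority": "p0",
         "due_date": "2025-11-21", "assignee": "ENG-1042"},
        {"title": "Update onboarding docs"},
    ],
}
```

Each example is a complete, valid invocation, so the model sees both the formatting conventions and which field combinations belong together.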
Choosing the Right Feature
- Primary bottleneck is context window size → start with Tool Search
- Primary bottleneck is latency or intermediate data accumulation → start with Programmatic Calling
- Primary issue is incorrect or malformed tool parameters → start with Usage Examples
- Complex production agents typically benefit from all three in combination
Enabling the Beta
client.beta.messages.create(
    model="claude-opus-4-6",
    betas=["advanced-tool-use-2025-11-20"],
    tools=[
        {"name": "search_logs", "defer_loading": True, ...},
        {"name": "run_query",
         "allowed_callers": ["code_execution_20250825"], ...},
        {"name": "create_ticket", "input_examples": [...], ...},
    ],
    ...
)

Preparing for the Claude SA Exam?
Explore 150+ exam concepts, 91 glossary terms, and full mock exams — all free.