Domain 2: Tool Design & MCP Integration (18%) · Lesson 8 of 30

2.1 Designing Tool Interfaces

2.1.1 Why the Description Is the Whole Game

In Domain 1 you gave an agent tools and let it choose which to call. We glossed over a crucial question: HOW does the model decide which tool fits a request? The answer surprises people, and it's the foundation of all of Domain 2: the model chooses a tool almost entirely by reading its DESCRIPTION. The tool's name and parameter schema are in view too, and there is no hidden wiring; the plain-language description you wrote is the primary thing the model uses to pick.

Think of it like a new employee on their first day, handed a drawer of unlabelled keys. They have no idea which key opens which door — they'll fumble, try the wrong ones, maybe unlock something they shouldn't. Now give each key a clear tag: 'front entrance', 'supply closet — NOT the server room.' Suddenly they choose correctly every time. A tool's description is that tag. Write vague tags and the model fumbles; write precise ones and it picks the right tool reliably.

This reframes tool design entirely. Designing a good tool isn't mainly about the code behind it — it's about writing a description clear enough that the model never has to guess. Task Statement 2.1 is about exactly that craft: writing descriptions that differentiate tools, and fixing the misrouting that vague descriptions cause.

[Figure: How the model picks a tool. User request ("check my order #123") → the model reads each tool's DESCRIPTION (the primary selection signal) → picks a tool (the right one if tags are clear). Vague descriptions → wrong tool; clear descriptions → right tool.]

The model selects a tool primarily by reading each tool's description. Vague descriptions make it fumble like someone with unlabelled keys; precise ones make selection reliable.

The one idea to hold onto

Tool descriptions are the PRIMARY mechanism the model uses to select a tool. Designing a good tool is mostly about writing a description clear enough that the model never has to guess.

2.1.2 What Separates a Minimal Description From a Production One

So what does a good description actually contain? Compare two versions of the same tool. Minimal: "Retrieves customer information." Production-grade: "Retrieves a customer's account details by customer ID or email. Use for profile, contact, and account-status questions. Do NOT use for order-specific queries — use lookup_order instead. Input: a customer_id (format: CUST-12345) or an email address." The second one tells the model not just WHAT the tool does but exactly WHEN to reach for it and when not to.
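As a rough sketch, the two versions might look like this when written out as tool definitions. The `name` / `description` / `input_schema` layout assumes a JSON-Schema-style tool format of the kind the Anthropic Messages API accepts; the `identifier` parameter name is hypothetical and not something this lesson specifies.

```python
# Shared input contract for both versions.
# ASSUMPTION: "identifier" is a hypothetical parameter name.
input_schema = {
    "type": "object",
    "properties": {
        "identifier": {
            "type": "string",
            "description": "A customer_id such as CUST-12345, or an email address.",
        }
    },
    "required": ["identifier"],
}

# Minimal: says WHAT the tool does, nothing about WHEN to use it.
minimal_get_customer = {
    "name": "get_customer",
    "description": "Retrieves customer information.",
    "input_schema": input_schema,
}

# Production-grade: purpose, when to use, boundary, and input format.
production_get_customer = {
    "name": "get_customer",
    "description": (
        "Retrieves a customer's account details by customer ID or email. "
        "Use for profile, contact, and account-status questions. "
        "Do NOT use for order-specific queries; use lookup_order instead. "
        "Input: a customer_id (format: CUST-12345) or an email address."
    ),
    "input_schema": input_schema,
}
```

Only the `description` string differs between the two definitions; the code behind the tool would be identical.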

A production-grade description carries five things, and you can treat them as a checklist. Miss any of them and you've left the model room to guess.

  1. Primary purpose: what the tool does, in one clear sentence.
  2. Input expectations: types, formats, constraints, and which inputs are required vs optional (e.g. 'customer_id in the form CUST-12345').
  3. Example queries: the kinds of requests that should route here, in the words a user would actually use.
  4. Edge cases and limitations: what it can't do, so the model doesn't over-reach.
  5. Explicit boundaries vs similar tools ('use this NOT that'): the single highest-leverage line when you have look-alike tools.

Aspect | Minimal (✗) | Production-grade (✓)
Purpose | "Retrieves customer information" | "Retrieves account details by customer ID or email"
Inputs | (unspecified) | "customer_id (CUST-12345) or email"
When to use | (left to guess) | "profile, contact, account-status questions"
Boundaries | (none) | "Do NOT use for orders; use lookup_order"

The production description removes every opportunity to guess. The boundary line ('NOT for orders') is what prevents the look-alike-tool confusion in the next section.
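One way to keep the five-part checklist honest is to give each part its own field and refuse to ship a description with a blank one. The sketch below is a hypothetical helper, not part of any SDK; the class name, field names, example queries, and limitation text are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ToolDescription:
    """Hypothetical helper: one field per item on the five-part checklist."""

    purpose: str
    inputs: str
    example_queries: list[str]
    limitations: str
    boundaries: str

    def missing_parts(self) -> list[str]:
        """Return the names of checklist items that were left blank."""
        parts = {
            "purpose": self.purpose,
            "inputs": self.inputs,
            "example_queries": " ".join(self.example_queries),
            "limitations": self.limitations,
            "boundaries": self.boundaries,
        }
        return [name for name, text in parts.items() if not text.strip()]

    def render(self) -> str:
        """Join the five parts into a single description string."""
        examples = "; ".join(f'"{q}"' for q in self.example_queries)
        return " ".join([
            self.purpose,
            f"Input: {self.inputs}",
            f"Example queries: {examples}.",
            f"Limitations: {self.limitations}",
            f"Boundaries: {self.boundaries}",
        ])


# Illustrative values: the example queries and limitation are assumptions.
get_customer = ToolDescription(
    purpose="Retrieves a customer's account details by customer ID or email.",
    inputs="a customer_id (format: CUST-12345) or an email address.",
    example_queries=["what email do you have on file?", "is my account active?"],
    limitations="Returns account data only; it does not return order history.",
    boundaries="Do NOT use for order-specific queries; use lookup_order instead.",
)
```

Calling `get_customer.missing_parts()` before registering the tool gives a quick check that nothing on the checklist was skipped; it says nothing about whether the wording itself is good.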

2.1.2 — Key Concept

A production-grade tool description states its purpose, input expectations, example queries, edge cases, and — crucially — explicit boundaries versus similar tools. The 'use this NOT that' boundary is the most valuable line when tools could be confused.

2.1.3 The Misrouting Problem — and the Lowest-Effort Fix

Here's the failure the exam tests most in 2.1. You have two tools — get_customer ("Retrieves customer information") and lookup_order ("Retrieves order details") — both with minimal descriptions and both accepting similar-looking identifiers. In production, the agent keeps calling get_customer when users ask about orders ("check my order #12345"). It's misrouting: the descriptions are too similar for the model to tell them apart, so it picks wrong.

Now the trap. Four fixes look plausible: (A) add 5–8 few-shot examples of correct routing; (B) EXPAND each description with formats, example queries, edge cases, and explicit boundaries; (C) build a routing layer that parses the user's words and pre-selects the tool; (D) consolidate the two tools into one lookup_entity. The exam wants the most effective FIRST step — and that's B. The root cause is inadequate descriptions, so fixing the descriptions addresses the cause directly, with the least effort.

Why the others lose: few-shot examples (A) add token overhead on every call without fixing the underlying ambiguity; a routing layer (C) is over-engineering that bypasses the model's own language understanding; consolidating tools (D) is a legitimate architectural option but is MORE effort than a 'first step' warrants when the immediate problem is just thin descriptions. The principle generalises: when tools misroute, reach for better descriptions before reaching for machinery.

[Figure: Misrouting: similar descriptions → wrong tool. Two thin descriptions ("customer info" / "order details") → model can't tell them apart → routes order to get_customer. Fix (✓): expand descriptions (formats, examples, boundaries), the lowest-effort root-cause fix. Few-shot = token overhead; routing layer = over-engineered; consolidation = more effort.]

Misrouting comes from descriptions too similar to tell apart. The lowest-effort, root-cause fix is to expand the descriptions — not few-shot, a routing layer, or consolidation.
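Option B applied to the pair could look roughly like the following. The `get_customer` text follows the production-grade example from 2.1.2; the `lookup_order` text, its input format, and the helper function are illustrative assumptions rather than anything the exam prescribes.

```python
# Both tools now say when to use them AND point at the sibling tool.
tools = [
    {
        "name": "get_customer",
        "description": (
            "Retrieves a customer's account details by customer ID or email. "
            "Use for profile, contact, and account-status questions. "
            "Do NOT use for order-specific queries; use lookup_order instead. "
            "Input: a customer_id (format: CUST-12345) or an email address."
        ),
    },
    {
        "name": "lookup_order",
        # ASSUMPTION: wording and order-number format are hypothetical.
        "description": (
            "Retrieves the details and status of a single order by order number. "
            "Use for questions about a specific order, e.g. 'check my order #12345'. "
            "Do NOT use for profile, contact, or account-status questions; "
            "use get_customer instead. "
            "Input: an order number (digits only, e.g. 12345)."
        ),
    },
]


def names_every_sibling(tool_defs: list[dict]) -> dict[str, bool]:
    """For each tool, report whether its description mentions all other tools."""
    names = [t["name"] for t in tool_defs]
    return {
        t["name"]: all(n in t["description"] for n in names if n != t["name"])
        for t in tool_defs
    }
```

The helper is only a sanity check that each look-alike tool carries a 'use this NOT that' pointer to the other; it does not measure routing quality.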

2.1.3 — Key Concept

When similar tools misroute, the root cause is inadequate descriptions and the most effective first step is to EXPAND them (formats, examples, edge cases, boundaries). Few-shot adds overhead, a routing layer over-engineers, and consolidation is more effort than a first step warrants.

2.1.4 Splitting, Renaming, and the System-Prompt Catch

Beyond expanding descriptions, two reshaping techniques help when a tool is doing too much or is poorly named. SPLITTING: a vague catch-all like analyze_document is hard to route to because it does many things; split it into purpose-specific tools — extract_data_points, summarize_content, verify_claim_against_source — each with a crisp input/output contract the model can match precisely. RENAMING: a tool called analyze_content that only handles web results is misleadingly broad; rename it extract_web_results and give it a web-specific description, eliminating overlap with other analyzers.
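A minimal sketch of the split, assuming the same JSON-Schema-style tool format as a typical tool-use API. The three tool names come from the lesson; the description wording, the `make_tool` helper, and every parameter name are hypothetical.

```python
def make_tool(name: str, description: str, properties: dict, required: list[str]) -> dict:
    """Build a tool definition with a JSON-Schema-style input contract."""
    return {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


TEXT = {"type": "string"}

# One vague analyze_document becomes three tools with distinct contracts.
split_tools = [
    make_tool(
        "extract_data_points",
        "Extracts the requested fields (for example dates, amounts, names) "
        "from a document and returns them as structured values. "
        "Do NOT use for summaries or fact-checking.",
        {"document_text": TEXT, "fields": {"type": "array", "items": TEXT}},
        ["document_text", "fields"],
    ),
    make_tool(
        "summarize_content",
        "Produces a short prose summary of a document. "
        "Do NOT use to pull out specific values or to check a claim.",
        {"document_text": TEXT},
        ["document_text"],
    ),
    make_tool(
        "verify_claim_against_source",
        "Checks whether one stated claim is supported by a source document. "
        "Do NOT use for summaries or field extraction.",
        {"claim": TEXT, "source_text": TEXT},
        ["claim", "source_text"],
    ),
]
```

Because each tool takes different required inputs, the input contract itself helps distinguish them: a request that carries a claim to check only fits `verify_claim_against_source`.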

There's one more catch the exam likes, and it's easy to miss: the SYSTEM PROMPT can quietly override your careful tool descriptions. Keyword-sensitive instructions in the system prompt can create unintended associations — for instance, a system prompt that says 'always summarize content for the user' might nudge the model toward a summarize tool even when extraction is what's needed. So when you're debugging tool selection, don't stop at the descriptions; review the system prompt for wording that competes with them.
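A crude first pass at that review can be automated. The sketch below flags tools whose name shares words with the system prompt; it is a hypothetical heuristic that only catches literal overlaps, so it supplements reading the prompt rather than replacing it. The function name and the sample prompt wording are assumptions.

```python
import re


def competing_keywords(system_prompt: str, tool_names: list[str]) -> dict[str, list[str]]:
    """Map each tool name to the words it shares with the system prompt.

    Tool names are split on non-letters, so summarize_content yields
    "summarize" and "content". Tools with no overlap are omitted.
    """
    prompt_words = set(re.findall(r"[a-z]+", system_prompt.lower()))
    hits: dict[str, list[str]] = {}
    for name in tool_names:
        overlap = [w for w in re.findall(r"[a-z]+", name.lower()) if w in prompt_words]
        if overlap:
            hits[name] = overlap
    return hits


flags = competing_keywords(
    "Always summarize content for the user.",
    ["summarize_content", "extract_data_points"],
)
print(flags)  # {'summarize_content': ['summarize', 'content']}
```

A flagged tool is not necessarily a problem; it marks wording worth rereading when the model keeps favoring that tool.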

2.1.4 — Key Concept

Split generic tools into purpose-specific ones with clear contracts, and rename tools whose names overstate their scope. And remember: keyword-sensitive system-prompt wording can override even well-written tool descriptions — review it when diagnosing selection problems.

2.1.5 The Exam Traps

The 2.1 traps all reward the same instinct: when tool selection goes wrong, fix the DESCRIPTION first, before reaching for heavier machinery. The wrong answers consistently offer more infrastructure where a clearer description would do.

Tempting answer | Why it's a distractor
Add 5–8 few-shot examples | Adds token overhead every call; doesn't fix the ambiguous descriptions
Build a routing/keyword classifier | Over-engineered; bypasses the model's own language understanding
Consolidate tools into one | Valid sometimes, but more effort than a 'first step'
Switch to a bigger model | The problem is description clarity, not model capability

Each distractor adds machinery. The root-cause, lowest-effort fix for misrouting is almost always to expand and differentiate the descriptions.

2.1.5 — Exam Trap

When an agent calls the wrong tool among similar ones, choose 'expand/differentiate the descriptions' as the first step. Reject few-shot (overhead), routing classifiers (over-engineering), consolidation (more effort), and bigger models (wrong problem). And check the system prompt for wording that overrides descriptions.

2.1.6 Put It Together: Fix a Misrouting Agent

You now understand that descriptions drive selection, what a production-grade description contains, how to fix misrouting at its root, and the reshaping and system-prompt nuances. The exercise makes the cause-and-effect concrete: you'll watch routing accuracy jump just by rewriting descriptions, no new infrastructure required.

2.1.6 — Build Exercise (30 min)

(1) Define two deliberately ambiguous tools (e.g. get_customer and lookup_order) with one-line descriptions. (2) Test 10 mixed queries and record how often each routes correctly. (3) Rewrite both descriptions to production grade — purpose, input formats, example queries, edge cases, and explicit 'use this NOT that' boundaries. (4) Re-test the same 10 queries and measure the improvement. (5) Finally, review your system prompt for any keyword-sensitive instruction that competes with the descriptions.
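Steps (2) and (4) need the same measurement run twice, so a small harness helps. In this sketch `route` is a hypothetical callable that you would implement by sending the query and your tool definitions to the model and returning the name of the tool it chose; the stand-in router and the sample queries are illustrative assumptions.

```python
from typing import Callable


def routing_accuracy(cases: list[tuple[str, str]], route: Callable[[str], str]) -> float:
    """Fraction of (query, expected_tool) cases where route(query) matches."""
    if not cases:
        raise ValueError("need at least one test case")
    correct = sum(1 for query, expected in cases if route(query) == expected)
    return correct / len(cases)


# Extend this to the 10 mixed queries the exercise asks for.
cases = [
    ("check my order #12345", "lookup_order"),
    ("has order 98765 shipped yet?", "lookup_order"),
    ("what email do you have on file for me?", "get_customer"),
    ("is my account still active?", "get_customer"),
]


def always_get_customer(query: str) -> str:
    """Stand-in router that mimics the misrouting failure.

    Replace with a function that calls the model and returns its tool choice.
    """
    return "get_customer"


baseline = routing_accuracy(cases, always_get_customer)
print(f"baseline accuracy: {baseline:.0%}")
```

Run it once with the one-line descriptions and once with the rewritten ones, keeping `cases` identical, so the only variable that changes is the description text.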

Descriptions get the model to call the RIGHT tool. The next lesson, 2.2, is about what happens when that tool FAILS — and how to report the failure so the agent can recover intelligently instead of flailing.

Where this shows up on the exam

2.1 questions describe an agent calling the wrong tool. If your first move is always 'expand and differentiate the descriptions' — and you can name why few-shot, routing layers, and consolidation are inferior first steps — you'll answer them correctly.

Key Takeaways

  • Tool descriptions are the PRIMARY mechanism the model uses to select a tool — designing a good tool is mostly about writing a description clear enough that the model never guesses.
  • A production-grade description has five parts: primary purpose, input expectations (formats/constraints/required), example queries, edge cases/limitations, and explicit boundaries vs similar tools.
  • The 'use this NOT that' boundary line is the highest-leverage element when tools could be confused with each other.
  • Misrouting between similar tools is caused by inadequate descriptions; the most effective FIRST step is to expand them — not few-shot (overhead), a routing layer (over-engineering), or consolidation (more effort).
  • Reshape tools when needed: split a generic tool into purpose-specific tools with clear contracts; rename tools whose names overstate their scope.
  • Keyword-sensitive system-prompt wording can override well-written tool descriptions — review the system prompt when diagnosing tool-selection problems.
  • When tool selection misbehaves, fix the description before adding machinery; bigger models don't fix a description-clarity problem.

Check Your Understanding

Test what you learned in this lesson.

Q1. Your agent frequently calls get_customer when users ask about orders ("check my order #12345"). Both tools have minimal descriptions ("Retrieves customer information" / "Retrieves order details") and accept similar identifiers. What's the most effective FIRST step?

Q2. Which element of a tool description is most valuable for preventing confusion between two similar tools?

Q3. A generic analyze_document tool is hard for the model to route to because it does several different things. What's the best design fix?

Q4. You've written excellent tool descriptions but the model still favors the wrong tool. What else should you check?
