Domain 1: Agentic Architecture & Orchestration (27%) · Lesson 4 of 30

1.4 Workflow Enforcement & Handoff

1.4.1 The Question Behind Half the Exam

There's a question that quietly runs underneath this entire certification, and Task Statement 1.4 is where you meet it head-on: when is it enough to ASK the model to do something, and when do you have to FORCE it? Get this distinction right and a surprising number of exam questions answer themselves.

Think about the difference between a sign and a lock. A sign on a door that says "Staff only, please" works most of the time — most people respect it. But if what's behind the door is dangerous or expensive, "most of the time" isn't good enough; you fit a lock. A prompt instruction to the model is a sign: "please always verify the customer before issuing a refund." The model will usually comply. But 'usually' means a small fraction of refunds go out to unverified accounts — and when that's real money, a sign isn't enough. You need a lock.

In agent design, the 'sign' is prompt-based guidance and the 'lock' is programmatic enforcement — actual code that makes the unsafe path impossible. This lesson is about knowing which one a situation demands, and how to build the lock when you need it. We also cover what to do when the agent has to hand a problem off to a human — another moment where being deliberate matters.

[Figure: A sign vs. a lock. Prompt = a sign ("please verify first"): works MOST of the time, fine for low stakes. Code = a lock: makes the unsafe path impossible, works EVERY time, required for money / security.]

A prompt is a sign — respected most of the time. Code enforcement is a lock — the unsafe path is simply impossible. The stakes decide which you need.

ℹ️

The one idea to hold onto

Prompt-based guidance is probabilistic (a sign — works most of the time); programmatic enforcement is deterministic (a lock — works every time). The stakes of the action decide which one you must use.

1.4.2 The Enforcement Spectrum

Let's make 'the stakes decide' precise, because that's the actual decision rule the exam rewards. Every workflow requirement sits somewhere on a spectrum from 'nice if it happens' to 'must happen, always.'

At the low-stakes end (formatting, tone, style) a prompt is perfect. If the model occasionally formats a date differently, nobody is harmed; the cost of an occasional miss is trivial, and few-shot examples nudge it into line. At the high-stakes end (anything financial, security-related, or compliance-bound) a prompt is a liability: its non-zero failure rate, multiplied across thousands of transactions, makes some failures all but certain, and each one matters. There, you need code that cannot be talked out of doing its job.

So the rule is simple and worth memorising: financial, security, or compliance requirement → programmatic enforcement; low-stakes formatting or style → a prompt is fine. The reason a prompt can't carry the high-stakes end is fundamental, not fixable by better wording: the model is probabilistic by nature, so 'follow this instruction' always carries some chance of not following it. No amount of prompt-polishing turns a probability into a guarantee.
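To make the "non-zero failure rate at scale" argument concrete, here is a minimal sketch of the arithmetic. The 1% miss rate and the transaction counts are illustrative assumptions, not measurements of any real model, and the sketch assumes each transaction fails independently.

```python
def expected_failures(failure_rate: float, transactions: int) -> float:
    """Average number of misses if each transaction fails independently."""
    return failure_rate * transactions


def prob_at_least_one_failure(failure_rate: float, transactions: int) -> float:
    """Chance that at least one transaction slips past a prompt-only rule."""
    return 1.0 - (1.0 - failure_rate) ** transactions


# Illustrative numbers only: a hypothetical 1% miss rate.
for n in (10, 100, 1_000, 10_000):
    print(
        f"{n:>6} transactions: "
        f"~{expected_failures(0.01, n):.1f} expected misses, "
        f"P(at least one) = {prob_at_least_one_failure(0.01, n):.4f}"
    )
```

Under these assumptions the chance of at least one miss climbs toward 1 as volume grows, which is why better wording alone cannot close the gap for high-stakes actions.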

| Requirement | Stakes | Use |
| --- | --- | --- |
| Consistent date formatting | Low: a miss is harmless | Prompt / few-shot examples |
| Friendly, on-brand tone | Low | Prompt |
| Verify identity before a refund | High: money at risk | Programmatic enforcement |
| Run an AML check before transferring funds | High: compliance | Programmatic enforcement |

Place the requirement on the spectrum by its stakes. Low-stakes → prompt; high-stakes → code. A prompt's non-zero failure rate is the dividing line.

1.4.2 — Key Concept

Decision rule: financial / security / compliance → programmatic enforcement (deterministic); low-stakes formatting / style → a prompt is sufficient. A prompt can never be made into a guarantee because the model is probabilistic by nature.
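The decision rule is simple enough to write down as a lookup. The sketch below is only one way to express it; the `Mechanism` enum, the `choose_mechanism` function, and the category labels are hypothetical names introduced for illustration.

```python
from enum import Enum


class Mechanism(Enum):
    PROMPT = "prompt-based guidance"
    CODE = "programmatic enforcement"


# Categories named in the lesson's decision rule.
HIGH_STAKES = {"financial", "security", "compliance"}


def choose_mechanism(category: str) -> Mechanism:
    """Map a requirement's category to the enforcement it needs."""
    if category.lower() in HIGH_STAKES:
        return Mechanism.CODE
    return Mechanism.PROMPT


# The four example requirements from this lesson, with assumed category labels.
requirements = {
    "Consistent date formatting": "formatting",
    "Friendly, on-brand tone": "style",
    "Verify identity before a refund": "financial",
    "Run an AML check before transferring funds": "compliance",
}
for requirement, category in requirements.items():
    print(f"{requirement}: {choose_mechanism(category).value}")
```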

1.4.3 The Prerequisite Gate — Building the Lock

What does a 'lock' actually look like in code? The most important pattern is the prerequisite gate: code that refuses to let a downstream action run until a required earlier step has genuinely completed. It's the agent equivalent of a checkout that won't let you pay until you've entered a shipping address.

[Figure: the model proposes process_refund; the gate (your code) asks "ID verified this session?"; yes → allow, refund runs; no → block, impossible to bypass.]

A prerequisite gate sits in your code's 'whether to run it' step: the model proposes process_refund, the gate checks the prerequisite, and only a verified ID lets it through.

Here's why it's bulletproof, and it connects straight back to Lesson 1.1. Recall the division of labour: the model only PROPOSES an action; your code decides whether to actually run it. A prerequisite gate lives in that 'whether to run it' step. When the model proposes process_refund, your gate checks: has get_customer returned a verified ID this session? If not, the gate refuses — and the model has no way around it, because the model never had the power to execute the tool in the first place. You can prompt the model however you like; the lock is in the code, not the conversation.

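A prerequisite gate: process_refund cannot run until get_customer has verified the ID. Because your code controls execution, no prompt can bypass it; it's a lock, not a sign.

```python
verified_ids = set()  # customer IDs verified during this session

def block(reason):  # illustrative helper: refuse the call and say why
    return {"allowed": False, "reason": reason}

def allow():  # illustrative helper: let the call run
    return {"allowed": True, "reason": None}

def on_tool_result(name, result):
    # Step 1 records verification from the tool's RESULT, which your code
    # produced, never from args, which the model writes.
    if name == "get_customer" and result.get("verified"):
        verified_ids.add(result["customer_id"])

def on_tool_call(name, args):
    # The gate: the downstream action is impossible until the prerequisite is met
    if name == "process_refund" and args.get("customer_id") not in verified_ids:
        return block("Refund blocked: customer identity not verified this session")
    return allow()
```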

Contrast this with the tempting-but-wrong alternatives. A stronger system prompt? Still a sign. More few-shot examples showing verification first? Still a sign — you've made compliance more LIKELY, not certain. A routing classifier that picks which tools to enable? That changes which tools are AVAILABLE, not the ORDER they're called — it doesn't address sequencing at all. Only the gate gives a guarantee.

1.4.3 — Key Concept

A prerequisite gate blocks a downstream tool call until a prerequisite step completes (e.g. block process_refund until get_customer returns a verified ID). It works because your code — not the model — controls execution, so no prompt can bypass it.
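To show where the gate sits in the "whether to run it" step, here is a minimal, self-contained sketch of an executor that every proposed tool call must pass through. The tool bodies are stand-ins, and the `execute` function, the `TOOLS` registry, and the sample IDs are assumptions made for illustration rather than any particular SDK's API.

```python
from typing import Any, Callable

verified_ids: set[str] = set()  # customers verified during this session


def get_customer(customer_id: str) -> dict[str, Any]:
    # Stand-in for a real lookup; here every lookup succeeds and verifies.
    return {"customer_id": customer_id, "verified": True}


def process_refund(customer_id: str, amount: float) -> dict[str, Any]:
    # Stand-in for a real refund call.
    return {"customer_id": customer_id, "refunded": amount}


TOOLS: dict[str, Callable[..., dict[str, Any]]] = {
    "get_customer": get_customer,
    "process_refund": process_refund,
}


def execute(name: str, args: dict[str, Any]) -> dict[str, Any]:
    """Run a tool the model proposed, but only if the gate allows it."""
    if name == "process_refund" and args.get("customer_id") not in verified_ids:
        return {"error": "Refund blocked: customer identity not verified this session"}
    result = TOOLS[name](**args)
    # Verification is recorded from the tool's result, not from model-supplied args.
    if name == "get_customer" and result.get("verified"):
        verified_ids.add(result["customer_id"])
    return result


# The model proposes a refund first: the gate refuses.
first = execute("process_refund", {"customer_id": "C-1", "amount": 40.0})
# After verification, the same proposal goes through.
execute("get_customer", {"customer_id": "C-1"})
second = execute("process_refund", {"customer_id": "C-1", "amount": 40.0})
print(first)
print(second)
```

Returning the block reason as a tool result, rather than raising, gives the model a chance to recover by calling get_customer and trying again.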

1.4.4 Handling Requests With Several Problems at Once

Real users rarely send tidy single-issue requests. They send: "My order is three days late, I think I was double-charged, and I want to update my shipping address." A naive agent latches onto the first problem, resolves it, and forgets the other two — leaving the customer to come back twice more.

The pattern that handles this well has three beats. First, DECOMPOSE the message into its distinct concerns (late order, double charge, address change) — this is the same decompose instinct from multi-agent orchestration, applied within one conversation. Second, INVESTIGATE each concern, ideally in parallel, using the shared context of who this customer is. Third, SYNTHESISE one unified response that addresses all three, rather than three disjointed replies. The customer experiences a single, complete resolution.

[Figure: 1. Decompose (3 distinct concerns) → investigate: late order; investigate: double charge; investigate: address → 3. Synthesise (one unified reply).]

A multi-concern request: decompose into distinct items, investigate each (in parallel, with shared customer context), then synthesise one unified resolution — not three fragmented replies.

ℹ️

1.4.4 — Key Concept

For a request bundling several concerns, decompose it into distinct items, investigate each in parallel using shared context, then synthesise a single unified resolution — rather than handling one and dropping the rest.
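The three beats can be sketched in a few lines. This is only an outline under stated assumptions: the list of concerns is hard-coded where a real agent would have the model decompose the message, `investigate` is a stub in place of real tool calls, and the customer and order IDs are made-up sample data.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical decomposition result; in practice the model would produce this.
concerns = ["late order", "double charge", "address change"]

# Shared context every investigation can read (sample data).
customer = {"customer_id": "C-1", "order_id": "O-42"}


def investigate(concern: str, context: dict) -> dict:
    """Stub investigator: a real one would call tools for this concern."""
    return {
        "concern": concern,
        "finding": f"checked {concern} for order {context['order_id']}",
    }


def synthesise(findings: list[dict]) -> str:
    """Combine every finding into one reply so no concern is dropped."""
    lines = [f"- {f['concern']}: {f['finding']}" for f in findings]
    return "Here is where each of your issues stands:\n" + "\n".join(lines)


with ThreadPoolExecutor() as pool:
    # map() preserves input order, so the reply follows the customer's order.
    findings = list(pool.map(lambda c: investigate(c, customer), concerns))

reply = synthesise(findings)
print(reply)
```

A quick completeness check, such as confirming that the reply covers every item in `concerns`, is a cheap guard against the "handled one, dropped the rest" failure.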

1.4.5 The Structured Handoff to a Human

Sometimes the right move is to stop and hand the problem to a human — a policy exception, a sensitive dispute, an explicit request for a person. The danger here is subtle: the human who takes over has NONE of the context. They can't see the conversation transcript, the tools the agent called, or what it figured out. To them, a bare 'escalating to a human' is almost useless — the customer has to re-explain everything from scratch, and the agent's work is wasted.

[Figure: agent (has the full context) → structured handoff (ID · summary · root cause · recommended action · partial results) → human (starts with no context).]

A structured handoff carries the full context across the boundary to a human who has none — just as a brief carries context to a blank subagent in Lesson 1.3.

This is the same problem as briefing a subagent in Lesson 1.3, just pointed at a person: the receiver starts blank, so you must hand them a complete, self-contained brief. A good structured handoff carries the customer ID and account, a concise summary of what was asked and tried, the root-cause analysis the agent reached, the concrete recommended action, and any partial results already gathered so the human doesn't redo them. Done well, the human picks up exactly where the agent left off.

  • Customer ID and the verified account in question — so the human isn't re-identifying anyone.
  • A concise summary of what the customer asked and what's already been tried.
  • The root-cause analysis — what the agent determined is actually wrong.
  • The concrete recommended action (e.g. a specific refund amount or the policy exception requested).
  • Any partial results already gathered, so the human continues rather than restarts.
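The five items above map naturally onto a small record type. The following is a sketch, not a prescribed schema: the `Handoff` class, its field names, and the sample values are all assumptions chosen to mirror the list.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    customer_id: str
    summary: str
    root_cause: str
    recommended_action: str
    partial_results: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Format the brief so a human can act without the transcript."""
        results = "\n".join(f"  - {r}" for r in self.partial_results) or "  - none"
        return (
            f"Customer ID: {self.customer_id}\n"
            f"Summary: {self.summary}\n"
            f"Root cause: {self.root_cause}\n"
            f"Recommended action: {self.recommended_action}\n"
            f"Partial results:\n{results}"
        )


# All values below are made-up sample data.
handoff = Handoff(
    customer_id="C-1",
    summary="Customer reports a double charge on order O-42; refund not yet issued.",
    root_cause="Payment was captured twice after a checkout retry.",
    recommended_action="Refund the duplicate charge of 59.00.",
    partial_results=["Identity verified", "Two captures found for order O-42"],
)
print(handoff.render())
```

Making every field except `partial_results` required means an incomplete handoff fails at construction time instead of reaching the human half-filled.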

1.4.5 — Key Concept

A human handoff transfers a problem to someone with NO prior context, so design it like a subagent brief: self-contained and structured — customer ID, summary, root cause, recommended action, and partial results — complete enough for the human to act without re-investigating.

1.4.6 The Exam Traps

Every 1.4 trap has the same shape: a question describes a high-stakes reliability problem, and three of the four answers offer a probabilistic fix dressed up to look thorough. The fourth — the deterministic one — is the answer. Train yourself to spot the lock among the signs.

| The reliability problem | Tempting (✗ probabilistic) | Correct (✓ deterministic) |
| --- | --- | --- |
| Agent skips verification 12% of the time | Stronger prompt; more few-shot examples | Prerequisite gate in code |
| Must block refunds over $500 | "Please don't exceed $500" | Tool-call interception hook (Lesson 1.5) |
| Routing classifier 'fixes' tool ordering | Enable only certain tools per request | Wrong layer: that's availability, not ORDER |
| Consistent date formatting | (this one IS fine as a prompt) | A prompt is correct here |

For high-stakes requirements, prompts and few-shot examples are distractors and a routing classifier addresses the wrong layer (availability, not sequencing). Only code enforcement guarantees the outcome.
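For the "block refunds over $500" row, the deterministic fix is a check in code on the proposed arguments. The sketch below is a plain function, not the hook mechanism covered in Lesson 1.5; `check_refund`, `REFUND_LIMIT`, and the `amount` argument name are assumptions made for illustration.

```python
REFUND_LIMIT = 500.00  # the $500 threshold from the table above


def check_refund(args: dict) -> dict:
    """Decide in code whether a proposed process_refund call may run."""
    amount = args.get("amount")
    # Reject missing, non-numeric, or non-positive amounts outright.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount <= 0:
        return {"allowed": False, "reason": "Refund blocked: invalid amount"}
    if amount > REFUND_LIMIT:
        return {
            "allowed": False,
            "reason": f"Refund blocked: {amount:.2f} exceeds the {REFUND_LIMIT:.2f} limit",
        }
    return {"allowed": True, "reason": None}


print(check_refund({"amount": 120.0}))
print(check_refund({"amount": 750.0}))
```

Because the comparison runs in your code on every proposed call, the limit holds no matter how the request was phrased to the model.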

One distractor deserves a special callout because it's so common: the routing classifier. It sounds sophisticated, but it only decides which tools are AVAILABLE for a request; it never controls the ORDER in which the model calls them. So for a 'verify before refund' sequencing problem, a routing classifier is simply aimed at the wrong target. The prerequisite gate is aimed at the right one.
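The availability-versus-order distinction is easy to see in a toy comparison. In this sketch the classifier is a hypothetical function that returns the enabled tool set, the proposed call sequence stands in for the model's output, and the gate assumes for simplicity that every get_customer call verifies the customer.

```python
def enabled_tools(request_type: str) -> set[str]:
    """Hypothetical routing classifier output: which tools are available."""
    if request_type == "refund":
        return {"get_customer", "process_refund"}
    return {"get_customer"}


def run_with_classifier_only(proposed: list[str], request_type: str) -> list[str]:
    """Availability check only: any enabled tool runs, in whatever order."""
    available = enabled_tools(request_type)
    return [name for name in proposed if name in available]


def run_with_gate(proposed: list[str], request_type: str) -> list[str]:
    """Same availability check, plus an ordering gate on process_refund."""
    available = enabled_tools(request_type)
    verified = False
    executed = []
    for name in proposed:
        if name not in available:
            continue
        if name == "process_refund" and not verified:
            continue  # blocked: prerequisite not met
        if name == "get_customer":
            verified = True  # simplification: assumes the lookup verifies
        executed.append(name)
    return executed


wrong_order = ["process_refund", "get_customer"]
print(run_with_classifier_only(wrong_order, "refund"))
print(run_with_gate(wrong_order, "refund"))
```

With the classifier alone, both tools are enabled for a refund request, so the refund still runs before verification; only the gated version refuses it.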

⚠️

1.4.6 — Exam Trap

When a question describes a financial/compliance reliability failure (e.g. 'the agent skips verification X% of the time'), reject 'enhance the prompt', 'add few-shot examples', and 'add a routing classifier' (which addresses availability, not ordering). Choose the programmatic prerequisite gate — the only deterministic guarantee.

1.4.7 Put It Together: Build the Guarantees

You can now distinguish signs from locks, place a requirement on the enforcement spectrum, build a prerequisite gate, handle multi-concern requests, and design a structured handoff. The exercise drives home the central truth: prove to yourself that the prompt-only version eventually slips, and the gated version never does.

1.4.7 — Build Exercise (60 min)

Add enforcement to a support agent. (1) Build a prerequisite gate that blocks process_refund until get_customer returns a verified ID; then build a prompt-only version that merely INSTRUCTS verification first, run both many times, and watch the prompt-only one occasionally slip through. (2) Handle a multi-concern message by decomposing it, investigating each concern, and synthesising one reply. (3) Build a structured handoff summary (customer ID, summary, root cause, recommended action, partial results) emitted when the agent escalates — then confirm a colleague could act on it with no access to the transcript.
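As a starting point for part (1), here is a small harness that counts unverified refunds under each design. It is a simulation only: `simulated_model` is a stand-in that skips verification at random, with the skip rate borrowed from the 12% figure used in this lesson's examples, so it illustrates the shape of the experiment rather than how a real model behaves. In the actual exercise you would replace it with calls to your agent.

```python
import random

SKIP_RATE = 0.12  # illustrative; borrowed from the lesson's 12% example


def simulated_model(rng: random.Random) -> list[str]:
    """Stand-in for a prompted model: usually verifies first, sometimes skips."""
    if rng.random() < SKIP_RATE:
        return ["process_refund"]
    return ["get_customer", "process_refund"]


def run(proposed: list[str], gated: bool) -> bool:
    """Return True if a refund ran without prior verification."""
    verified = False
    for name in proposed:
        if name == "get_customer":
            verified = True
        elif name == "process_refund":
            if gated and not verified:
                continue  # the gate blocks the call
            if not verified:
                return True  # unverified refund slipped through
    return False


def count_slips(trials: int, gated: bool, seed: int = 0) -> int:
    """Count unverified refunds across many simulated conversations."""
    rng = random.Random(seed)
    return sum(run(simulated_model(rng), gated) for _ in range(trials))


print("prompt-only slips:", count_slips(1_000, gated=False))
print("gated slips:      ", count_slips(1_000, gated=True))
```

The gated count is zero regardless of the skip rate, because the outcome no longer depends on what the model proposes.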

This lesson introduced enforcement in general — code that guarantees behaviour. The next lesson, 1.5, zooms into the most powerful enforcement mechanism Claude gives you: hooks — code that fires at precise moments in the loop to transform data or block actions.

ℹ️

Where this shows up on the exam

1.4 questions almost always pit a deterministic fix against three probabilistic ones for a high-stakes problem. If you can articulate why a prompt can never be a guarantee, and why a routing classifier addresses availability rather than ordering, the right answer jumps out.

Key Takeaways

  • Prompt-based guidance is a sign (probabilistic — works most of the time); programmatic enforcement is a lock (deterministic — works every time). The stakes of the action decide which you need.
  • Decision rule: financial / security / compliance → programmatic enforcement; low-stakes formatting / style → a prompt is sufficient. A prompt can't be a guarantee because the model is probabilistic.
  • A prerequisite gate blocks a downstream tool call until a prerequisite completes (block process_refund until get_customer returns a verified ID); it's bulletproof because your code, not the model, controls execution.
  • A stronger prompt or more few-shot examples only make compliance more LIKELY, not certain; a routing classifier changes which tools are AVAILABLE, not the ORDER they're called — wrong layer for sequencing.
  • Handle multi-concern requests by decomposing into distinct items, investigating each in parallel with shared context, then synthesising one unified resolution.
  • A human handoff goes to someone with no context, so make it a structured, self-contained brief: customer ID, summary, root-cause analysis, recommended action, and partial results.
  • Exam reflex: for a high-stakes reliability failure, reject prompt/few-shot/routing-classifier distractors and choose the programmatic prerequisite gate.

Check Your Understanding

Test what you learned in this lesson.

Q1. Production data shows your support agent skips get_customer in 12% of cases and calls lookup_order using only the customer's stated name, causing misidentified accounts and wrong refunds. What change most effectively fixes this?

Q2. Which workflow requirement is appropriately handled with prompt-based guidance rather than programmatic enforcement?

Q3. An agent must escalate a billing dispute to a human who cannot see the conversation. What makes the handoff effective?

Q4. Why is a routing classifier the wrong fix for an agent that calls tools in the wrong ORDER (e.g. refunding before verifying)?
