Domain 5: Context Management & Reliability (15%) | Lesson 30 of 30

5.6 Information Provenance & Multi-Source Synthesis

5.6.1 Where Did This Claim Come From?

The final lesson of the course closes the loop on a problem that runs through Domain 1 (subagent context passing) and Domain 5 (context management): when an agent combines information from many sources into one synthesized output, how do you keep track of WHERE each claim came from? Task Statement 5.6 is about provenance — preserving the source of every piece of information through synthesis — and handling the conflicts and uncertainty that arise when sources disagree.

The core danger is simple to state: attribution dies during summarization. Each summarization or synthesis step is a chance to lose the link between a claim and its source. A research subagent finds 'AI adoption rose 40% (State of AI 2024, p.12)'; the synthesis agent folds it into a paragraph as 'AI adoption is rising sharply', and both the citation and the exact figure are gone. Now the final report makes claims you can't trace or verify. It's like a game of telephone where the gist survives but the sources evaporate, leaving you unable to check anything. For any report meant to be trustworthy, that's fatal.

This lesson is about deliberately PRESERVING provenance through every synthesis step, and about three specific challenges: combining sources that conflict, handling time-sensitive data, and rendering different kinds of content appropriately. Let's start with the preservation mechanism.

Without deliberate preservation, synthesis strips attribution and claims become untraceable. A structured claim-source mapping carries source, excerpt, and date through every step so the report stays verifiable.

ℹ️

The one idea to hold onto

Attribution dies during summarization unless you deliberately preserve it. Use a structured claim-source mapping that synthesis agents carry through every step, so every claim in the final output remains traceable to its source.

5.6.2 The Structured Claim-Source Mapping

The mechanism for preserving provenance is a structured claim-source mapping: every finding is recorded not as bare text but as a structure that bundles the claim WITH its origin. The exam specifies five fields: the claim itself, the source URL, the document name, the relevant excerpt (the actual passage supporting the claim), and the publication date.

The key requirement is that downstream agents must PRESERVE and MERGE these mappings rather than flattening them into prose. When the synthesis agent combines findings, it keeps each claim attached to its source structure — so the final report can cite every statement. This is the exact same principle as the context-passing rule from Lesson 1.3: pass findings as STRUCTURED data (content + metadata together) so attribution survives every hop. There, it kept citations alive between subagents; here, it keeps them alive through synthesis. Structured-not-prose is the through-line.

Claim-source field	Purpose
Claim	The finding/statement itself
Source URL	Where it came from (verifiable link)
Document name	Human-readable source identification
Relevant excerpt	The actual supporting passage
Publication date	Enables temporal reasoning (see 5.6.4)

The five-field claim-source mapping. Synthesis agents must preserve and merge these — not flatten them to prose — so every claim stays traceable.
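As a minimal sketch of the five-field mapping, the record below bundles a claim with its origin and merges subagent outputs without flattening them. The names `ClaimSource`, `merge_mappings`, and `cite` are hypothetical, as are the URL and publication date in the usage note; the lesson does not prescribe a concrete schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ClaimSource:
    """One finding bundled with its origin (the five fields in the table)."""
    claim: str
    source_url: str
    document_name: str
    excerpt: str
    publication_date: date


def merge_mappings(*batches: list[ClaimSource]) -> list[ClaimSource]:
    """Merge subagent outputs: drop exact duplicates, keep every field intact."""
    merged: list[ClaimSource] = []
    seen: set[ClaimSource] = set()
    for batch in batches:
        for item in batch:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


def cite(item: ClaimSource) -> str:
    """Render a claim with its citation still attached."""
    return (
        f"{item.claim} "
        f"[{item.document_name}, {item.publication_date.isoformat()}, "
        f"{item.source_url}]"
    )
```

For example, with placeholder values, `cite(ClaimSource("AI adoption rose 40%", "https://example.com/state-of-ai-2024", "State of AI 2024", "(supporting passage from p.12)", date(2024, 1, 1)))` keeps the document name, date, and URL next to the claim instead of reducing it to bare prose.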

⭐

5.6.2 — Key Concept

Require subagents to output structured claim-source mappings (claim + source URL + document name + relevant excerpt + publication date) that downstream/synthesis agents PRESERVE and merge — never flatten to prose. This is the 1.3 structured-context-passing principle applied to synthesis.

5.6.3 When Sources Conflict

Synthesizing many sources inevitably surfaces CONFLICTS — two credible sources give different numbers for the same thing. The instinct is to RESOLVE the conflict: pick one value and present it as the answer. The exam says that's wrong. Arbitrarily selecting one value hides a real disagreement from the reader and may present a contested figure as settled fact.

The correct handling is to ANNOTATE both values with their attribution rather than choosing between them. Present 'Source A reports 12%, Source B reports 8%,' each with its source and date, and let the reader (or a human decision-maker) see the disagreement. At the document-analysis stage, include the conflicting values explicitly and flag the conflict; at synthesis, structure the report to DISTINGUISH well-established findings from contested ones, preserving each source's original characterization. Honesty about disagreement beats false precision — surfacing a conflict is more useful than silently picking a winner.
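The annotate-both rule can be sketched as follows. This is an illustration, not a prescribed implementation: the `Reported` record, the `annotate` function, and the status labels are assumptions, and the sketch assumes all reports describe the same period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reported:
    metric: str
    value: str
    source: str
    date: str  # ISO date string of the source


def annotate(metric: str, reports: list[Reported]) -> dict:
    """Keep every reported value with its attribution; never pick a winner."""
    relevant = [r for r in reports if r.metric == metric]
    distinct_values = {r.value for r in relevant}
    # More than one distinct value means the finding is contested.
    status = "contested" if len(distinct_values) > 1 else "consistent"
    return {
        "metric": metric,
        "status": status,
        "values": [f"{r.source} reports {r.value} ({r.date})" for r in relevant],
    }
```

Given one report of 12% from Source A and one of 8% from Source B for the same metric, `annotate` returns both annotated values with the status `"contested"`, so the report can list the finding under its contested section rather than its established one.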

⭐

5.6.3 — Key Concept

When credible sources conflict, do NOT arbitrarily pick one value — annotate BOTH with their source attribution (and dates), and structure the report to distinguish well-established findings from contested ones. Surfacing the disagreement beats presenting a contested figure as settled fact.

5.6.4 Temporal Data and Content-Appropriate Rendering

Two more synthesis challenges. First, TEMPORAL awareness — and this is why publication date is one of the five mapping fields. Two sources giving different figures might not be in CONFLICT at all; they might be describing different points in TIME, making the difference a TREND rather than a contradiction. 'Adoption was 8% in 2023 and 12% in 2024' isn't a disagreement — it's growth. Without the dates, a synthesis agent could misread that as two sources contradicting each other. So require publication/collection dates in structured outputs, so temporal differences are interpreted correctly instead of being flagged as conflicts.
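A small sketch of the date check described above. It assumes, for illustration only, that figures dated in different calendar years describe different periods; the `Figure` record and both function names are hypothetical, and a real system would compare the period each figure covers.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Figure:
    value: float     # e.g. 8.0 for 8%
    source: str
    published: date  # publication or collection date


def interpret(a: Figure, b: Figure) -> str:
    """Classify two figures for one metric as agreement, trend, or conflict."""
    if a.value == b.value:
        return "agreement"
    # Assumption: different years means different periods, so a possible trend.
    if a.published.year != b.published.year:
        return "trend"
    return "conflict"


def describe(a: Figure, b: Figure) -> str:
    """Produce a one-line, attributed summary of the relationship."""
    kind = interpret(a, b)
    earlier, later = sorted((a, b), key=lambda f: f.published)
    if kind == "trend":
        return (
            f"Trend: {earlier.value}% in {earlier.published.year} "
            f"({earlier.source}) to {later.value}% in "
            f"{later.published.year} ({later.source})"
        )
    if kind == "conflict":
        return (
            f"Conflict: {a.source} reports {a.value}%, "
            f"{b.source} reports {b.value}%"
        )
    return f"Agreement: both sources report {a.value}%"
```

With 8% dated 2023 and 12% dated 2024, `interpret` returns `"trend"`; the same two values dated in the same year return `"conflict"`, which is the case to annotate rather than resolve.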

Second, CONTENT-APPROPRIATE rendering. Different kinds of information are best presented in different formats, and forcing everything into one uniform format degrades readability. Financial data belongs in TABLES (so figures align and compare), news belongs in PROSE (narrative flows), technical findings belong in structured LISTS (scannable steps/specs). A synthesis agent that renders everything as, say, uniform paragraphs makes the financial comparison hard to read and the technical specs hard to scan. Match the rendering to the content type.

Challenge	Right handling
Sources give different numbers	Check dates — may be a trend, not a conflict
Genuine conflict	Annotate both with attribution; don't pick one
Financial data	Render as tables
News / narrative	Render as prose
Technical findings	Render as structured lists

Temporal awareness (dates distinguish trends from conflicts) and content-appropriate rendering (tables/prose/lists by type) round out reliable multi-source synthesis.
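One way to sketch content-appropriate rendering is a dispatch table keyed by content type. The type labels (`"financial"`, `"news"`, `"technical"`) and the renderer functions are assumptions chosen to mirror the table above, not part of any specified API.

```python
def render_table(rows: list[tuple[str, str]]) -> str:
    """Financial data: align labels in a column so figures are easy to compare."""
    width = max(len(label) for label, _ in rows)
    return "\n".join(f"{label.ljust(width)}  {value}" for label, value in rows)


def render_prose(sentences: list[str]) -> str:
    """News: join sentences into one flowing paragraph."""
    return " ".join(sentences)


def render_list(items: list[str]) -> str:
    """Technical findings: one scannable bullet per item."""
    return "\n".join(f"- {item}" for item in items)


# Hypothetical mapping from content type to renderer.
RENDERERS = {
    "financial": render_table,
    "news": render_prose,
    "technical": render_list,
}


def render(content_type: str, payload: list) -> str:
    """Pick the renderer that matches the content type."""
    try:
        renderer = RENDERERS[content_type]
    except KeyError:
        raise ValueError(f"no renderer for content type: {content_type!r}")
    return renderer(payload)
```

Raising on an unknown type, rather than falling back to prose, is a deliberate choice in this sketch: it stops new content types from silently collapsing into one uniform format.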

⭐

5.6.4 — Key Concept

Require publication/collection DATES so temporal differences are read as TRENDS, not contradictions. And render content appropriately — financial data as tables, news as prose, technical findings as lists — rather than forcing everything into one uniform format.

5.6.5 The Exam Traps

The 5.6 traps test provenance preservation, conflict handling, temporal interpretation, and rendering. They echo the structured context passing of Lesson 1.3.

• Letting attribution die. ✗ Synthesizing claims into prose without source mappings. ✓ Preserve structured claim-source mappings (claim + url + doc + excerpt + date) through synthesis.
• Picking a winner on conflict. ✗ Choosing the 'most recent' or one credible source's value and presenting it as fact. ✓ Annotate both with attribution; distinguish established from contested.
• Misreading trends as conflicts. ✗ Treating two different-dated figures as a contradiction. ✓ Use dates: it may be a trend, not a conflict.
• Uniform rendering. ✗ Converting everything to one format. ✓ Tables for financial, prose for news, lists for technical.

⚠️

5.6.5 — Exam Trap

✗ Paraphrasing claims without preserving source mappings; ✗ arbitrarily picking one value (or the most recent) when sources conflict; ✗ treating different-dated figures as a contradiction; ✗ uniform formatting for all content. ✓ Structured claim-source mappings, annotate-both-on-conflict, dates for temporal reasoning, content-appropriate rendering.

5.6.6 Put It Together: Synthesize With Provenance

You've reached the final exercise of the course. You now know how to preserve provenance, handle conflicts, reason about time, and render content appropriately. This exercise ties Domain 5 together — and connects back to the multi-agent research system from Domain 1.

✨

5.6.6 — Build Exercise (45 min)

(1) Have research subagents output structured claim-source mappings (claim, source URL, document name, excerpt, publication date) and verify the synthesis agent PRESERVES them so the final report cites every claim.
(2) Feed two credible sources with different statistics for the same metric and confirm synthesis annotates BOTH with attribution rather than picking one.
(3) Feed two figures from different years and confirm the dates let synthesis present a trend, not a contradiction.
(4) Mix financial, news, and technical findings and confirm they're rendered as tables, prose, and lists respectively.
(5) Structure the final report to distinguish well-established from contested findings.
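For step (1), a check along the following lines could confirm that no claim reached the final report without its mapping. It is a sketch under assumptions: the report is taken to be a list of dicts using the hypothetical keys shown, and `missing_provenance` is an illustrative name, not an existing tool.

```python
REQUIRED_FIELDS = (
    "claim",
    "source_url",
    "document_name",
    "excerpt",
    "publication_date",
)


def missing_provenance(report_claims: list[dict]) -> list[str]:
    """Return one message per report entry that lacks any required field."""
    problems = []
    for index, entry in enumerate(report_claims):
        # A field counts as missing if it is absent or empty.
        absent = [name for name in REQUIRED_FIELDS if not entry.get(name)]
        if absent:
            problems.append(f"claim {index}: missing {', '.join(absent)}")
    return problems
```

An empty result means every claim is still traceable. A non-empty result points at the entries where a synthesis step flattened a mapping into prose.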

That completes Domain 5 — and the entire CCA-F curriculum. Step back and see the whole arc: Domain 1 gave you the agentic foundation (loops, orchestration, hooks, sessions); Domain 2, tool and MCP design; Domain 3, configuring Claude Code; Domain 4, prompt engineering and structured output; and Domain 5, the reliability layer — context, escalation, error propagation, codebase exploration, human review, and now provenance — that makes everything else trustworthy in production. You're ready to architect with Claude.

ℹ️

Where this shows up on the exam

5.6 questions involve synthesizing multiple sources. Preserve structured claim-source mappings; annotate BOTH values on conflict (don't pick one); use dates to tell trends from contradictions; and render by content type. These echo the structured context passing of Lesson 1.3.

Key Takeaways

✓ Attribution dies during summarization unless deliberately preserved: each synthesis step risks severing a claim from its source, making the final report untraceable.
✓ Use a structured claim-source mapping (claim + source URL + document name + relevant excerpt + publication date) that downstream/synthesis agents PRESERVE and merge, never flatten to prose.
✓ This is the Lesson 1.3 structured-context-passing principle applied to synthesis: pass content WITH its metadata so attribution survives every hop.
✓ When credible sources conflict, do NOT arbitrarily pick one value. Annotate BOTH with attribution and dates, and distinguish well-established findings from contested ones.
✓ Require publication/collection DATES: two different-dated figures may be a TREND, not a contradiction. Without dates, synthesis can misread growth as a conflict.
✓ Render content appropriately by type (financial data as tables, news as prose, technical findings as lists) rather than forcing everything into one uniform format.
✓ Structure reports to surface coverage and uncertainty honestly, preserving each source's original characterization and methodological context.

Check Your Understanding

Test what you learned in this lesson.

Q1. Two credible sources report different statistics for the same metric during synthesis. How should the agent handle the conflict?

Q2. A research report loses its citations: claims appear but you can't trace where they came from. What prevents this?

Q3. Two sources give 8% (dated 2023) and 12% (dated 2024) for the same metric. How should synthesis interpret this?

Q4. A synthesis output mixes financial figures, news, and technical specifications. How should they be rendered?
