3.6 Integrating Claude Code into CI/CD
3.6.1 Claude Without a Human at the Keyboard
Everything so far assumed you're sitting at a terminal, talking to Claude interactively. But one of the most powerful uses of Claude Code is in a CI/CD pipeline — automatically reviewing every pull request, generating tests, posting feedback — with NO human at the keyboard. Task Statement 3.6 is about running Claude non-interactively in automation, and it carries one fact the exam tests more than almost any other in Domain 3.
Here's the problem that fact solves. By default, Claude Code is interactive — it expects a human to type, confirm, respond. Drop that default into a CI pipeline and disaster: the job starts, Claude waits for input that will never come, and the pipeline HANGS indefinitely until it times out. A robot can't press enter. You need a way to tell Claude 'there's no human here — process the prompt, print the result, and exit.'
That switch is the -p flag (also written --print). It's the answer to 'my CI job hangs waiting for input,' and it is THE most-tested fact in this domain. Run claude -p "review this PR" and Claude processes the prompt, writes the result to standard output, and exits — exactly what a pipeline needs. Burn this in: CI hangs → the answer is -p.
The interactive default hangs in CI because nothing types input. The -p / --print flag makes Claude process the prompt, print the result, and exit — the documented non-interactive mode.
The one idea to hold onto
The -p (or --print) flag runs Claude Code non-interactively: process the prompt, print the result, exit. It's THE answer to 'my CI job hangs waiting for input' — the most-tested fact in this domain.
3.6.2 Structured Output for Automation
Running non-interactively is step one; the second problem is FORMAT. In CI you usually want to DO something with Claude's output programmatically — post it as inline PR comments, fail the build on certain findings, feed it to another tool. Free-form prose is hard for a script to parse reliably. You need machine-readable output.
Two flags handle this. --output-format json makes Claude emit structured JSON instead of prose, and --json-schema lets you specify the EXACT schema that output must conform to — so your pipeline can reliably extract 'file', 'line', 'severity', 'message' from each finding and turn them into PR comments. This is the CI cousin of the structured-output techniques you'll see in Domain 4.3 (tool_use with JSON schemas): the goal is the same — guaranteed, parseable structure — applied here at the CLI level. A couple of related flags round out the CI toolkit: --max-turns caps the agentic turns so a job can't run away, and --max-budget-usd caps spend.
| CLI flag | Purpose |
|---|---|
| -p / --print | Non-interactive: process, print, exit (the CI essential) |
| --output-format json | Emit machine-parseable JSON instead of prose |
| --json-schema '{...}' | Force output to match a specific JSON schema |
| --max-turns N | Cap agentic turns so a CI job can't run away |
The CI flag toolkit. -p makes it run at all; --output-format json + --json-schema make the result parseable for inline PR comments.
3.6.2 — Key Concept
Use --output-format json with --json-schema to produce machine-parseable, schema-conforming output that a pipeline can post as inline PR comments. --max-turns and --max-budget-usd bound a non-interactive run. Note the fake distractors: CLAUDE_HEADLESS and --batch are not real.
3.6.3 Why a Fresh Instance Reviews Better
Here's a subtle but heavily-tested idea about CI code review. Suppose Claude just GENERATED some code, and you want it reviewed. The tempting shortcut is to ask the SAME session to review its own work. Don't — and the reason connects to context.
A session that generated code carries its own GENERATION REASONING in context — all the assumptions and justifications it made while writing. When that same session reviews the code, it tends to CONFIRM its earlier decisions rather than challenge them; it's already convinced itself the code is right. It's like proofreading your own essay minutes after writing it — you read what you MEANT to write, not what's actually there. The fix is session context isolation: spin up a SEPARATE claude -p invocation with no prior context to do the review. A fresh instance, unburdened by the generation reasoning, is far more likely to catch real problems. (This is the CI form of the independent-review idea you'll meet again in Domain 4.6.)
Two more CI review practices follow from the same context-aware thinking. INCREMENTAL review: when re-running a review after new commits, pass the PRIOR findings and instruct Claude to report only NEW or still-unaddressed issues — otherwise it re-flags everything and floods the PR with duplicate comments. And CLAUDE.md IN CI: provide the project's testing standards, fixtures, and review criteria via CLAUDE.md (and include existing tests) so generated tests follow conventions and don't duplicate coverage.
3.6.3 — Key Concept
A session that generated code is weaker at reviewing it (it retains and confirms its generation reasoning) — use a SEPARATE claude -p invocation with no prior context for independent review. For re-runs, pass prior findings and report only new/unaddressed issues to avoid duplicate comments.
3.6.4 Real-Time vs Batch in CI
One more CI decision, and it connects forward to Domain 4.5: should a given automated workflow use the synchronous API (real-time) or the Message Batches API (cheaper but slower)? The deciding factor is whether the workflow BLOCKS someone.
A pre-merge check BLOCKS a developer — they're waiting to merge, so it must return promptly; use the real-time synchronous API even though it costs more. An overnight technical-debt report or a weekly audit blocks NObody — it can take hours; use the Message Batches API for its ~50% cost saving (it allows up to a 24-hour window with no latency guarantee). The rule: blocking workflow → real-time; latency-tolerant → batch. (Domain 4.5 covers the Batch API in depth; here the point is matching API to whether a human is waiting.)
| CI workflow | Blocks someone? | Use |
|---|---|---|
| Pre-merge PR check | Yes — dev waits to merge | Real-time (synchronous) API |
| Overnight tech-debt report | No | Batch API (50% cheaper, ≤24h) |
| Weekly codebase audit | No | Batch API |
Match the API to whether a human is waiting: blocking → real-time; latency-tolerant overnight/weekly jobs → batch for the cost saving.
3.6.4 — Key Concept
In CI, use the real-time synchronous API for BLOCKING workflows (pre-merge checks where a developer waits) and the Message Batches API for latency-tolerant ones (overnight/weekly reports) to capture its ~50% cost saving. Match the API to whether someone is waiting.
3.6.5 The Exam Traps
The 3.6 traps cluster around the -p flag (and its fake alternatives), self-review, and the real-time-vs-batch choice. The -p one is nearly guaranteed to appear.
- •Fake non-interactive flags. ✗ CLAUDE_HEADLESS=true, --batch, or a stdin redirect to fix a hanging CI job. ✓ The documented answer is -p (or --print).
- •Self-review in the same session. ✗ Asking the generating session to review its own code. ✓ Use a separate claude -p invocation with no prior context.
- •Duplicate review comments. ✗ Re-running a full review on new commits and re-flagging everything. ✓ Pass prior findings; report only new/unaddressed issues.
- •Batch for blocking work. ✗ Using the Batch API for a pre-merge check (devs wait up to 24h). ✓ Real-time for blocking; batch only for latency-tolerant jobs.
3.6.5 — Exam Trap
For a hanging CI job the answer is ALWAYS -p / --print — CLAUDE_HEADLESS and --batch are fake. Use a separate claude -p instance (not the same session) for independent review; pass prior findings on re-runs to avoid duplicate comments; and use real-time (not batch) for blocking pre-merge checks.
3.6.6 Put It Together: Build a CI Review Pipeline
You now know how to run Claude non-interactively (-p), get parseable output (--output-format json + --json-schema), why independent review beats self-review, and when to use real-time vs batch. The exercise builds a working CI review step that exercises all of it.
3.6.6 — Build Exercise (45 min)
(1) Write a CI script that runs claude -p so it never hangs waiting for input. (2) Use --output-format json with --json-schema to emit structured findings, and post them as inline PR comments. (3) Add a CLAUDE.md section with testing standards and include existing tests so generated tests follow conventions and avoid duplicates. (4) Run the review as a SEPARATE claude -p instance from any generation step, and compare its findings to a same-session self-review. (5) Implement incremental review: on a second run, pass prior findings and instruct it to report only new/unaddressed issues. (6) Decide real-time vs batch for a pre-merge check (real-time) vs a nightly report (batch).
That completes Domain 3 — you can now configure Claude Code's knowledge (CLAUDE.md, skills, rules), choose how it approaches tasks (plan vs direct), iterate effectively (3.5), and run it in automation (3.6). Domain 4 turns to the craft of the prompts themselves — explicit criteria, few-shot, structured output, validation, batching, and multi-pass review.
Where this shows up on the exam
3.6 questions center on the -p flag for hanging CI jobs (memorize it), structured JSON output for PR comments, independent vs self-review, and real-time vs batch by whether a human waits. The -p question is nearly guaranteed.
Key Takeaways
- ✓The -p (or --print) flag runs Claude Code non-interactively — process, print, exit — and is THE answer to a CI job that hangs waiting for input (the most-tested Domain 3 fact).
- ✓Fake distractors to reject: CLAUDE_HEADLESS=true and --batch are not real flags; a stdin redirect is a workaround, not the documented mode.
- ✓Use --output-format json with --json-schema for machine-parseable, schema-conforming output that a pipeline can post as inline PR comments; --max-turns and --max-budget-usd bound a run.
- ✓A session that generated code is weaker at reviewing it (it confirms its own generation reasoning) — use a SEPARATE claude -p invocation with no prior context for independent review.
- ✓For incremental review on re-runs, pass prior findings and report only new/unaddressed issues to avoid duplicate PR comments; provide CLAUDE.md standards + existing tests for better test generation.
- ✓Match the API to whether a human waits: real-time synchronous API for blocking pre-merge checks; Message Batches API (≈50% cheaper, ≤24h) for latency-tolerant overnight/weekly jobs.
- ✓These CI ideas echo Domain 4: structured output (4.3), batch processing (4.5), and independent/multi-pass review (4.6).
Check Your Understanding
Test what you learned in this lesson.
Q1.Your pipeline runs claude "Analyze this pull request for security issues" but the job hangs indefinitely, waiting for interactive input. What's the correct fix?
Q2.You want Claude's CI review output to be posted as inline PR comments automatically. What's the best way to get reliably parseable output?
Q3.You want Claude to review code it just generated. Why use a separate claude -p invocation instead of the same session?
Q4.Your team wants to move two CI workflows to the Message Batches API for its 50% cost saving: (1) a blocking pre-merge check developers wait on, and (2) an overnight tech-debt report. How should you evaluate this?
Practice This Lesson