
PART II: EXAM DOMAIN NOTES


Domain 1: Agent Architecture and Orchestration (27%)

1.1 Designing Agentic Loops for Autonomous Task Execution

Key knowledge:

Key skills:

1.2 Orchestrating Multi-agent Systems (Coordinator–Subagent)

Key knowledge:

Key skills:

1.3 Configuring Subagent Calls, Context Passing, and Spawning

Key knowledge:

Key skills:

1.4 Implementing Multi-step Workflows with Enforcement and Handoff Patterns

Key knowledge:

Key skills:

1.5 Agent SDK Hooks for Intercepting Tool Calls and Normalizing Data

Key knowledge:

Key skills:

1.6 Task Decomposition Strategies for Complex Workflows

Key knowledge:

Key skills:

1.7 Session State, Resuming, and Forking

Key knowledge:

Key skills:


Domain 2: Tool Design and MCP Integration (18%)

2.1 Designing Tool Interfaces with Clear Descriptions

Key knowledge:

Key skills:

2.2 Implementing Structured Error Responses for MCP Tools

Key knowledge:

Key skills:

2.3 Allocating Tools Across Agents and Configuring tool_choice

Key knowledge:

Key skills:

2.4 Integrating MCP Servers into Claude Code and Agent Workflows

Key knowledge:

Key skills:

2.5 Selecting and Applying Built-in Tools (Read, Write, Edit, Bash, Grep, Glob)

Key knowledge:

Key skills:


Domain 3: Claude Code Configuration and Workflows (20%)

3.1 Configuring CLAUDE.md with Hierarchy, Scope, and Modular Organization

Key knowledge:

Key skills:

3.2 Creating and Configuring Custom Slash Commands and Skills

Key knowledge:

Key skills:

3.3 Using Path-specific Rules for Conditional Convention Loading

Key knowledge:

Key skills:

3.4 Deciding When to Use Planning Mode vs Direct Execution

Key knowledge:

Key skills:

3.5 Iterative Refinement for Progressive Improvement

Key knowledge:

Key skills:

3.6 Integrating Claude Code into CI/CD Pipelines

Key knowledge:

Key skills:


Domain 4: Prompt Engineering and Structured Output (20%)

4.1 Designing Prompts with Explicit Criteria to Improve Accuracy

Key knowledge:

Key skills:

4.2 Using Few-shot Prompting to Improve Output Consistency

Key knowledge:

Key skills:

4.3 Enforcing Structured Output with tool_use and JSON Schemas

Key knowledge:

Key skills:

4.4 Implementing Validation, Retries, and Feedback Loops for Extraction Quality

Key knowledge:

Key skills:

4.5 Designing Efficient Batch Processing Strategies

Key knowledge:

Key skills:

4.6 Designing Multi-instance and Multi-pass Review Architectures

Key knowledge:

Key skills:


Domain 5: Context Management and Reliability (15%)

5.1 Managing Conversation Context to Preserve Critical Information

Key knowledge:

Key skills:

5.2 Designing Effective Escalation Patterns and Resolving Ambiguity

Key knowledge:

Key skills:

5.3 Implementing Error Propagation Strategies in Multi-agent Systems

Key knowledge:

Key skills:

5.4 Managing Context Efficiently When Investigating Large Codebases

Key knowledge:

Key skills:

5.5 Designing Workflows with Human Oversight and Confidence Calibration

Key knowledge:

Key skills:

5.6 Preserving Provenance and Handling Uncertainty in Multi-source Synthesis

Key knowledge:

Key skills:


Examples of Exam Questions with Explanations

Question 1 (Scenario: Customer Support Agent)

Situation: Data shows that in 12% of cases the agent skips get_customer and calls lookup_order using only the customer’s name, which leads to incorrect refunds.

Which change is most effective?

Why A: When critical business logic requires a specific tool sequence, enforcing it in application code provides deterministic guarantees that prompt-based approaches (B, C) cannot. D addresses availability, not tool ordering.


Question 2 (Scenario: Customer Support Agent)

Situation: The agent often calls get_customer instead of lookup_order for order-related questions. Tool descriptions are minimal and similar.

What is the first step?

Why B: Tool descriptions are the model’s primary selection mechanism. This is the lowest-effort, highest-impact fix. A adds tokens without addressing the root cause. C is overengineering. D requires more effort than justified.


Question 3 (Scenario: Customer Support Agent)

Situation: The agent resolves only 55% of issues with a target of 80%. It escalates simple cases and tries to handle complex policy exceptions autonomously.

How do you improve calibration?

Why A: It directly addresses the root cause—unclear decision boundaries. B is unreliable (the model can be confidently wrong). C is overengineering. D solves a different problem (mood != complexity).


Question 4 (Scenario: Code Generation with Claude Code)

Situation: You need a custom /review command for standard code review that is available to the whole team when they clone the repository.

Where should you create the command file?

Why A: Project commands stored in .claude/commands/ are version-controlled and automatically available to everyone. B is for personal commands. C is for instructions, not command definitions. D does not exist.


Question 5 (Scenario: Code Generation with Claude Code)

Situation: You need to restructure a monolith into microservices (dozens of files, service-boundary decisions).

What approach should you use?

Why A: Planning mode is designed for large changes, multiple possible approaches, and architectural decisions. B risks expensive rework. C assumes you already know the structure. D is reactive.


Question 6 (Scenario: Code Generation with Claude Code)

Situation: A codebase has different conventions across areas (React, API, database). Tests are co-located with code. You want conventions to be applied automatically.

What approach should you use?

Why A: .claude/rules/ with glob patterns (e.g., **/*.test.tsx) enables automatic convention application based on file paths—ideal for tests spread across the codebase. B relies on model inference. C is manual/on-demand. D does not work well when relevant files are in many directories.


Question 7 (Scenario: Multi-agent Research System)

Situation: The system researches “AI impact on creative industries,” but reports cover only visual art. The coordinator decomposed the topic into: “AI in digital art,” “AI in graphic design,” “AI in photography.”

What’s the cause?

Why B: The logs show the coordinator decomposed “creative industries” only into visual subtopics, completely missing music, literature, and film. Subagents executed correctly—the issue is what they were assigned.


Question 8 (Scenario: Multi-agent Research System)

Situation: A web-search subagent times out while researching a complex topic. You need to design how error information is passed back to the coordinator.

Which error propagation approach best enables intelligent recovery?

Why A: Structured error context gives the coordinator what it needs to decide whether to retry with a modified query, try an alternative approach, or continue with partial results. B hides context behind a generic status. C masks failure as success. D aborts the entire workflow unnecessarily.


Question 9 (Scenario: Multi-agent Research System)

Situation: The synthesis agent often needs to verify specific claims while merging results. Currently, when verification is needed, the synthesis agent hands control back to the coordinator, which calls the web-search agent and then re-runs synthesis with the new results. This adds 2–3 extra round trips per task and increases latency by 40%. Your assessment shows that 85% of these checks are simple fact checks (dates, names, statistics), while 15% require deeper investigation.

How do you reduce overhead while maintaining reliability?

Why A: This applies the principle of least privilege: the synthesis agent gets exactly what it needs for the 85% common case (simple fact checks) while preserving the coordinator-mediated path for complex investigations. B introduces blocking dependencies (later synthesis steps may depend on earlier verified facts). C breaks separation of responsibilities. D relies on speculative caching that cannot reliably predict needs.


Question 10 (Scenario: Claude Code for CI)

Situation: A pipeline runs claude "Analyze this pull request for security issues", but hangs waiting for interactive input.

What is the correct approach?

Why A: -p (or --print) is the documented way to run Claude Code in non-interactive mode. It processes the prompt, prints to stdout, and exits. The other options are either non-existent features or Unix workarounds.
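In a CI script, the non-interactive invocation is as simple as the following (the prompt and output redirection are illustrative):

    # Runs once, prints the analysis to stdout, and exits instead of opening an interactive session
    claude -p "Analyze this pull request for security issues" > security-review.txt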


Question 11 (Scenario: Claude Code for CI)

Situation: The team wants to reduce API cost for automated analysis. Claude currently serves two workflows in real time: (1) a blocking pre-merge check that must complete before developers can merge a PR, and (2) a tech-debt report generated overnight for morning review. A manager proposes moving both to the Message Batches API to save 50%.

How should you evaluate this proposal?

Why A: The Message Batches API saves 50%, but processing time can be up to 24 hours with no guaranteed latency SLA. That makes it unsuitable for blocking pre-merge checks where developers are waiting, but ideal for overnight batch workloads like tech-debt reports.


Question 12 (Scenario: Multi-file Code Review)

Situation: A pull request changes 14 files in an inventory tracking module. A single-pass review of all files produces inconsistent results: detailed comments for some files but superficial ones for others, missed obvious bugs, and contradictory feedback (a pattern is flagged as problematic in one file but approved in identical code in another file).

How should you restructure the review?

Why A: Focused passes directly address the root cause—attention dilution when processing many files at once. Per-file analysis ensures consistent depth, and a separate integration pass catches cross-file issues. B shifts burden to developers without improving the system. C is a misconception: larger context does not fix attention quality. D suppresses real bugs by requiring consensus across inconsistent detections.


Practice Test

60 questions across 4 scenarios. Format and difficulty match the real exam.

Alternatively, you can practice these questions in an exam-like HTML file: Practical Test (EN)

Scenario: Multi-agent Research System


Question 1 (Scenario: Multi-agent Research System)

Situation: A document analysis agent discovers that two credible sources contain directly contradictory statistics for a key metric: a government report states 40% growth, while an industry analysis states 12%. Both sources look credible, and the discrepancy could materially affect the research conclusions. How should the document analysis agent handle this situation most effectively?

Which approach is most effective?

Why D: This approach preserves separation of responsibilities: the analysis agent completes its core work without blocking, records both conflicting values with clear attribution, and correctly defers reconciliation to the coordinator, which has broader context.


Question 2 (Scenario: Multi-agent Research System)

Situation: The web-search and document-analysis agents have completed their tasks and returned results to the coordinator. What is the next step for creating an integrated research report?

Which next step is most appropriate?

Why C: In a coordinator–subagent architecture, the coordinator forwards both result sets to the synthesis agent for centralized integration, preserving control and ensuring high-quality merging.


Question 3 (Scenario: Multi-agent Research System)

Situation: A document analysis subagent frequently fails when processing PDF files: some have corrupted sections that trigger parsing exceptions, others are password-protected, and sometimes the parsing library hangs on large files. Currently, any exception immediately terminates the subagent and returns an error to the coordinator, which must decide whether to retry, skip, or fail the whole task. This causes excessive coordinator involvement in routine error handling. What architectural improvement is most effective?

Which improvement is most effective?

Why D: Handle errors at the lowest level capable of resolving them. Local recovery reduces coordinator workload while still escalating truly unrecoverable issues with full context and partial progress.


Question 4 (Scenario: Multi-agent Research System)

Situation: After running the system on “AI impact on creative industries,” you observe that every subagent completes successfully: the web-search agent finds relevant articles, the document analysis agent summarizes them correctly, and the synthesis agent produces coherent text. However, final reports cover only visual art and completely miss music, literature, and film. In the coordinator logs, you see it decomposed the topic into three subtasks: “AI in digital art,” “AI in graphic design,” and “AI in photography.” What is the most likely root cause?

What is the most likely root cause?

Why C: The coordinator decomposed a broad topic only into visual-art subtasks, missing music, literature, and film entirely. Since subagents executed their assignments correctly, the narrow decomposition is the obvious root cause.


Question 5 (Scenario: Multi-agent Research System)

Situation: The web-search subagent returns results for only 3 of 5 requested source categories (competitor sites and industry reports succeed, but news archives and social feeds time out). The document analysis subagent successfully processes all provided documents. The synthesis subagent must produce a summary from mixed-quality upstream inputs. Which error-propagation strategy is most effective?

Which error-propagation strategy is most effective?

Why D: Coverage annotations implement graceful degradation with transparency, preserving value from completed work while propagating uncertainty to enable informed decisions about confidence.


Question 6 (Scenario: Multi-agent Research System)

Situation: The document analysis subagent encounters a corrupted PDF file that it cannot parse. When designing the system’s error handling, what is the most effective way to handle this failure?

Which approach is most effective?

Why A: Returning an error with context to the coordinator is the most effective approach because it lets the coordinator make an informed decision—skip the file, try an alternative parsing method, or notify the user—while maintaining visibility into the failure.


Question 7 (Scenario: Multi-agent Research System)

Situation: Production logs show a persistent pattern: requests like “analyze the uploaded quarterly report” are routed to the web-search agent 45% of the time instead of the document analysis agent. Reviewing tool definitions, you find that the web-search agent has a tool analyze_content described as “analyzes content and extracts key information,” while the document analysis agent has a tool analyze_document described as “analyzes documents and extracts key information.” How should you fix the misrouting problem?

How should you fix the misrouting problem?

Why B: Renaming the web-search tool to extract_web_results and updating its description to explicitly reference web search and URLs directly removes the root cause by eliminating semantic overlap between the two tool names and descriptions. This makes each tool’s purpose unambiguous, enabling the coordinator to reliably distinguish document analysis from web search.


Question 8 (Scenario: Multi-agent Research System)

Situation: A colleague proposes that the document analysis agent should send its results directly to the synthesis agent, bypassing the coordinator. What is the main advantage of keeping the coordinator as the central hub for all communication between subagents?

What is the main advantage of keeping the coordinator as the central hub?

Why A: The coordinator pattern provides centralized visibility into all interactions, uniform error handling across the system, and fine-grained control over what information each subagent receives—these are the primary advantages of a star-shaped communication topology.


Question 9 (Scenario: Multi-agent Research System)

Situation: The web-search subagent times out while researching a complex topic. You need to design how information about this failure is returned to the coordinator. Which error-propagation approach best enables intelligent recovery?

Which error-propagation approach best enables intelligent recovery?

Why A: Returning structured error context—including failure type, executed query, partial results, and alternative approaches—gives the coordinator everything needed to make intelligent recovery decisions (e.g., retry with a modified query or continue with partial results). It preserves maximum context for informed coordination-level decision-making.


Question 10 (Scenario: Multi-agent Research System)

Situation: In your system design, you gave the document analysis agent access to a general-purpose tool fetch_url so it could download documents by URL. Production logs show this agent now frequently downloads search engine results pages to perform ad hoc web search—behavior that should be routed through the web-search agent—causing inconsistent results. Which fix is most effective?

Which fix is most effective?

Why A: Replacing a general-purpose tool with a document-specific tool that validates URLs against document formats fixes the root cause by constraining capability at the interface level. This follows the principle of least privilege, making undesired search behavior impossible rather than merely discouraged.


Question 11 (Scenario: Multi-agent Research System)

Situation: While researching a broad topic, you observe that the web-search agent and the document analysis agent investigate the same subtopics, leading to substantial duplication in their outputs. Token usage nearly doubles without a proportional increase in research breadth or depth. What is the most effective way to address this?

What is the most effective way to address this?

Why B: Having the coordinator explicitly partition the research space before delegating is most effective because it addresses the root cause—unclear task boundaries—before any work begins. It preserves parallelism while preventing duplicated effort and wasted tokens.


Question 12 (Scenario: Multi-agent Research System)

Situation: During research, the web-search subagent queries three source categories with different outcomes: academic databases return 15 relevant papers, industry reports return “0 results,” and patent databases return “Connection timeout.” When designing error propagation to the coordinator, which approach enables the best recovery decisions?

Which approach enables the best recovery decisions?

Why D: A timeout (access failure) and “0 results” (valid empty result) are semantically different outcomes requiring different responses. Distinguishing them allows the coordinator to retry the patent database while accepting the industry reports “0 results” as a valid, informative finding.


Question 13 (Scenario: Multi-agent Research System)

Situation: Production monitoring shows inconsistent synthesis quality. When aggregated results are ~75K tokens, the synthesis agent reliably cites information from the first 15K tokens (web-search headlines/snippets) and the last 10K tokens (document analysis conclusions), but often misses critical findings in the middle 50K tokens—even when they directly answer the research question. How should you restructure the aggregated input?

How should you restructure the aggregated input?

Why C: Putting a key-findings summary at the start leverages primacy effects so critical information sits in the most reliably processed position. Adding explicit section headings throughout helps the model navigate and attend to mid-input content, directly mitigating the “lost in the middle” phenomenon.


Question 14 (Scenario: Multi-agent Research System)

Situation: In testing, the combined output of the web-search agent (85K tokens including page content) and the document analysis agent (70K tokens including chains of thought) totals 155K tokens, but the synthesis agent performs best with inputs under 50K tokens. Which solution is most effective?

Which solution is most effective?

Why A: Modifying upstream agents to return structured data fixes the root cause by reducing token volume at the source while preserving essential information. It avoids passing bulky page content and reasoning traces that inflate tokens without improving the synthesis step.


Question 15 (Scenario: Multi-agent Research System)

Situation: In testing, you observe that the synthesis agent often needs to verify specific claims while merging results. Currently, when verification is needed, the synthesis agent returns control to the coordinator, which calls the web-search agent and then re-invokes synthesis with the results. This adds 2–3 extra loops per task and increases latency by 40%. Your assessment shows 85% of these verifications are simple fact checks (dates, names, stats) and 15% require deeper research. Which approach most effectively reduces overhead while preserving system reliability?

Which approach is most effective?

Why D: A limited-scope fact-verification tool lets the synthesis agent handle 85% of simple checks directly, eliminating most loops, while preserving the coordinator delegation path for the 15% of complex verifications. This applies least privilege while significantly reducing latency.


Scenario: Claude Code for Continuous Integration


Question 16 (Scenario: Claude Code for Continuous Integration)

Situation: Your CI pipeline runs the Claude Code CLI (in --print mode) using CLAUDE.md to provide project context for code review, and developers generally find the reviews substantive. However, they report that integrating findings into the workflow is difficult—Claude outputs narrative paragraphs that must be manually copied into PR comments. The team wants to automatically post each finding as a separate inline PR comment at the relevant place in code, which requires structured data with file path, line number, severity level, and suggested fix. Which approach is most effective?

Which approach is most effective?

Why B: Using --output-format json with --json-schema enforces structured output at the CLI level, guaranteeing well-formed JSON with the required fields (file path, line number, severity, suggested fix) that can be reliably parsed and posted as inline PR comments via the GitHub API. It leverages built-in CLI capabilities designed specifically for structured output.
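A sketch of the pipeline step this describes, assuming a schema file checked into the repo; whether --json-schema accepts a file path or inline JSON should be confirmed against your CLI version:

    # Hypothetical schema file listing file, line, severity, and suggested_fix fields
    claude -p "Review the changes in this PR and report findings" \
      --output-format json \
      --json-schema .claude/review-findings.schema.json > findings.json
    # findings.json can then be parsed and posted as inline PR comments via the GitHub API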


Question 17 (Scenario: Claude Code for Continuous Integration)

Situation: Your team uses Claude Code for generating code suggestions, but you notice a pattern: non-obvious issues—performance optimizations that break edge cases, cleanups that unexpectedly change behavior—are only caught when another team member reviews the PR. Claude’s reasoning during generation shows it considered these cases but concluded its approach was correct. Which approach directly addresses the root cause of this self-check limitation?

Which approach directly addresses the root cause?

Why A: A second independent Claude Code instance without access to the generator’s reasoning directly addresses the root cause by avoiding confirmation bias. This “fresh eyes” perspective mirrors human peer review, where another reviewer catches issues the author rationalized.


Question 18 (Scenario: Claude Code for Continuous Integration)

Situation: Your code review component is iterative: Claude analyzes the changed file, then may request related files (imports, base classes, tests) via tool calls to understand context before providing final feedback. Your application defines a tool that lets Claude request file contents; Claude calls the tool, gets results, and continues analysis. You’re evaluating batch processing to reduce API cost. What is the primary technical limitation when considering batch processing for this workflow?

What is the primary technical limitation?

Why B: A “fire-and-forget” asynchronous Batch API model has no mechanism to intercept a tool call during a request, execute the tool, and return results for Claude to continue analysis. This is fundamentally incompatible with iterative tool-calling workflows that require multiple tool request/response rounds within a single logical interaction.


Question 19 (Scenario: Claude Code for Continuous Integration)

Situation: Your CI/CD system runs three Claude-based analyses: (1) fast style checks on every PR that block merging until completion, (2) comprehensive weekly security audits of the entire codebase, and (3) nightly test-case generation for recently changed modules. The Message Batches API offers 50% savings but processing can take up to 24 hours. You want to optimize API cost while maintaining an acceptable developer experience. Which combination correctly matches each task to an API approach?

Which combination is correct?

Why B: PR style checks block developers and require immediate responses via synchronous calls, while weekly security audits and nightly test generation are scheduled tasks with flexible deadlines that can tolerate up to a 24-hour batch window—capturing 50% savings for both.


Question 20 (Scenario: Claude Code for Continuous Integration)

Situation: Your automated reviews find real issues, but developers report the feedback is not actionable. Findings include phrases like “complex ticket routing logic” or “potential null pointer” without specifying what exactly to change. When you add detailed instructions like “always include concrete fix suggestions,” the model still produces inconsistent output—sometimes detailed, sometimes vague. Which prompting technique most reliably produces consistently actionable feedback?

Which prompting technique is most reliable?

Why D: Few-shot examples are the most effective technique for achieving consistent output format when instructions alone produce variable results. Providing 3–4 examples that show the exact desired structure (issue, location, concrete fix) gives the model a concrete pattern to follow, which is more reliable than abstract instructions.


Question 21 (Scenario: Claude Code for Continuous Integration)

Situation: Your CI pipeline includes two Claude-based code review modes: a pre-merge check that blocks merging the PR until it completes, and a “deep analysis” that runs overnight, polls for batch completion, and posts detailed suggestions to the PR. You want to reduce API cost using the Message Batches API, which offers 50% savings but requires polling and can take up to 24 hours. Which mode should use batch processing?

Which mode should use batch processing?

Why B: Deep analysis is an ideal candidate for batch processing because it already runs overnight, tolerates delay, and uses a polling model before publishing results—matching the asynchronous, polling-based architecture of the Message Batches API while capturing 50% savings.


Question 22 (Scenario: Claude Code for Continuous Integration)

Situation: Your automated review analyzes comments and docstrings. The current prompt instructs Claude to “check that comments are accurate and up to date.” Findings often flag acceptable patterns (TODO markers, simple descriptions) while missing comments describing behavior the code no longer implements. What change addresses the root cause of this inconsistent analysis?

What change addresses the root cause?

Why D: Explicit criteria—flagging comments only when claimed behavior contradicts actual code behavior—directly addresses the root cause by replacing a vague instruction with a precise definition of what constitutes a problem. This reduces false positives on acceptable patterns and misses of truly misleading comments.


Question 23 (Scenario: Claude Code for Continuous Integration)

Situation: Your automated code review system shows inconsistent severity ratings—similar issues like null pointer risks are rated “critical” in some PRs but only “medium” in others. Developer surveys show growing distrust—many start dismissing findings without reading because “half are wrong.” High-false-positive categories erode trust in accurate categories. Which approach best restores developer trust while improving the system?

Which approach best restores developer trust?

Why A: Temporarily disabling high-false-positive categories immediately stops trust erosion by removing noisy findings that cause developers to dismiss everything, while preserving value from high-precision categories like security and correctness. It also creates space to improve prompts for problematic categories before re-enabling them.


Question 24 (Scenario: Claude Code for Continuous Integration)

Situation: Your automated review generates test-case suggestions for each PR. Reviewing a PR that adds course completion tracking, Claude suggests 10 test cases, but developer feedback shows that 6 duplicate scenarios already covered by the existing test suite. What change most effectively reduces duplicate suggestions?

What change is most effective?

Why A: Including the existing test file fixes the root cause of duplication: Claude can only avoid suggesting already-covered scenarios if it knows what tests already exist. This gives Claude the information needed to propose genuinely new, valuable tests.


Question 25 (Scenario: Claude Code for Continuous Integration)

Situation: After an initial automated review identifies 12 findings, a developer pushes new commits to address issues. Re-running review produces 8 findings, but developers report that 5 duplicate previous comments on code that was already fixed in the new commits. What is the most effective way to eliminate this redundant feedback while maintaining thoroughness?

What is the most effective way to eliminate redundant feedback?

Why D: Including prior review findings in context lets Claude distinguish new problems from those already addressed in recent commits. This preserves review thoroughness while using Claude’s reasoning to avoid redundant feedback on fixed code.


Question 26 (Scenario: Claude Code for Continuous Integration)

Situation: Your pipeline script runs claude "Analyze this pull request for security issues", but the job hangs indefinitely. Logs show Claude Code is waiting for interactive input. What is the correct approach to run Claude Code in an automated pipeline?

What is the correct approach?

Why B: The -p (or --print) flag is the documented way to run Claude Code non-interactively. It processes the prompt, prints the result to stdout, and exits without waiting for user input—ideal for CI/CD pipelines.


Question 27 (Scenario: Claude Code for Continuous Integration)

Situation: A pull request changes 14 files in an inventory tracking module. A single-pass review that analyzes all files together produces inconsistent results: detailed feedback on some files but shallow comments on others, missed obvious bugs, and contradictory feedback (a pattern is flagged in one file but identical code is approved in another file in the same PR). How should you restructure the review?

How should you restructure the review?

Why B: Focused per-file passes address the root cause—attention dilution—by ensuring consistent depth and reliable local issue detection. A separate integration-oriented pass then covers cross-file concerns such as dependency and data-flow interactions.


Question 28 (Scenario: Claude Code for Continuous Integration)

Situation: Your automated code review averages 15 findings per pull request, and developers report a 40% false-positive rate. The bottleneck is investigation time: developers must click into each finding to read Claude’s rationale before deciding whether to fix or dismiss it. Your CLAUDE.md already contains comprehensive rules for acceptable patterns, and stakeholders rejected any approach that filters findings before developers see them. What change best addresses investigation time?

What change best addresses investigation time?

Why A: Including rationale and confidence directly in each finding reduces investigation time by letting developers quickly triage without opening each finding. It satisfies the “no filtering” constraint because all findings remain visible while accelerating developer decision-making.


Question 29 (Scenario: Claude Code for Continuous Integration)

Situation: Analysis of your automated code review shows large differences in false-positive rates by finding category: security/correctness findings have 8% false positives, performance findings 18%, style/naming findings 52%, and documentation findings 48%. Developer surveys show growing distrust—many start dismissing findings without reading because “half are wrong.” High-false-positive categories erode trust in accurate categories. Which approach best restores developer trust while improving the system?

Which approach best restores developer trust?

Why A: Temporarily disabling high-false-positive categories immediately stops trust erosion by removing noisy findings that cause developers to dismiss everything, while preserving value from high-precision categories like security and correctness. It also creates space to improve prompts for problematic categories before re-enabling them.


Question 30 (Scenario: Claude Code for Continuous Integration)

Situation: Your team wants to reduce API costs for automated analysis. Currently, synchronous Claude calls support two workflows: (1) a blocking pre-merge check that must complete before developers can merge, and (2) a technical debt report generated overnight for review the next morning. Your manager proposes moving both to the Message Batches API to save 50%. How should you evaluate this proposal?

How should you evaluate this proposal?

Why C: Message Batches API processing can take up to 24 hours with no latency SLA, which is acceptable for overnight technical debt reports but unacceptable for blocking pre-merge checks where developers wait. This matches each workflow to the right API based on latency requirements.


Scenario: Code Generation with Claude Code


Question 31 (Scenario: Code Generation with Claude Code)

Situation: You asked Claude Code to implement a function that transforms API responses into an internal normalized format. After two iterations, the output structure still doesn’t match expectations—some fields are nested differently and timestamps are formatted incorrectly. You described requirements in prose, but Claude interprets them differently each time.

Which approach is most effective for the next iteration?

Why B: Concrete input-output examples remove ambiguity inherent in prose descriptions by showing Claude the exact expected transformation results. This directly addresses the root cause—misinterpretation of textual requirements—by providing unambiguous patterns for field nesting and timestamp formatting.


Question 32 (Scenario: Code Generation with Claude Code)

Situation: You need to add Slack as a new notification channel. The existing codebase has clear, established patterns for email, SMS, and push channels. However, Slack’s API offers fundamentally different integration approaches—incoming webhooks (simple, one-way), bot tokens (support delivery confirmation and programmatic control), or Slack Apps (two-way events, requires workspace approval). Your task says “add Slack support” without specifying integration method or requiring advanced features like delivery tracking.

How should you approach this task?

Why B: Slack integration has multiple valid approaches with significantly different architectural implications, and requirements are ambiguous. Planning mode lets you evaluate trade-offs among webhooks, bot tokens, and Slack Apps and align on an approach before implementation.


Question 33 (Scenario: Code Generation with Claude Code)

Situation: Your CLAUDE.md file has grown to 400+ lines containing coding standards, testing conventions, a detailed PR review checklist, deployment instructions, and database migration procedures. You want Claude to always follow coding standards and testing conventions, but apply PR review, deploy, and migration guidance only when doing those tasks.

Which restructuring approach is most effective?

Why D: CLAUDE.md content loads in every session, ensuring coding standards and testing conventions always apply, while Skills are invoked on demand when Claude detects trigger keywords—ideal for workflow-specific guidance like PR review, deployment, and migrations.


Question 34 (Scenario: Code Generation with Claude Code)

Situation: You’re tasked with restructuring your team’s monolithic application into microservices. This impacts changes across dozens of files and requires decisions about service boundaries and module dependencies.

Which approach should you choose?

Why A: Planning mode is the right strategy for complex architectural restructuring like splitting a monolith: it allows safe exploration and informed decisions about boundaries before committing to potentially expensive changes across many files.


Question 35 (Scenario: Code Generation with Claude Code)

Situation: Your team created a /analyze-codebase skill that performs deep code analysis—dependency scanning, test coverage counts, and code quality metrics. After running the command, team members report Claude becomes less responsive in the session and loses the context of the original task.

How do you most effectively fix this while keeping full analysis capabilities?

Why A: context: fork runs the analysis in an isolated subagent context so the large output does not pollute the main session’s context window and Claude does not lose track of the original task. It preserves full analysis capability while keeping the main session responsive.


Question 36 (Scenario: Code Generation with Claude Code)

Situation: Your team uses a /commit skill in .claude/skills/commit/SKILL.md. A developer wants to customize it for their personal workflow (different commit message format, extra checks) without affecting teammates.

What do you recommend?

Why C: Personal skills take precedence over project skills with the same name. A personal skill at ~/.claude/skills/commit/SKILL.md will override the team’s project skill, allowing the developer to customize their workflow while maintaining the familiar /commit command name for their personal use. This approach is better than option A because it preserves the original command name, improving the developer’s workflow without affecting teammates.


Question 37 (Scenario: Code Generation with Claude Code)

Situation: Your team has used Claude Code for months. Recently, three developers report Claude follows the guidance “always include comprehensive error handling,” but a fourth developer who just joined says Claude does not follow it. All four work in the same repo and have up-to-date code.

What is the most likely cause and fix?

Why A: If the guidance was added only to the original developers’ user-level configs and not to the project-level .claude/CLAUDE.md, new team members won’t receive it. Moving it to the project-level configuration ensures all current and future team members automatically get the guidance.


Question 38 (Scenario: Code Generation with Claude Code)

Situation: You find that including 2–3 full endpoint implementation examples as context significantly improves consistency when generating new API endpoints. However, this context is useful only when creating new endpoints—not when debugging, reviewing code, or other work in the API directory.

Which configuration approach is most effective?

Why D: A skill invoked on demand loads the example context only when generating new endpoints, not during unrelated tasks like debugging or review. This keeps the main context clean while preserving high-quality generation when needed.


Question 39 (Scenario: Code Generation with Claude Code)

Situation: Your team created a /migration skill that generates database migration files. It takes the migration name via $ARGUMENTS. In production you observe three issues: (1) developers often run the skill without arguments, causing poorly named files, (2) the skill sometimes uses database schema details from unrelated prior conversations, and (3) a developer accidentally ran destructive test cleanup when the skill had broad tool access.

Which configuration approach fixes all three problems?

Why B: This uses three separate configuration features to address each problem: argument-hint improves argument entry and reduces missing arguments, context: fork prevents context leakage from prior conversations, and allowed-tools constrains the skill to safe file-writing operations, preventing destructive actions.
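A minimal SKILL.md frontmatter sketch combining the three settings named above (exact key spellings and the tool list are assumptions to verify against the Claude Code documentation); the file would live at .claude/skills/migration/SKILL.md:

    ---
    name: migration
    description: Generate a database migration file from a short description
    argument-hint: <migration-name>
    context: fork
    allowed-tools: Read, Glob, Write
    ---
    Create a migration named $ARGUMENTS following the project's existing migration conventions.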


Question 40 (Scenario: Code Generation with Claude Code)

Situation: Your codebase contains areas with different coding conventions: React components use functional style with hooks, API handlers use async/await with specific error handling, and database models follow the repository pattern. Test files are distributed across the codebase next to the code under test (e.g., Button.test.tsx next to Button.tsx), and you want all tests to follow the same conventions regardless of location.

What is the most supported way to ensure Claude automatically applies the correct conventions when generating code?

Why D: .claude/rules/ files with YAML frontmatter and glob patterns (e.g., **/*.test.tsx, src/api/**/*.ts) enable deterministic, path-based convention application regardless of directory structure. This is the most supported approach for cross-cutting patterns like distributed test files.
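As a sketch, a file such as .claude/rules/testing.md (the filename and rule text are hypothetical; the frontmatter shape follows the explanation above) would apply only when matching files are involved:

    ---
    paths: ["**/*.test.tsx", "**/*.test.ts"]
    ---
    Use React Testing Library, prefer user-facing queries over test IDs,
    and keep test data builders next to the test file.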


Question 41 (Scenario: Code Generation with Claude Code)

Situation: You want to create a custom slash command /review that runs your team’s standard code review checklist. It should be available to every developer when they clone or update the repository.

Where should you create the command file?

Why B: Putting custom slash commands under .claude/commands/ inside the project repository ensures they are version-controlled and automatically available to every developer who clones or updates the repo. This is the intended location for project-level custom commands in Claude Code.
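For instance, a file at .claude/commands/review.md (checklist contents are hypothetical) makes /review available to everyone who clones the repository:

    Review the current changes against our team checklist:
    1. Correctness: edge cases, error handling, input validation
    2. Security: injection risks, secrets in code, authorization checks
    3. Tests: new behavior covered, existing tests still meaningful
    Group findings by severity and suggest a concrete fix for each.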


Question 42 (Scenario: Code Generation with Claude Code)

Situation: Your team’s CLAUDE.md grew beyond 500 lines mixing TypeScript conventions, testing guidance, API patterns, and deployment procedures. Developers find it hard to locate and update the right sections.

What approach does Claude Code support to organize project-level instructions into focused topical modules?

Why B: Claude Code supports a .claude/rules/ directory where you can create separate Markdown files for topical guidance (e.g., testing.md, api-conventions.md), allowing teams to organize large instruction sets into focused, maintainable modules.


Question 43 (Scenario: Code Generation with Claude Code)

Situation: You create a custom skill /explore-alternatives that your team uses to brainstorm and evaluate implementation approaches before choosing one. Developers report that after running the skill, subsequent Claude responses are influenced by the alternatives discussion—sometimes referencing rejected approaches or retaining exploration context that interferes with actual implementation.

How should you most effectively configure this skill?

Why B: context: fork runs the skill in an isolated subagent context so exploration discussions do not pollute the main conversation history. This prevents rejected approaches and brainstorming context from influencing subsequent implementation work.


Question 44 (Scenario: Code Generation with Claude Code)

Situation: Your team wants to add a GitHub MCP server for searching PRs and checking CI status via Claude Code. Each of six developers has their own personal GitHub access token. You want consistent tooling across the team without committing credentials to version control.

Which configuration approach is most effective?

Why C: A project .mcp.json with environment variable substitution is idiomatic: it provides a single version-controlled source of truth for MCP configuration while letting each developer supply credentials via environment variables. Documenting the variable makes onboarding easy without committing secrets.
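A sketch of such a project-level .mcp.json; the server package, command, and variable name are illustrative, and the ${VAR} expansion syntax should be checked against the Claude Code documentation:

    {
      "mcpServers": {
        "github": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-github"],
          "env": {
            "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_PERSONAL_ACCESS_TOKEN}"
          }
        }
      }
    }

Each developer then exports GITHUB_PERSONAL_ACCESS_TOKEN locally; the file itself contains no secrets and can be committed.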


Question 45 (Scenario: Code Generation with Claude Code)

Situation: You’re adding error-handling wrappers around external API calls across a 120-file codebase. The work has three phases: (1) discover all call sites and patterns, (2) collaboratively design the error-handling approach, and (3) implement wrappers consistently. In Phase 1, Claude generates large output listing hundreds of call sites with context, quickly filling the context window before discovery finishes.

Which approach is most effective to complete the task while maintaining implementation consistency?

Why A: An Explore subagent isolates the verbose discovery output in a separate context and returns only a concise summary to the main conversation. This preserves the main context window for the collaborative design and consistent implementation phases where retained context is most valuable.


Scenario: Customer Support Agent


Question 46 (Scenario: Customer Support Agent)

Situation: While testing, you notice the agent often calls get_customer when users ask about order status, even though lookup_order would be more appropriate. What should you check first to address this problem?

What should you check first?

Why D: Tool descriptions are the primary input the model uses to decide which tool to call. When an agent consistently picks the wrong tool, the first diagnostic step is to verify that tool descriptions clearly separate each tool’s purpose and usage boundaries.


Question 47 (Scenario: Customer Support Agent)

Situation: Your agent handles single-issue requests with 94% accuracy (e.g., “I need a refund for order #1234”). But when customers include multiple issues in one message (e.g., “I need a refund for order #1234 and also want to update the shipping address for order #5678”), tool selection accuracy drops to 58%. The agent usually solves only one issue or mixes parameters across requests. What approach most effectively improves reliability for multi-issue requests?

What approach is most effective?

Why C: Few-shot examples that demonstrate correct reasoning and tool sequencing for multi-issue requests are most effective because the agent already performs well on single issues—what it needs is guidance on the pattern for decomposing and routing multiple issues and keeping parameters separated.


Question 48 (Scenario: Customer Support Agent)

Situation: Production logs show that for simple requests like “refund for order #1234,” your agent resolves the issue in 3–4 tool calls with 91% success. But for complex requests like “I was billed twice, my discount didn’t apply, and I want to cancel,” the agent averages 12+ tool calls with only 54% success—often investigating issues sequentially and fetching redundant customer data for each. What change most effectively improves handling of complex requests?

What change is most effective?

Why C: Decomposing into separate issues and investigating in parallel with shared customer context fixes both key problems: it eliminates redundant data retrieval by reusing shared context across issues and reduces total tool-call loops by parallelizing investigation before synthesizing a single resolution.


Question 49 (Scenario: Customer Support Agent)

Situation: Your agent achieves 55% first-contact resolution, well below the 80% target. Logs show it escalates simple cases (standard replacements for damaged goods with photo proof) while trying to handle complex situations requiring policy exceptions autonomously. What is the most effective way to improve escalation calibration?

What is the most effective way to improve escalation calibration?

Why C: Explicit escalation criteria with few-shot examples directly address the root cause—unclear decision boundaries between simple and complex cases. It’s the most proportional, effective first intervention that teaches the agent when to escalate and when to resolve autonomously without extra infrastructure.


Question 50 (Scenario: Customer Support Agent)

Situation: After calling get_customer and lookup_order, the agent has all available system data but still faces uncertainty. Which situation is the most justified trigger for calling escalate_to_human?

Which situation is most justified for escalation?

Why C: This is a genuine policy gap: company rules cover price drops on your own site but do not address competitor price matching. The agent must not invent policy and should escalate for human judgment on how to interpret or extend existing rules.


Question 51 (Scenario: Customer Support Agent)

Situation: Production logs show that in 12% of cases your agent skips get_customer and calls lookup_order directly using only the customer-provided name, sometimes leading to misidentified accounts and incorrect refunds. What change most effectively fixes this reliability problem?

What change is most effective?

Why C: A programmatic precondition provides a deterministic guarantee that required sequencing is followed. It’s the most effective approach because it eliminates the possibility of skipping verification, regardless of LLM behavior.
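A minimal Python sketch of such a precondition in the application's tool-dispatch layer (the dispatcher and the underlying get_customer / lookup_order helpers are hypothetical):

    # Track customers verified during this session; refuse order lookups until verification happens.
    verified_customer_ids: set[str] = set()

    def dispatch_tool(name: str, args: dict) -> dict:
        if name == "get_customer":
            customer = get_customer(**args)            # assumed helper hitting the CRM
            verified_customer_ids.add(customer["id"])
            return customer
        if name == "lookup_order":
            if args.get("customer_id") not in verified_customer_ids:
                # Returned as the tool result: the model is told to verify identity first.
                return {
                    "isError": True,
                    "description": "Call get_customer to verify the customer before lookup_order.",
                }
            return lookup_order(**args)                # assumed helper hitting the order service
        raise ValueError(f"Unknown tool: {name}")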


Question 52 (Scenario: Customer Support Agent)

Situation: Production metrics show that when resolving complex billing disputes or multi-order returns, customer satisfaction scores are 15% lower than for simple cases—even when the resolution is technically correct. Root-cause analysis shows the agent provides accurate solutions but inconsistently explains rationale: sometimes omitting relevant policy details, sometimes missing timeline info or next steps. The specific context gaps vary case by case. You want to improve solution quality without adding human oversight. What approach is most effective?

What approach is most effective?

Why A: A self-critique stage (the evaluator-optimizer pattern) directly addresses inconsistent explanation completeness by forcing the agent to assess its own draft against concrete criteria—such as policy context, timelines, and next steps—before presenting it. This catches case-specific gaps without human oversight.


Question 53 (Scenario: Customer Support Agent)

Situation: Production metrics show your agent averages 4+ API loops per resolution. Analysis reveals Claude often requests get_customer and lookup_order in separate sequential turns even when both are needed initially. What is the most effective way to reduce the number of loops?

What is the most effective way to reduce loops?

Why D: Prompting Claude to bundle related tool requests into a single turn leverages its native ability to request multiple tools at once. It directly fixes the sequential-call pattern with minimal architectural change.


Question 54 (Scenario: Customer Support Agent)

Situation: Production logs show a pattern: customers reference specific amounts (e.g., “the 15% discount I mentioned”), but the agent responds with incorrect values. Investigation shows these details were mentioned 20+ turns ago and condensed into vague summaries like “promotional pricing was discussed.” What fix is most effective?

What fix is most effective?

Why C: Summarization inherently loses precise details. Extracting transactional facts into a structured “case facts” block outside the summarized history preserves critical information so it’s reliably available in every prompt regardless of how many turns have been summarized.
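A rough Python sketch of the idea: transactional facts live in a structured block that is re-inserted verbatim into every request, independent of how the history is summarized (field names and the summarizer are hypothetical):

    import json

    # Extracted once when mentioned; never passed through summarization.
    case_facts = {
        "order_id": "#1234",
        "promised_discount_pct": 15,
        "refund_amount_usd": 42.50,
    }

    system_prompt = (
        "Case facts (authoritative, quote exactly):\n"
        + json.dumps(case_facts, indent=2)
        + "\n\nConversation summary:\n"
        + summarize_history(conversation)   # assumed progressive-summarization helper
    )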


Question 55 (Scenario: Customer Support Agent)

Situation: Your get_customer tool returns all matches when searching by name. Currently, when there are multiple results, Claude picks the customer with the most recent order, but production data shows this selects the wrong account 15% of the time for ambiguous matches. How should you address this?

How should you address this?

Why B: Asking the user for an additional identifier is the most reliable way to resolve ambiguity because the user has definitive knowledge of their identity. One extra conversational turn is a small price to pay to eliminate a 15% error rate caused by choosing the wrong account.


Question 56 (Scenario: Customer Support Agent)

Situation: Production logs show a consistent pattern: when customers include the word “account” in their message (e.g., “I want to check my account for an order I made yesterday”), the agent calls get_customer first 78% of the time. When customers phrase similar requests without “account” (e.g., “I want to check an order I made yesterday”), it calls lookup_order first 93% of the time. Tool descriptions are clear and unambiguous. What is the most likely root cause of this discrepancy?

What is the most likely root cause?

Why A: The systematic keyword-driven pattern (78% vs 93%) strongly indicates explicit routing logic in the system prompt reacting to the word “account” and steering the agent toward customer-related tools. Since tool descriptions are already clear, the discrepancy points to prompt-level instructions creating unintended behavioral steering.


Question 57 (Scenario: Customer Support Agent)

Situation: Production logs show the agent often calls get_customer when users ask about orders (e.g., “check my order #12345”) instead of calling lookup_order. Both tools have minimal descriptions (“Gets customer information” / “Gets order details”) and accept similar-looking identifier formats. What is the most effective first step to improve tool selection reliability?

What is the most effective first step?

Why D: Expanding tool descriptions with input formats, example queries, edge cases, and clear boundaries directly fixes the root cause—minimal descriptions that don’t give the LLM enough information to distinguish similar tools. It’s a low-effort, high-impact first step that improves the primary mechanism the LLM uses for tool selection.


Question 58 (Scenario: Customer Support Agent)

Situation: You are implementing the agent loop for your support agent. After each Claude API call, you must decide whether to continue the loop (run requested tools and call Claude again) or stop (present the final answer to the customer). What determines this decision?

What determines this decision?

Why A: stop_reason is Claude’s explicit structured signal for loop control: tool_use indicates Claude wants to run a tool and receive results back, while end_turn indicates Claude has completed its response and the loop should end.
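A condensed Python sketch of that loop with the Anthropic SDK; tool definitions, the dispatcher, and the model id are placeholders, and error handling is omitted:

    import anthropic

    client = anthropic.Anthropic()
    messages = [{"role": "user", "content": "I need a refund for order #1234"}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-5",   # placeholder model id
            max_tokens=1024,
            tools=TOOLS,                 # tool definitions with JSON schemas, defined elsewhere
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            break                        # end_turn: present the final answer to the customer

        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                output = run_tool(block.name, block.input)   # assumed dispatcher returning a string
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
        messages.append({"role": "user", "content": tool_results})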


Question 59 (Scenario: Customer Support Agent)

Situation: Production logs show the agent misinterprets outputs from your MCP tools: Unix timestamps from get_customer, ISO 8601 dates from lookup_order, and numeric status codes (1=pending, 2=shipped). Some tools are third-party MCP servers you cannot modify. Which approach to data format normalization is most maintainable?

Which approach is most maintainable?

Why A: A PostToolUse hook provides a centralized, deterministic point to intercept and normalize all tool outputs—including third-party MCP server data—before the agent processes them. It’s more maintainable because transformations live in code and apply uniformly, rather than relying on LLM interpretation.
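The normalization itself is plain deterministic code; a rough Python sketch of what the hook body might do (hook registration differs between the Agent SDK and Claude Code, so only the transformation is shown, and the field names follow the scenario):

    from datetime import datetime, timezone

    STATUS_NAMES = {1: "pending", 2: "shipped"}   # numeric codes from the scenario

    def normalize_tool_output(tool_name: str, output: dict) -> dict:
        """Intended to run after every tool call, before the model sees the result."""
        if tool_name == "get_customer" and isinstance(output.get("created_at"), (int, float)):
            # Unix timestamp -> ISO 8601, matching lookup_order's date format
            output["created_at"] = datetime.fromtimestamp(
                output["created_at"], tz=timezone.utc
            ).isoformat()
        if tool_name == "lookup_order" and isinstance(output.get("status"), int):
            output["status"] = STATUS_NAMES.get(output["status"], "unknown")
        return output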


Question 60 (Scenario: Customer Support Agent)

Situation: Production logs show the agent sometimes chooses get_customer when lookup_order would be more appropriate, especially for ambiguous queries like “I need help with my recent purchase.” You decide to add few-shot examples to the system prompt to improve tool selection. Which approach most effectively addresses the problem?

Which approach is most effective?

Why C: Targeting few-shot examples at the specific ambiguous scenarios where errors occur, with explicit rationale for why one tool is preferable to alternatives, teaches the model the comparative decision process needed for edge cases. This is more effective than generic examples or declarative rules.


Practical Exercises

Exercise 1: Multi-tool Agent with Escalation Logic

Goal: Design an agent loop with tool integration, structured error handling, and escalation.

Steps:

  1. Define 3–4 MCP tools with detailed descriptions (include two similar tools to test tool selection)
  2. Implement an agent loop checking stop_reason ("tool_use" / "end_turn")
  3. Add structured error responses: errorCategory, isRetryable, description (see the sketch after this exercise)
  4. Implement an interceptor hook that blocks operations above a threshold and routes to escalation
  5. Test with multi-aspect requests

Domains: 1 (Agent architecture), 2 (Tools and MCP), 5 (Context and reliability)
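For step 3, one possible shape for a structured tool error (field names come from the step; the category values are illustrative):

    {
      "isError": true,
      "errorCategory": "rate_limit",
      "isRetryable": true,
      "description": "Order service returned HTTP 429; safe to retry after 30 seconds."
    }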


Exercise 2: Configuring Claude Code for Team Development

Goal: Configure CLAUDE.md, custom commands, path-specific rules, and MCP servers.

Steps:

  1. Create a project-level CLAUDE.md with universal standards
  2. Create .claude/rules/ files with YAML frontmatter for different code areas (paths: ["src/api/**/*"], paths: ["**/*.test.*"])
  3. Create a project skill under .claude/skills/ with context: fork and allowed-tools
  4. Configure an MCP server in .mcp.json with environment variables + a personal override in ~/.claude.json
  5. Test planning mode vs direct execution on tasks of different complexity

Domains: 3 (Claude Code configuration), 2 (Tools and MCP)


Exercise 3: Structured Data Extraction Pipeline

Goal: JSON schemas, tool_use for structured output, validation/retry loops, batch processing.

Steps:

  1. Define an extraction tool with a JSON schema (required/optional fields, enums with "other", nullable fields; see the sketch after this exercise)
  2. Build a validation loop: on error, retry with the document, the incorrect extraction, and the specific validation error
  3. Add few-shot examples for documents with different structures
  4. Use batch processing via the Message Batches API: 100 documents, handle failures via custom_id
  5. Route to humans: field-level confidence scores, document-type analysis

Domains: 4 (Prompt engineering), 5 (Context and reliability)
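For step 1, a trimmed Python sketch of an extraction tool definition to pass to the Messages API (the invoice domain and field names are illustrative):

    # Hypothetical invoice-extraction tool; Claude fills `input` according to this schema.
    EXTRACT_TOOL = {
        "name": "record_invoice",
        "description": "Record structured fields extracted from an invoice document.",
        "input_schema": {
            "type": "object",
            "properties": {
                "invoice_number": {"type": "string"},
                "total_amount": {"type": ["number", "null"]},   # nullable when illegible
                "currency": {"type": "string", "enum": ["USD", "EUR", "GBP", "other"]},
                "currency_other_detail": {"type": "string"},    # filled only when currency is "other"
            },
            "required": ["invoice_number", "currency"],
        },
    }

Forcing this tool with tool_choice={"type": "tool", "name": "record_invoice"} guarantees the response arrives in this structure, which the step 2 validation/retry loop can then check.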


Exercise 4: Designing and Debugging a Multi-agent Research Pipeline

Goal: Subagent orchestration, context passing, error propagation, synthesis with source tracking.

Steps:

  1. A coordinator with 2+ subagents (allowedTools includes "Task", context is passed explicitly in prompts)
  2. Run subagents in parallel via multiple Task calls in a single response
  3. Require structured subagent output: claim, quote, source URL, publication date
  4. Simulate a subagent timeout: return structured error context to the coordinator and continue with partial results
  5. Test with conflicting data: preserve both values with attribution; separate confirmed vs disputed findings

Domains: 1 (Agent architecture), 2 (Tools and MCP), 5 (Context and reliability)


Appendix: Technologies and Concepts

Technology: key aspects

Claude Agent SDK: AgentDefinition, agent loops, stop_reason, hooks (PostToolUse), spawning subagents via Task, allowedTools
Model Context Protocol (MCP): MCP servers, tools, resources, isError, tool descriptions, .mcp.json, environment variables
Claude Code: CLAUDE.md hierarchy, .claude/rules/ with glob patterns, .claude/commands/, .claude/skills/ with SKILL.md, planning mode, /compact, --resume, fork_session
Claude Code CLI: -p / --print for non-interactive mode, --output-format json, --json-schema
Claude API: tool_use with JSON schemas, tool_choice ("auto"/"any"/forced), stop_reason, max_tokens, system prompts
Message Batches API: 50% savings, up to 24-hour window, custom_id, no multi-turn tool calling
JSON Schema: required vs optional, nullable fields, enum types, "other" + detail, strict mode
Pydantic: schema validation, semantic errors, validation/retry loops
Built-in tools: Read, Write, Edit, Bash, Grep, Glob — purpose and selection criteria
Few-shot prompting: targeted examples for ambiguous situations, generalization to new patterns
Prompt chaining: sequential decomposition into focused passes
Context window: token budgets, progressive summarization, "lost in the middle", scratchpad files
Session management: resume, fork_session, named sessions, context isolation
Confidence calibration: field-level scoring, calibration on labeled sets, stratified sampling

Out-of-Scope Topics

The following adjacent topics will NOT be on the exam:


Preparation Recommendations

  1. Build an agent with the Claude Agent SDK — implement a full agent loop with tool calling, error handling, and session management. Practice subagents and explicit context passing.

  2. Configure Claude Code for a real project — use CLAUDE.md hierarchy, path-specific rules in .claude/rules/, skills with context: fork and allowed-tools, and MCP server integration.

  3. Design and test MCP tools — write descriptions that differentiate similar tools, return structured errors with categories and retry flags, and test against ambiguous user requests.

  4. Build a data extraction pipeline — use tool_use with JSON schemas, validation/retry loops, optional/nullable fields, and batch processing via the Message Batches API.

  5. Practice prompt engineering — add few-shot examples for ambiguous scenarios, explicit review criteria, and multi-pass architectures for large code reviews.

  6. Study context management patterns — extract facts from verbose outputs, use scratchpad files, and delegate discovery to subagents to handle context limits.

  7. Understand escalation and human-in-the-loop — when to escalate (policy gaps, explicit user request, inability to make progress) and confidence-based routing workflows.

  8. Take a practice exam before the real one. It uses the same scenarios and format.