Domain 4: Prompt Engineering and Structured Output (20%)
4.1 Designing Prompts with Explicit Criteria to Improve Accuracy
Key knowledge:
- Explicit criteria are more effective than vague instructions (e.g., "flag comments only when they contradict code" vs "check comment accuracy")
- Generic guidance like "be more conservative" is less effective than concrete, categorical criteria
- False positives erode developer trust: a high false-positive rate in some categories undermines confidence even in the categories that are accurate
Key skills:
- Define review criteria: what to report (bugs, security) vs what to ignore (minor style)
- Temporarily disable categories with high false-positive rates
- Define explicit severity criteria with code examples for each level (see the sketch below)
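As a minimal sketch of these skills (the categories, severity wording, and examples are illustrative, not an official rubric), an explicit-criteria review prompt might look like this:

```python
# Hypothetical system prompt: explicit, categorical criteria in place of
# vague guidance like "check comment accuracy" or "be more conservative".
REVIEW_SYSTEM_PROMPT = """You are a code reviewer. Report ONLY:
- Bugs: code that produces incorrect results or crashes
- Security issues: injection, auth bypass, committed secrets
- Comment drift: flag a comment ONLY when it contradicts what the code does

Do NOT report:
- Minor style issues (naming, formatting, import order)
- Missing docstrings

Severity levels:
- critical: exploitable security flaw or data loss
  (e.g., SQL built by concatenating user input)
- major: incorrect behavior on realistic inputs
  (e.g., off-by-one error in pagination logic)
- minor: correct today but fragile
  (e.g., unhandled optional config key)
"""
```

If one category (say, comment drift) turns out to have a high false-positive rate, it can be removed from the "Report ONLY" list until its criteria are tightened.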
4.2 Using Few-shot Prompting to Improve Output Consistency
Key knowledge:
- Few-shot examples are the most effective method for producing consistently formatted, actionable output
- Few-shot can demonstrate handling of ambiguous cases (tool selection, gaps in test coverage)
- Few-shot helps the model generalize to new patterns rather than just repeating defaults
- Few-shot can reduce hallucinations in extraction tasks
Key skills:
- Provide 2–4 targeted examples for ambiguous scenarios, each with a rationale
- Include few-shot examples that demonstrate the output format (location, issue, severity, suggested fix)
- Provide examples that distinguish acceptable code patterns from real issues (as in the sketch below)
- Provide examples of correct extraction from documents with different structures
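A minimal sketch of such a few-shot block (the code snippets, file names, and findings are invented for illustration):

```python
# Hypothetical few-shot examples: one pins down the output format,
# the other shows that an accepted pattern yields no finding.
FEW_SHOT = """Example 1
Code: `except Exception: pass` wrapped around a payment call
Finding:
  location: payments.py:88
  issue: silently swallows payment failures
  severity: major
  suggested fix: catch the specific exception, log it, and re-raise

Example 2
Code: `except ImportError: pass` around an optional plugin import
Finding: none (probing for optional dependencies is accepted in this codebase)
"""

diff_text = "..."  # placeholder for the diff under review
messages = [{"role": "user", "content": FEW_SHOT + "\nNow review:\n" + diff_text}]
```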
4.3 Enforcing Structured Output with tool_use and JSON Schemas
Key knowledge:
- tool_use with JSON Schemas is the most reliable way to guarantee schema-conformant output and eliminate JSON syntax errors
- With tool_choice: "auto" the model can return text; with "any" it must call a tool; forced selection chooses a specific tool
- Strict JSON Schemas eliminate syntax errors but do not prevent semantic errors (totals don't add up; values in wrong fields)
- Schema design: required vs optional fields; enums with "other" plus a detail string for extensibility
Key skills:
- Define extraction tools with JSON Schemas and parse data from tool_use results
- Use tool_choice: "any" to guarantee structured output when multiple schemas exist
- Force a specific tool call: tool_choice: {"type": "tool", "name": "extract_metadata"}
- Make fields optional/nullable when the source may lack the information, so the model does not fabricate values
- Use enum values like "unclear" and "other" plus detail fields for extensible categorization (see the sketch below)
4.4 Implementing Validation, Retries, and Feedback Loops for Extraction Quality
Key knowledge:
- Retry-with-error-feedback: include concrete validation errors in the retry prompt to guide corrections
- Retries are ineffective when the information is simply absent from the source
- Feedback loop design: track the pattern that triggered a finding (detected_pattern)
- Semantic errors (totals don't reconcile) vs syntax errors (addressed by tool_use)
Key skills:
- Construct follow-up prompts containing the original document, the incorrect extraction, and the specific validation errors
- Identify when retry will be ineffective (the required info is only in an external document)
- Include detected_pattern fields in findings to analyze false positives
- Design self-correction by extracting both calculated_total and stated_total to detect discrepancies (see the sketch below)
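A minimal retry-with-error-feedback sketch (extract, validate, and the field names are hypothetical; extract stands in for the forced tool_use call from 4.3):

```python
import json

def extract(prompt: str) -> dict:
    """Hypothetical wrapper around the forced tool_use call from 4.3."""
    ...

def validate(extraction: dict) -> list[str]:
    # Semantic checks that a JSON Schema cannot express.
    errors = []
    if extraction["calculated_total"] != extraction["stated_total"]:
        errors.append(
            f"calculated_total {extraction['calculated_total']} does not "
            f"match stated_total {extraction['stated_total']}"
        )
    return errors

def extract_with_retries(document_text: str, max_retries: int = 2) -> dict:
    extraction = extract("Extract totals from:\n\n" + document_text)
    for _ in range(max_retries):
        errors = validate(extraction)
        if not errors:
            return extraction
        # Retry with concrete feedback: the document, the bad extraction,
        # and the exact validation errors to correct.
        extraction = extract(
            f"Document:\n{document_text}\n\n"
            f"Your previous extraction:\n{json.dumps(extraction, indent=2)}\n\n"
            "It failed these validation checks:\n- " + "\n- ".join(errors) +
            "\n\nCorrect these specific errors and extract again."
        )
    return extraction
```

If a validation error stems from information that is simply absent from the source (for example, it lives only in an external document), retrying will not help; that case needs a different data path.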
4.5 Designing Efficient Batch Processing Strategies
Key knowledge:
- Message Batches API: 50% savings, up to 24-hour processing window, no latency SLA guarantees
- Batch processing is suitable for non-blocking tasks (overnight reports, audits) and not suitable for blocking tasks (pre-merge checks)
- Batch API does not support multi-turn tool calling within a single request
- custom_id fields correlate each request with its response within a batch
Key skills:
- Use synchronous API for blocking checks; use Batch API for overnight/weekly workloads
- Plan batch submission cadence around SLA needs (e.g., collecting requests into 4-hour submission windows keeps worst-case turnaround at 28 hours given 24-hour processing, meeting a 30-hour guarantee)
- Handle failures by re-submitting only the failed documents, identified by custom_id (see the sketch below)
- Iterate on prompts using a sample before running large-scale processing
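A minimal Message Batches sketch with the Anthropic Python SDK (the document IDs, prompt, and model ID are placeholders):

```python
import time
import anthropic

client = anthropic.Anthropic()
documents = {"doc-001": "...", "doc-002": "..."}  # placeholder corpus

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": doc_id,  # lets us match each result to its document
            "params": {
                "model": "claude-sonnet-4-5",  # placeholder model ID
                "max_tokens": 1024,
                "messages": [{"role": "user",
                              "content": "Audit this document:\n\n" + text}],
            },
        }
        for doc_id, text in documents.items()
    ]
)

# Processing can take up to 24 hours; poll until the batch ends.
while client.messages.batches.retrieve(batch.id).processing_status != "ended":
    time.sleep(300)

# Collect results; re-submit only the failures, identified by custom_id.
failed_ids = []
for entry in client.messages.batches.results(batch.id):
    if entry.result.type == "succeeded":
        print(entry.custom_id, entry.result.message.content[0].text)
    else:
        failed_ids.append(entry.custom_id)
```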
4.6 Designing Multi-instance and Multi-pass Review Architectures
Key knowledge:
- Self-review limitations: the model retains its reasoning context and is less likely to challenge its own decisions
- Independent review instances (without generation context) are better at finding subtle issues
- Multi-pass review: per-file local analysis plus a cross-file integration pass to avoid attention dilution
Key skills:
- Use a second independent Claude instance to review changes without generation context
- Split multi-file reviews into per-file passes plus integration passes for cross-file dataflow analysis
- Use verification passes with self-rated confidence to route reviews in a calibrated way (see the sketch below)
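A minimal sketch of an independent second-instance review (the system prompt wording and model ID are illustrative):

```python
import anthropic

client = anthropic.Anthropic()

def independent_review(diff_text: str) -> str:
    # A fresh conversation: the reviewer sees only the diff, not the
    # generation context, so it has no prior reasoning of its own to defend.
    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model ID
        max_tokens=2048,
        system=("You are reviewing a change you did not write. For each "
                "finding, rate your confidence as high, medium, or low."),
        messages=[{"role": "user",
                   "content": "Review this diff:\n\n" + diff_text}],
    )
    return response.content[0].text
```

Low-confidence findings can then be routed to a human reviewer or an additional verification pass, while high-confidence findings proceed automatically.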