Domain 4: Prompt Engineering and Structured Output (20%)
4.1 Designing Prompts with Explicit Criteria to Improve Accuracy
Key knowledge:
- Explicit criteria are more effective than vague instructions (e.g., "flag comments only when they contradict code" vs "check comment accuracy")
- Generic guidance like "be more conservative" is less effective than concrete, categorical criteria
- False positives erode developer trust: a high false-positive rate in some categories undermines confidence even in the categories that are accurate
Key skills:
- Define review criteria: what to report (bugs, security) vs what to ignore (minor style)
- Temporarily disable categories with high false-positive rates
- Define explicit severity criteria with code examples for each level (see the sketch below)
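As a minimal sketch of these skills (the categories, severity wording, and examples are illustrative, not an official rubric), an explicit-criteria review prompt might look like this:

```python
# Hypothetical system prompt: explicit, categorical criteria in place of
# vague guidance like "check comment accuracy" or "be more conservative".
REVIEW_SYSTEM_PROMPT = """You are a code reviewer. Report ONLY:
- Bugs: code that produces incorrect results or crashes
- Security issues: injection, auth bypass, committed secrets
- Comment drift: flag a comment ONLY when it contradicts what the code does

Do NOT report:
- Minor style issues (naming, formatting, import order)
- Missing docstrings

Severity levels:
- critical: exploitable security flaw or data loss
  (e.g., SQL built by concatenating user input)
- major: incorrect behavior on realistic inputs
  (e.g., off-by-one error in pagination logic)
- minor: correct today but fragile
  (e.g., unhandled optional config key)
"""
```

If one category (say, comment drift) turns out to have a high false-positive rate, it can be removed from the "Report ONLY" list until its criteria are tightened.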
4.2 Using Few-shot Prompting to Improve Output Consistency
Key knowledge:
- Few-shot examples are the most effective method for producing consistently formatted, actionable output
- Few-shot can demonstrate handling of ambiguous cases (tool selection, gaps in test coverage)
- Few-shot helps the model generalize to new patterns rather than just repeating defaults
- Few-shot can reduce hallucinations in extraction tasks
Key skills:
- Provide 2–4 targeted examples for ambiguous scenarios, each with a rationale
- Include few-shot examples that demonstrate the output format (location, issue, severity, suggested fix)
- Provide examples that distinguish acceptable code patterns from real issues (as in the sketch below)
- Provide examples of correct extraction from documents with different structures
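A minimal sketch of such a few-shot block (the code snippets, file names, and findings are invented for illustration):

```python
# Hypothetical few-shot examples: one pins down the output format,
# the other shows that an accepted pattern yields no finding.
FEW_SHOT = """Example 1
Code: `except Exception: pass` wrapped around a payment call
Finding:
  location: payments.py:88
  issue: silently swallows payment failures
  severity: major
  suggested fix: catch the specific exception, log it, and re-raise

Example 2
Code: `except ImportError: pass` around an optional plugin import
Finding: none (probing for optional dependencies is accepted in this codebase)
"""

diff_text = "..."  # placeholder for the diff under review
messages = [{"role": "user", "content": FEW_SHOT + "\nNow review:\n" + diff_text}]
```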
4.3 Enforcing Structured Output with tool_use and JSON Schemas
Key knowledge:
- tool_use with JSON Schemas is the most reliable way to guarantee schema-conformant output and eliminate JSON syntax errors
- With tool_choice: "auto" the model can return text; with "any" it must call a tool; forced selection chooses a specific tool
- Strict JSON Schemas eliminate syntax errors but do not prevent semantic errors (totals don't add up; values in wrong fields)
- Schema design: required vs optional fields; enums with "other" plus a detail string for extensibility
Key skills:
- Define extraction tools with JSON Schemas and parse data from tool_use results
- Use tool_choice: "any" to guarantee structured output when multiple schemas exist
- Force a specific tool call: tool_choice: {"type": "tool", "name": "extract_metadata"}
- Make fields optional/nullable when the source may lack the information, so the model does not fabricate values
- Use enum values like "unclear" and "other" plus detail fields for extensible categorization (see the sketch below)
4.4 Implementing Validation, Retries, and Feedback Loops for Extraction Quality
Key knowledge:
- Retry-with-error-feedback: include concrete validation errors in the retry prompt to guide corrections
- Retries are ineffective when the information is simply absent from the source
- Feedback loop design: track the pattern that triggered a finding (detected_pattern)
- Semantic errors (totals don't reconcile) vs syntax errors (addressed by tool_use)
Key skills:
- Construct follow-up prompts containing the original document, the incorrect extraction, and the specific validation errors
- Identify when retry will be ineffective (the required info is only in an external document)
- Include detected_pattern fields in findings to analyze false positives
- Design self-correction by extracting both calculated_total and stated_total to detect discrepancies (see the sketch below)
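A minimal retry-with-error-feedback sketch (extract, validate, and the field names are hypothetical; extract stands in for the forced tool_use call from 4.3):

```python
import json

def extract(prompt: str) -> dict:
    """Hypothetical wrapper around the forced tool_use call from 4.3."""
    ...

def validate(extraction: dict) -> list[str]:
    # Semantic checks that a JSON Schema cannot express.
    errors = []
    if extraction["calculated_total"] != extraction["stated_total"]:
        errors.append(
            f"calculated_total {extraction['calculated_total']} does not "
            f"match stated_total {extraction['stated_total']}"
        )
    return errors

def extract_with_retries(document_text: str, max_retries: int = 2) -> dict:
    extraction = extract("Extract totals from:\n\n" + document_text)
    for _ in range(max_retries):
        errors = validate(extraction)
        if not errors:
            return extraction
        # Retry with concrete feedback: the document, the bad extraction,
        # and the exact validation errors to correct.
        extraction = extract(
            f"Document:\n{document_text}\n\n"
            f"Your previous extraction:\n{json.dumps(extraction, indent=2)}\n\n"
            "It failed these validation checks:\n- " + "\n- ".join(errors) +
            "\n\nCorrect these specific errors and extract again."
        )
    return extraction
```

If a validation error stems from information that is simply absent from the source (for example, it lives only in an external document), retrying will not help; that case needs a different data path.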
4.5 Designing Efficient Batch Processing Strategies
Key knowledge:
- Message Batches API: 50% savings, up to 24-hour processing window, no latency SLA guarantees
- Batch processing is suitable for non-blocking tasks (overnight reports, audits) and not suitable for blocking tasks (pre-merge checks)
- Batch API does not support multi-turn tool calling within a single request
- custom_id fields correlate each request with its response within a batch
Key skills:
- Use synchronous API for blocking checks; use Batch API for overnight/weekly workloads
- Plan batch submission cadence around SLA needs (e.g., collecting requests into 4-hour submission windows keeps worst-case turnaround at 28 hours given 24-hour processing, meeting a 30-hour guarantee)
- Handle failures by re-submitting only the failed documents, identified by custom_id (see the sketch below)
- Iterate on prompts using a sample before running large-scale processing
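A minimal Message Batches sketch with the Anthropic Python SDK (the document IDs, prompt, and model ID are placeholders):

```python
import time
import anthropic

client = anthropic.Anthropic()
documents = {"doc-001": "...", "doc-002": "..."}  # placeholder corpus

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": doc_id,  # lets us match each result to its document
            "params": {
                "model": "claude-sonnet-4-5",  # placeholder model ID
                "max_tokens": 1024,
                "messages": [{"role": "user",
                              "content": "Audit this document:\n\n" + text}],
            },
        }
        for doc_id, text in documents.items()
    ]
)

# Processing can take up to 24 hours; poll until the batch ends.
while client.messages.batches.retrieve(batch.id).processing_status != "ended":
    time.sleep(300)

# Collect results; re-submit only the failures, identified by custom_id.
failed_ids = []
for entry in client.messages.batches.results(batch.id):
    if entry.result.type == "succeeded":
        print(entry.custom_id, entry.result.message.content[0].text)
    else:
        failed_ids.append(entry.custom_id)
```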
4.6 Designing Multi-instance and Multi-pass Review Architectures
Key knowledge:
- Self-review limitations: the model retains its reasoning context and is less likely to challenge its own decisions
- Independent review instances (without generation context) are better at finding subtle issues
- Multi-pass review: per-file local analysis plus a cross-file integration pass to avoid attention dilution
Key skills:
- Use a second independent Claude instance to review changes without generation context
- Split multi-file reviews into per-file passes plus integration passes for cross-file dataflow analysis
- Use verification passes with self-rated confidence to route reviews in a calibrated way (see the sketch below)
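A minimal sketch of an independent second-instance review (the system prompt wording and model ID are illustrative):

```python
import anthropic

client = anthropic.Anthropic()

def independent_review(diff_text: str) -> str:
    # A fresh conversation: the reviewer sees only the diff, not the
    # generation context, so it has no prior reasoning of its own to defend.
    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model ID
        max_tokens=2048,
        system=("You are reviewing a change you did not write. For each "
                "finding, rate your confidence as high, medium, or low."),
        messages=[{"role": "user",
                   "content": "Review this diff:\n\n" + diff_text}],
    )
    return response.content[0].text
```

Low-confidence findings can then be routed to a human reviewer or an additional verification pass, while high-confidence findings proceed automatically.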