Home Practical Test Docs PDF

Chapter 6: Prompt Engineering — Advanced Techniques

Documentation: Prompt Engineering | Anthropic Cookbook

6.1 Few-shot Prompting

Few-shot prompting is the inclusion of 2–4 input/output examples in a prompt to demonstrate the expected behavior.

Why few-shot is more effective than textual descriptions:

Types of few-shot examples and when to use them:

  1. Examples for ambiguous scenarios:
Request: "My order is broken"
Action: Call get_customer -> lookup_order -> check status.
Rationale: "broken" may mean a damaged item; you need order details.

Request: "Get me a manager"
Action: Immediately call escalate_to_human.
Rationale: The customer explicitly requests a human. Do not attempt to solve autonomously.
  1. Examples for output formatting:
Finding example:
{
  "location": "src/auth/login.ts:42",
  "issue": "SQL injection in the username parameter",
  "severity": "critical",
  "suggested_fix": "Use a parameterized query"
}
  1. Examples to separate acceptable vs problematic code:
// Acceptable (do not flag):
const items = data.filter(x => x.active);

// Problem (flag):
const items = data.filter(x => x.active == true); // Use strict equality ===
  1. Examples for extraction from different document formats:
Document with inline citations:
"As shown in the study (Smith, 2023), the rate is 42%."
-> {"value": "42%", "source": "Smith, 2023", "type": "inline_citation"}

Document with bibliography references:
"The rate is 42%. [1]"
-> {"value": "42%", "source": "reference_1", "type": "bibliography"}
  1. Examples for informal measurements:
Text: "about two handfuls of rice"
-> {"amount": "~100g", "original_text": "two handfuls", "precision": "approximate"}

Text: "a pinch of salt"
-> {"amount": "~1g", "original_text": "a pinch", "precision": "approximate"}

Few-shot is especially effective for extracting informal and non-standard measurement units that are too diverse for purely rule-based instructions.

Format normalization rules in prompts: When using strict JSON schemas for structured output, add normalization rules in the prompt:

Normalization:
- Dates: always ISO 8601 (YYYY-MM-DD); "yesterday" -> compute an absolute date
- Currency: numeric amount + currency code; "five bucks" -> {"amount": 5, "currency": "USD"}
- Percentages: decimal fraction; "half" -> 0.5

This prevents semantic errors where the JSON is syntactically valid but values are inconsistent.

6.2 Explicit Criteria vs Vague Instructions

Bad (vague):

Check code comments for accuracy.
Be conservative—report only high-confidence findings.

Good (explicit criteria):

Flag a comment as problematic ONLY if:
1. The comment describes behavior that CONTRADICTS the actual code behavior
2. The comment references a non-existent function or variable
3. A TODO/FIXME comment refers to a bug that has already been fixed in code

Do NOT flag:
- Comments that are merely stylistically outdated
- Comments with minor wording inaccuracies
- Missing comments (that is a separate category)

Define severity criteria with examples:

CRITICAL: Runtime failure for users
  Example: NullPointerException while processing a payment

HIGH: Security vulnerability
  Example: SQL injection, XSS, missing authorization checks

MEDIUM: Logic bug without immediate impact
  Example: Wrong sorting, off-by-one error

LOW: Code quality
  Example: Duplication, suboptimal algorithm for small data

6.3 Prompt Chaining

Prompt chaining breaks a complex task into a sequence of focused steps:

Step 1: Analyze auth.ts (local issues only)
       -> Output: list of issues in auth.ts

Step 2: Analyze database.ts (local issues only)
       -> Output: list of issues in database.ts

Step 3: Integration pass (cross-file dependencies)
       -> Output: issues at module boundaries

Why this matters:

When to use prompt chaining vs dynamic decomposition:

6.4 The "Interview" Pattern

Before implementing a solution, Claude asks clarifying questions:

Claude: "Before implementing caching for the API, a few questions:
1. Which cache invalidation strategy do you prefer—TTL or event-based?
2. Is stale data acceptable when the cache is unavailable?
3. Should caching be per-user or global?
4. What is the expected data volume to cache?"

When this is useful:

6.5 Validation and Retry-with-Feedback

When extracted data fails validation:

Step 1: Extract data from the document
Step 2: Validate (Pydantic, JSON Schema, business rules)
Step 3: If there's an error—retry with context:
  - The original document
  - The previous (incorrect) extraction
  - The specific error: "Field 'total' = 150, but sum(line_items) = 145. Re-check values."

When retry will be effective:

When retry will NOT help:

Pydantic as a validation tool: Pydantic is a Python library for schema-based data validation. For the exam, the key points are:

6.6 Self-correction

A pattern for detecting internal contradictions:

{
  "stated_total": "$150.00",
  "calculated_total": "$145.00",
  "conflict_detected": true,
  "line_items": [
    {"name": "Widget A", "price": 75.00},
    {"name": "Widget B", "price": 70.00}
  ]
}

The model extracts both the stated value and a computed value—if they differ, conflict_detected allows you to handle the discrepancy.