Chapter 6: Prompt Engineering — Advanced Techniques

Documentation: Prompt Engineering | Anthropic Cookbook

6.1 Few-shot Prompting

Few-shot prompting means including a small number of input/output examples (typically 2–4) in the prompt to demonstrate the expected behavior.

Why few-shot is more effective than textual descriptions:

- A vague instruction like "be more precise" can be interpreted in many ways
- An example unambiguously shows the expected format and decision logic
- The model generalizes the pattern to new cases (it does not just repeat the examples)

Types of few-shot examples and when to use them:

Examples for ambiguous scenarios:

Request: "My order is broken"
Action: Call get_customer -> lookup_order -> check status.
Rationale: "broken" may mean a damaged item; you need order details.

Request: "Get me a manager"
Action: Immediately call escalate_to_human.
Rationale: The customer explicitly requests a human. Do not attempt to solve autonomously.

Examples for output formatting:

Finding example:
{
  "location": "src/auth/login.ts:42",
  "issue": "SQL injection in the username parameter",
  "severity": "critical",
  "suggested_fix": "Use a parameterized query"
}

Examples to separate acceptable vs problematic code:

// Acceptable (do not flag):
const items = data.filter(x => x.active);

// Problem (flag):
const items = data.filter(x => x.active == true); // Use strict equality ===

Examples for extraction from different document formats:

Document with inline citations:
"As shown in the study (Smith, 2023), the rate is 42%."
-> {"value": "42%", "source": "Smith, 2023", "type": "inline_citation"}

Document with bibliography references:
"The rate is 42%. [1]"
-> {"value": "42%", "source": "reference_1", "type": "bibliography"}

Examples for informal measurements:

Text: "about two handfuls of rice"
-> {"amount": "~100g", "original_text": "two handfuls", "precision": "approximate"}

Text: "a pinch of salt"
-> {"amount": "~1g", "original_text": "a pinch", "precision": "approximate"}

Few-shot is especially effective for extracting informal and non-standard measurement units that are too diverse for purely rule-based instructions.
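A few-shot prompt of this kind can be assembled in code. The following is a minimal sketch, not a prescribed implementation: the function name build_few_shot_prompt, the instruction wording, and the "a splash of milk" input are hypothetical, and only the two example pairs come from the text above.

```python
import json

# Example pairs taken from the informal-measurement examples above.
EXAMPLES = [
    ("about two handfuls of rice",
     {"amount": "~100g", "original_text": "two handfuls", "precision": "approximate"}),
    ("a pinch of salt",
     {"amount": "~1g", "original_text": "a pinch", "precision": "approximate"}),
]


def build_few_shot_prompt(instruction, examples, new_input):
    """Assemble an instruction, worked examples, and the new input into one prompt."""
    parts = [instruction, ""]
    for text, expected in examples:
        parts.append(f'Text: "{text}"')
        parts.append(f"-> {json.dumps(expected)}")
        parts.append("")
    # End with the new input and an open arrow so the model continues the pattern.
    parts.append(f'Text: "{new_input}"')
    parts.append("->")
    return "\n".join(parts)


# Hypothetical instruction and input, for illustration only.
prompt = build_few_shot_prompt(
    "Extract the measurement as JSON.", EXAMPLES, "a splash of milk"
)
print(prompt)
```

Keeping the examples in a data structure rather than hard-coding them in the prompt string makes it easier to add a new example when a new kind of input is misread.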

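Format normalization rules in prompts: A strict JSON schema constrains structure and types, but not how free-form values such as relative dates or colloquial amounts are written. When using strict schemas for structured output, state normalization rules in the prompt as well: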

Normalization:
- Dates: always ISO 8601 (YYYY-MM-DD); "yesterday" -> compute an absolute date
- Currency: numeric amount + currency code; "five bucks" -> {"amount": 5, "currency": "USD"}
- Percentages: decimal fraction; "half" -> 0.5

This guards against semantic errors, where the JSON is syntactically valid but the values are inconsistent (for example, "50%" in one record and 0.5 in another).
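Normalization rules stated in the prompt can also be checked in code after extraction. The sketch below assumes a hypothetical record shape with the fields date, price, and rate, and an illustrative currency whitelist; none of these names come from the text above.

```python
from datetime import date

ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}  # assumption: illustrative subset


def is_number(value):
    """True for int/float, but not for bool (which is a subclass of int)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def normalization_errors(record):
    """Return a list of problems; an empty list means the record passes."""
    errors = []

    # Dates: must parse as an ISO 8601 calendar date, so "yesterday" is rejected.
    try:
        date.fromisoformat(record.get("date"))
    except (TypeError, ValueError):
        errors.append("date must be ISO 8601 (YYYY-MM-DD)")

    # Currency: numeric amount plus a currency code.
    price = record.get("price")
    if not (
        isinstance(price, dict)
        and is_number(price.get("amount"))
        and price.get("currency") in ALLOWED_CURRENCIES
    ):
        errors.append("price must be {amount: number, currency: code}")

    # Percentages: decimal fraction between 0 and 1.
    rate = record.get("rate")
    if not is_number(rate) or not 0 <= rate <= 1:
        errors.append("rate must be a decimal fraction between 0 and 1")

    return errors


good = {"date": "2024-03-05", "price": {"amount": 5, "currency": "USD"}, "rate": 0.5}
bad = {"date": "yesterday", "price": "five bucks", "rate": 50}
print(normalization_errors(good))
print(normalization_errors(bad))
```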

6.2 Explicit Criteria vs Vague Instructions

Bad (vague):

Check code comments for accuracy.
Be conservative—report only high-confidence findings.

Good (explicit criteria):

Flag a comment as problematic ONLY if:
1. The comment describes behavior that CONTRADICTS the actual code behavior
2. The comment references a non-existent function or variable
3. A TODO/FIXME comment refers to a bug that has already been fixed in code

Do NOT flag:
- Comments that are merely stylistically outdated
- Comments with minor wording inaccuracies
- Missing comments (that is a separate category)

Define severity criteria with examples:

CRITICAL: Runtime failure for users
  Example: NullPointerException while processing a payment

HIGH: Security vulnerability
  Example: SQL injection, XSS, missing authorization checks

MEDIUM: Logic bug without immediate impact
  Example: Wrong sorting, off-by-one error

LOW: Code quality
  Example: Duplication, suboptimal algorithm for small data
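One way to keep the severity definitions in the prompt and the labels accepted by the code in sync is to define them once and render the prompt block from that definition. This is a sketch under that assumption; the Severity enum and both helper functions are hypothetical names, while the definitions and examples are copied from the list above.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = ("Runtime failure for users",
                "NullPointerException while processing a payment")
    HIGH = ("Security vulnerability",
            "SQL injection, XSS, missing authorization checks")
    MEDIUM = ("Logic bug without immediate impact",
              "Wrong sorting, off-by-one error")
    LOW = ("Code quality",
           "Duplication, suboptimal algorithm for small data")

    def __init__(self, definition, example):
        self.definition = definition
        self.example = example


def severity_prompt_block():
    """Render the severity criteria as text to paste into a prompt."""
    lines = []
    for level in Severity:
        lines.append(f"{level.name}: {level.definition}")
        lines.append(f"  Example: {level.example}")
    return "\n".join(lines)


def parse_severity(label):
    """Map a label from model output back to the enum; raises KeyError if unknown."""
    return Severity[label.strip().upper()]


print(severity_prompt_block())
```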

6.3 Prompt Chaining

Prompt chaining breaks a complex task into a sequence of focused steps:

Step 1: Analyze auth.ts (local issues only)
       -> Output: list of issues in auth.ts

Step 2: Analyze database.ts (local issues only)
       -> Output: list of issues in database.ts

Step 3: Integration pass (cross-file dependencies)
       -> Output: issues at module boundaries

Why this matters:

- Avoids attention dilution: when the model receives too many files at once, it may miss bugs in some files while giving only shallow commentary on others
- Ensures consistent analysis quality per file
- Allows separate analysis of cross-file interactions

When to use prompt chaining vs dynamic decomposition:

- Prompt chaining: predictable, repeatable tasks (code review, file migrations)
- Dynamic decomposition: open-ended investigations where subtasks become clear only during execution
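The three-step chain described in this section can be sketched as a fixed pipeline. In this sketch the model call is passed in as a plain function so the example runs offline; review_with_chaining and fake_model are hypothetical names, and feeding the per-file findings into the integration pass is an assumption about how the steps are connected, not something the text specifies.

```python
from typing import Callable, Dict


def review_with_chaining(
    files: Dict[str, str], call_model: Callable[[str], str]
) -> Dict[str, str]:
    """Run one focused pass per file, then a single integration pass."""
    results = {}
    for name, source in files.items():
        prompt = f"Analyze {name} for local issues only.\n\n{source}"
        results[name] = call_model(prompt)

    # Assumption: the integration pass receives the per-file findings as context.
    summary = "\n".join(f"{name}: {found}" for name, found in results.items())
    results["integration"] = call_model(
        "Review cross-file dependencies and issues at module boundaries.\n\n"
        f"Per-file findings:\n{summary}"
    )
    return results


# Stand-in for a real model call, so the sketch runs without network access.
def fake_model(prompt: str) -> str:
    return f"(findings for prompt starting: {prompt.splitlines()[0]})"


report = review_with_chaining(
    {"auth.ts": "// auth code", "database.ts": "// db code"}, fake_model
)
for step, findings in report.items():
    print(step, "->", findings)
```

Because the sequence of steps is fixed in code, this shape fits the predictable tasks listed above; it would not suit an investigation where the next subtask depends on what the previous one found.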

6.4 The "Interview" Pattern

Before implementing a solution, Claude asks clarifying questions:

Claude: "Before implementing caching for the API, a few questions:
1. Which cache invalidation strategy do you prefer—TTL or event-based?
2. Is stale data acceptable when the cache is unavailable?
3. Should caching be per-user or global?
4. What is the expected data volume to cache?"

When this is useful:

- Unfamiliar domain (fintech, healthcare, legal systems)
- Tasks with non-obvious implications (cache strategies, failure modes)
- Multiple viable approaches where the best choice depends on context

6.5 Validation and Retry-with-Feedback

When extracted data fails validation:

Step 1: Extract data from the document
Step 2: Validate (Pydantic, JSON Schema, business rules)
Step 3: If there's an error—retry with context:
  - The original document
  - The previous (incorrect) extraction
  - The specific error: "Field 'total' = 150, but sum(line_items) = 145. Re-check values."

When retry will be effective:

- Format errors (date in the wrong format)
- Structural errors (a field placed in the wrong location)
- Arithmetic inconsistencies (the model can re-check)

When retry will NOT help:

- The information is absent from the source document
- The required context is external (the data is in another document not provided)
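The retry steps above can be sketched with the standard library alone. This is an illustration under stated assumptions: extract_with_retry, validate, and fake_model are hypothetical names, the record shape (a total plus a list of numeric line items) is simplified, and the model is assumed to return well-formed JSON.

```python
import json


def validate(record):
    """Business rule from the text: 'total' must equal the sum of line items."""
    computed = sum(record["line_items"])
    if record["total"] != computed:
        return (
            f"Field 'total' = {record['total']}, "
            f"but sum(line_items) = {computed}. Re-check values."
        )
    return None


def extract_with_retry(document, call_model, max_attempts=3):
    base = f"Extract the invoice as JSON.\n\nDocument:\n{document}"
    prompt = base
    error = "no attempts made"
    for _ in range(max_attempts):
        raw = call_model(prompt)
        record = json.loads(raw)
        error = validate(record)
        if error is None:
            return record
        # Retry context: original document, previous extraction, specific error.
        prompt = (
            f"{base}\n\nPrevious extraction:\n{raw}\n\n"
            f"Validation error: {error}"
        )
    raise ValueError(f"still invalid after {max_attempts} attempts: {error}")


# Stand-in model: wrong on the first call, corrected once it sees the error.
def fake_model(prompt):
    if "Validation error" in prompt:
        return '{"total": 145, "line_items": [75, 70]}'
    return '{"total": 150, "line_items": [75, 70]}'


result = extract_with_retry("Widget A 75, Widget B 70", fake_model)
print(result)
```

The attempt cap matters: when the information is missing from the source, every retry fails the same way, so the loop should stop and surface the error rather than keep re-prompting.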

Pydantic as a validation tool: Pydantic is a Python library for schema-based data validation. For the exam, the key points are:

- Structural validation: types, required fields, and enum constraints are checked in code after receiving JSON from Claude
- Semantic validation: custom validators enforce business logic (the sum of line items equals the total; start_date < end_date)
- Validate-retry loops: on a Pydantic validation failure, construct an error message and re-prompt Claude with the error context
- JSON Schema generation: Pydantic models can generate the JSON Schema used for tool_use, providing a single source of truth
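The four points above can be illustrated with one small model. This sketch assumes Pydantic v2 (model_validator, model_validate_json, model_json_schema); the Invoice and LineItem models and the validation_feedback helper are hypothetical names chosen for the example.

```python
from typing import List

from pydantic import BaseModel, ValidationError, model_validator


class LineItem(BaseModel):
    name: str
    price: float


class Invoice(BaseModel):
    # Structural validation: both fields are required and typed.
    total: float
    line_items: List[LineItem]

    # Semantic validation: the total must match the sum of the line items.
    @model_validator(mode="after")
    def total_matches_items(self):
        computed = sum(item.price for item in self.line_items)
        if abs(self.total - computed) > 0.005:
            raise ValueError(
                f"total = {self.total}, but sum(line_items) = {computed}"
            )
        return self


def validation_feedback(raw_json):
    """Return None if the JSON is valid, else error text to include in a retry prompt."""
    try:
        Invoice.model_validate_json(raw_json)
        return None
    except ValidationError as exc:
        return str(exc)


# JSON Schema generated from the same model that performs the validation.
schema = Invoice.model_json_schema()

bad = (
    '{"total": 150, "line_items": ['
    '{"name": "Widget A", "price": 75}, {"name": "Widget B", "price": 70}]}'
)
print(validation_feedback(bad))
```

Generating the schema from the model means the structure requested from Claude and the structure enforced in code cannot drift apart.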

6.6 Self-correction

A pattern for detecting internal contradictions:

{
  "stated_total": "$150.00",
  "calculated_total": "$145.00",
  "conflict_detected": true,
  "line_items": [
    {"name": "Widget A", "price": 75.00},
    {"name": "Widget B", "price": 70.00}
  ]
}

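The model extracts both the stated value and a value computed from the line items. If the two differ, conflict_detected flags the discrepancy so that downstream code can handle it explicitly instead of silently trusting either number.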
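The same comparison can be recomputed in code as a cross-check on the model's own flag. This is a sketch of that idea, not part of the pattern as described: check_totals is a hypothetical helper, and it assumes the stated total is a dollar-prefixed string as in the JSON above.

```python
from decimal import Decimal


def check_totals(stated_total, line_items):
    """Recompute the total from line items and compare it with the stated total."""
    stated = Decimal(stated_total.lstrip("$"))
    # Convert via str() so float prices such as 75.0 become exact decimals.
    calculated = sum(
        (Decimal(str(item["price"])) for item in line_items), Decimal("0")
    )
    return {
        "stated_total": f"${stated:.2f}",
        "calculated_total": f"${calculated:.2f}",
        "conflict_detected": stated != calculated,
    }


items = [
    {"name": "Widget A", "price": 75.00},
    {"name": "Widget B", "price": 70.00},
]
print(check_totals("$150.00", items))
```

Decimal is used instead of float so that currency amounts compare exactly.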