Home Practical Test Docs PDF

Domain 5: Context Management and Reliability (15%)

5.1 Managing Conversation Context to Preserve Critical Information

Key knowledge:

Risks of progressive summarization: numeric values, percentages, and dates get condensed into vague summaries
Lost-in-the-middle effect: models reliably process the start and end of long inputs, but may miss findings from the middle
Tool outputs can accumulate in context disproportionately to relevance (40+ fields when 5 are needed)
The importance of sending the full conversation history in subsequent API requests

Key skills:

Extract transactional facts into a persistent "case facts" block outside the summarized history
Trim verbose tool outputs down to relevant fields
Place key findings at the beginning of aggregated data with explicit section headings
Require subagents to include metadata (dates, sources) in structured outputs

5.2 Designing Effective Escalation Patterns and Resolving Ambiguity

Key knowledge:

Suitable escalation triggers: explicit request for a human, policy gaps/exceptions, inability to make progress
Immediate escalation (explicit request) vs attempt-to-resolve (within agent scope)
Sentiment analysis and model confidence self-ratings are unreliable proxies for case complexity
Multiple customer matches require asking for additional identifiers, not heuristic guessing

Key skills:

Explicit escalation criteria with few-shot examples in the system prompt
Execute explicit requests for a human immediately without additional investigation
Escalate when policy is ambiguous or silent for a specific request
Ask for additional identifiers when tool results contain multiple matches

5.3 Implementing Error Propagation Strategies in Multi-agent Systems

Key knowledge:

Structured error context (failure type, query, partial results, alternatives) enables smarter coordinator recovery
Distinguish access failures (timeouts require a retry decision) from valid empty results (no matches)
Generic error statuses ("search unavailable") hide valuable context from the coordinator
Silent suppression or aborting the whole workflow on a single failure are both anti-patterns

Key skills:

Return structured error context: failure type, what was attempted, partial results, possible alternatives
Distinguish access failures from valid empty results
Perform local recovery in subagents for transient failures; propagate only non-recoverable errors with partial results
Annotate coverage in synthesis: what is well-supported vs where gaps remain

5.4 Managing Context Efficiently When Investigating Large Codebases

Key knowledge:

Context degradation in long sessions: the model starts producing unstable answers and referring to "typical patterns" instead of specific classes
Scratchpad files preserve key findings across context boundaries
Delegating to subagents isolates verbose discovery output
Structured state persistence enables crash recovery

Key skills:

Spawn subagents for specific questions while keeping high-level coordination in the main agent
Use scratchpad files to store key findings and reference them later
Summarize key findings before spawning next-phase subagents
Use /compact to reduce context usage during long investigations

5.5 Designing Workflows with Human Oversight and Confidence Calibration

Key knowledge:

Aggregate metrics (e.g., 97% overall accuracy) can mask poor performance on specific document types or fields
Stratified random sampling measures error rates in high-confidence extractions
Field-level confidence calibration using labeled validation sets
Validate accuracy by document type and field segment before automating

Key skills:

Implement stratified random sampling to detect new error patterns
Analyze accuracy by document type and field to validate stable performance
Output field-level confidence scores and calibrate review thresholds using labeled data
Route low-confidence or ambiguous-source extractions to human review

5.6 Preserving Provenance and Handling Uncertainty in Multi-source Synthesis

Key knowledge:

Attribution is lost during summarization without preserving "claim → source" mappings
Structured mappings must be preserved during aggregation
Handle conflicting statistics by annotating conflicts with attribution rather than arbitrarily choosing one value
Include publication/collection dates to avoid misreading temporal differences as contradictions

Key skills:

Require subagents to output "claim → source" mappings (URL, document name, quotes)
Structure reports to separate stable findings from disputed ones
Preserve conflicting values with annotations and pass them to the coordinator for reconciliation
Include publication dates for correct temporal interpretation
Render content by type: financial data as tables, news as prose, technical findings as structured lists