Chapter 12: Preserving Provenance
12.1 The Attribution Loss Problem
When summarizing results from multiple sources, the "claim → source" link can be lost:
Bad: "The AI music market is estimated at $3.2B." (No source, no year.)
Good:
{
"claim": "The AI music market is estimated at $3.2B.",
"source_url": "https://example.com/report",
"source_name": "Global AI Music Report 2024",
"publication_date": "2024-06-15",
"confidence": 0.9
}
12.2 Handling Conflicting Data
When two sources provide different values:
{
"claim": "Share of AI-generated music on streaming platforms",
"values": [
{
"value": "12%",
"source": "Spotify Annual Report 2024",
"date": "2024-03",
"methodology": "Automated classification"
},
{
"value": "8%",
"source": "Music Industry Association Survey",
"date": "2024-07",
"methodology": "Survey of 500 labels"
}
],
"conflict_detected": true,
"possible_explanation": "Difference in methodology and time period"
}
Do not arbitrarily choose one value. Preserve both with attribution and let the coordinator decide.
12.3 Include Dates for Correct Interpretation
Without dates, temporal differences can be misinterpreted as contradictions:
Bad: "Source A says 10%, source B says 15%. Contradiction."
Good: "Source A (2023) says 10%, source B (2024) says 15%. Likely +5% growth over a year."
12.4 Render by Content Type
Don't force everything into one format:
- Financial data -> tables
- News and analysis -> prose
- Technical findings -> structured lists
- Time series -> chronological ordering