RAG looks easy: ingest documents, embed, retrieve, prompt. The demo works in a week. Six months later, the system is wrong often enough that no one trusts it for anything that matters.
The trap is treating retrieval as a search problem instead of an evaluation problem. Without a golden dataset, an evaluation harness, and a feedback loop, you can't tell whether a given change improves the system or regresses it.
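To make that concrete, here is a minimal sketch of a regression check against a golden dataset. Everything in it is an assumption for illustration: `GoldenExample`, `hit_rate`, `compare`, and the two toy retrievers are hypothetical names, and hit rate is just one simple metric you might gate on.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical golden-dataset record: a question plus the document IDs
# a human reviewer judged relevant to it.
@dataclass(frozen=True)
class GoldenExample:
    question: str
    relevant_ids: frozenset

# A retriever is modeled as any function from a question to ranked document IDs.
Retriever = Callable[[str], list]


def hit_rate(retriever: Retriever, golden: list, k: int = 5) -> float:
    """Fraction of questions with at least one relevant document in the top k."""
    hits = sum(
        1 for ex in golden
        if ex.relevant_ids & set(retriever(ex.question)[:k])
    )
    return hits / len(golden)


def compare(baseline: Retriever, candidate: Retriever, golden: list,
            k: int = 5, tolerance: float = 0.0) -> dict:
    """Score both retrievers on the same golden set and flag a regression."""
    base = hit_rate(baseline, golden, k)
    cand = hit_rate(candidate, golden, k)
    return {"baseline": base, "candidate": cand,
            "regressed": cand < base - tolerance}


# Toy data standing in for a real golden set and real retrievers.
golden = [
    GoldenExample("q1", frozenset({"a"})),
    GoldenExample("q2", frozenset({"b"})),
]
baseline = lambda q: {"q1": ["a", "x"], "q2": ["b", "y"]}[q]
candidate = lambda q: {"q1": ["a", "x"], "q2": ["y", "z"]}[q]

report = compare(baseline, candidate, golden, k=2)
print(report)
```

In this toy run the candidate loses the relevant document for the second question, so the report flags a regression. A check like this can run on every retrieval change, which is what turns "it feels better" into a number you can compare.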
Successful RAG programs invest in evaluation infrastructure before they invest in retrieval cleverness. They measure faithfulness, citation accuracy, and recall against curated test sets, and they ship small improvements continuously.
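Two of those metrics are simple enough to sketch directly. The definitions below are one reasonable reading, not a standard: recall at k is the share of known-relevant documents that appear in the top k results, and citation accuracy is the share of cited documents that actually support the answer. The field names and the test set are made up for illustration. Faithfulness is left out because it usually needs a separate judge (a human reviewer or a model) rather than set arithmetic.

```python
def recall_at_k(retrieved: list, relevant: set, k: int) -> float:
    """Share of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def citation_accuracy(cited: list, supporting: set) -> float:
    """Share of cited documents that are in the labeled supporting set."""
    if not cited:
        return 0.0
    return sum(1 for doc_id in cited if doc_id in supporting) / len(cited)


# Hypothetical curated test set: each record holds the system's output
# (retrieved, cited) next to the human labels (relevant, supporting).
test_set = [
    {"retrieved": ["d1", "d7", "d3"], "relevant": {"d1", "d3"},
     "cited": ["d1", "d7"], "supporting": {"d1", "d3"}},
    {"retrieved": ["d4", "d2", "d9"], "relevant": {"d2", "d5"},
     "cited": ["d2"], "supporting": {"d2"}},
]

mean_recall = sum(
    recall_at_k(r["retrieved"], r["relevant"], k=3) for r in test_set
) / len(test_set)
mean_citation = sum(
    citation_accuracy(r["cited"], r["supporting"]) for r in test_set
) / len(test_set)

print(f"recall@3={mean_recall:.2f} citation_accuracy={mean_citation:.2f}")
```

Tracking averages like these per release gives each small improvement a visible before and after.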