Most AI demos fail when they meet production data: messy attributes, missing images, and policies that were never written down.
This one worked because the team shrank its ambition to a single job, finding in-stock alternatives for delayed shipments, and built evals from actual customer service transcripts.
They rehearsed failure modes in front of legal and merchandising before the executive preview. No surprises, no theatrics.
Funding followed not because the model was magic, but because the organization saw a path from pilot to operation with owners attached.
The demo worked because it refused to act like a magic show.
The team had seen enough impressive failures. One assistant could summarize products but invented delivery dates. Another wrote good copy but ignored restricted claims. A third answered confidently until someone asked where the numbers came from. The room had learned to smile politely and distrust the screen.
This time the use case was small: delayed shipments. Customer service agents needed alternatives they could offer when a product would not arrive on time. The assistant was not allowed to invent. It had to check stock, match category and price range, avoid restricted substitutions, and explain why it suggested each option.
The team built the evaluation set from real transcripts. Not cleaned-up examples. Real messages with impatient customers, missing sizes, unclear product names, and emotional language. Merchandising reviewed the suggestions. Legal reviewed the boundaries. Customer service reviewed the tone.
During rehearsal, the assistant failed in useful ways. It suggested a substitute that was technically similar but felt wrong for the occasion. It used language that sounded too cheerful for a complaint. It chose an available item that had a high return rate. Each failure became a rule, example, or escalation path.
When executives saw the demo, the most convincing part was not the answer. It was the way the team explained what would happen when the answer was not good enough.
That is why funding followed. The business did not buy a miracle. It bought a working habit: smaller scope, real examples, visible risk, and people attached to the outcome.



