Every brand team that tries to make GPT-4 or Claude 'sound like us' eventually hits the same wall: prompts get longer, outputs get blander, and nobody is sure which example in the system prompt is doing the work.
Retrieval-augmented generation (RAG) solves the wrong problem if you treat it as a search engine. Treat it as a brand-voice corpus instead, and the picture changes. Index 200 of your highest-quality past assets (long-form, social, lifecycle, scripts), retrieve the six closest by semantic similarity, and inject them as in-context exemplars before the model writes anything.
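To make the retrieve-then-inject step concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the pipeline itself: `embed` is a toy bag-of-words stand-in for a real embedding model, and the function names and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: lowercase term counts.

    Assumption: a production system would call an embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_exemplars(brief: str, corpus: list[str], k: int = 6) -> list[str]:
    """Return the k corpus assets most similar to the brief."""
    query = embed(brief)
    ranked = sorted(corpus, key=lambda asset: cosine(query, embed(asset)), reverse=True)
    return ranked[:k]


def build_prompt(brief: str, exemplars: list[str]) -> str:
    """Place retrieved exemplars ahead of the brief (hypothetical wording)."""
    blocks = "\n\n".join(
        f"Exemplar {i}:\n{text}" for i, text in enumerate(exemplars, start=1)
    )
    return f"Write in the voice of these past assets.\n\n{blocks}\n\nBrief:\n{brief}"
```

In use, `build_prompt(brief, retrieve_exemplars(brief, corpus))` would produce the text sent to the model. The default `k=6` mirrors the number of exemplars described above; for a 200-asset corpus, a brute-force similarity scan like this one is usually fast enough that a dedicated vector index is optional.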
The chunking strategy matters more than the embedding model. We chunk by rhetorical unit (headline, subhead, opening paragraph, CTA) rather than by token count. The retrieval is then filtered by asset type, so a paid social brief retrieves paid social exemplars, not a 2,000-word blog intro.
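A rough sketch of what chunking by rhetorical unit and filtering by asset type could look like in Python follows. It assumes each asset already arrives with its sections labeled, which is a simplification. The `Chunk` fields, the unit names, and the asset-type strings are illustrative placeholders, not the schema from the appendix.

```python
from dataclasses import dataclass

# Rhetorical units named in the text; the identifiers are illustrative.
RHETORICAL_UNITS = ("headline", "subhead", "opening_paragraph", "cta")


@dataclass(frozen=True)
class Chunk:
    asset_id: str
    asset_type: str  # e.g. "paid_social" or "blog" (hypothetical labels)
    unit: str        # one of RHETORICAL_UNITS
    text: str


def chunk_asset(asset_id: str, asset_type: str, sections: dict[str, str]) -> list[Chunk]:
    """Emit one chunk per non-empty rhetorical unit instead of fixed token windows.

    Assumption: `sections` maps unit names to text that was labeled upstream.
    """
    return [
        Chunk(asset_id, asset_type, unit, sections[unit].strip())
        for unit in RHETORICAL_UNITS
        if sections.get(unit, "").strip()
    ]


def filter_by_asset_type(chunks: list[Chunk], asset_type: str) -> list[Chunk]:
    """Restrict the candidate pool to one asset type before similarity ranking."""
    return [chunk for chunk in chunks if chunk.asset_type == asset_type]
```

Filtering before ranking means a paid social brief can only ever see paid social chunks, so a long blog intro cannot crowd out shorter exemplars on raw similarity alone.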
Three teams running this in production report a 40–60% reduction in editor passes per asset. The corpus pays for itself in roughly six weeks of saved review time.
The full chunking schema, the embedding model we landed on, and the retrieval prompt template are in the appendix.
"You don't need a fine-tune. You need a librarian the model can call before it writes the first sentence."
