The premise
Retrieval-augmented generation (RAG) is the pattern behind most useful AI assistants over private content: instead of relying on what a model learned in training, the system retrieves the relevant passages from your own documents and gives them to the model as grounding, so the answer is based on your data and can cite where it came from.
It is the right tool for a specific job: answering questions over a body of knowledge that changes and that the model was never trained on, such as your policies, your tickets, your contracts, or your product docs. It is also widely misapplied: bolted onto problems that a plain prompt would solve, or reached for when the real need was to change the model's behavior rather than its facts.
This piece is the practical version: what RAG is, when it beats the alternatives, what makes the difference between a knowledge assistant people trust and one they quietly stop using, and how we build and operate them so they stay accurate after launch.
Grounding, not memorizing
A language model knows a lot in general and nothing about your business in particular. RAG closes that gap at query time.
When a user asks a question, a RAG system first retrieves the most relevant chunks from your indexed content, then passes them to the model alongside the question with an instruction to answer from the provided material and cite it. The model is no longer guessing from training data; it is reading your documents and summarizing them. That is what makes the answer current, specific, and checkable.
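A minimal sketch of those two steps follows. It assumes an in-memory list of chunks and uses a crude word-overlap score as a stand-in for a real embedding model. The names `Chunk`, `retrieve`, and `build_grounded_prompt` are hypothetical. The model call itself is left out, so the output is the prompt that would be sent.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str  # hypothetical source identifier, used for citations
    text: str


def tokens(text: str) -> set[str]:
    # Lowercased word set: a crude stand-in for an embedding model.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    # Score every chunk by word overlap with the query, then keep the top k
    # chunks that share at least one word with it.
    q = tokens(query)
    scored = [(len(q & tokens(c.text)), c) for c in chunks]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def build_grounded_prompt(question: str, passages: list[Chunk]) -> str:
    # Number the passages so the model can cite them as [1], [2], ...
    sources = "\n".join(
        f"[{i}] ({p.doc_id}) {p.text}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the sources below and cite them as [n]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


chunks = [
    Chunk("hr-policy.md", "Employees accrue 25 days of annual leave per year."),
    Chunk("it-policy.md", "Laptops are replaced every three years."),
]
question = "How many days of annual leave do employees get?"
passages = retrieve(question, chunks, k=2)
print(build_grounded_prompt(question, passages))
```

Zero-score chunks are dropped. An unrelated document therefore never appears as a source, so the IT policy is left out of this prompt even though `k=2`.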
Both halves matter, but most failures are retrieval failures, not model failures. If the retrieval step surfaces the wrong passages, even the best model produces a confident, well-written, wrong answer. A knowledge assistant is only as good as what it retrieves, which is why the unglamorous parts (how documents are split, how they are embedded, how retrieval is scored) decide whether the thing works.
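To make the splitting step concrete, here is one simple strategy: fixed-size word windows with overlap. The function name and default sizes are illustrative assumptions. Production systems often split on headings, paragraphs, or sentences instead.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap`
    words with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a window reaches the end, so no chunk is only a repeat
        # of the previous window's tail.
        if start + size >= len(words):
            break
    return chunks


print(chunk_words("a b c d e f g h i j", size=4, overlap=1))
# ['a b c d', 'd e f g', 'g h i j']
```

With overlap, text near a boundary appears in two adjacent chunks. A passage is then less likely to be retrieved without the sentence that gives it context.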

Retrieval, fine-tuning, or just a longer prompt
Reach for RAG when answers must be grounded in a corpus that is large, changing, or both, and when citing the source matters. Reach for fine-tuning when you need to change how the model behaves (its tone, its output format, or a narrow skill) rather than what facts it knows; fine-tuning teaches style and patterns, not a moving body of knowledge. Reach for a plain prompt when the knowledge is small enough to paste into context, in which case retrieval is overhead you do not need.
These are not mutually exclusive, and the strongest systems combine them: a fine-tuned or well-prompted model for behavior, RAG for facts. The error we see most is fine-tuning a model on a company's documents in the hope it will memorize them. It does not work reliably: the model blends facts together, cannot cite, and goes stale the moment a document changes. If the requirement is accurate answers from current documents, that is RAG, every time.
Getting this choice wrong is expensive in both directions. We have seen fine-tuning projects that should have been a two-week RAG build, and elaborate RAG pipelines for a knowledge base that fit in a single prompt. We name the reason for the choice before building, because the architecture is hard to walk back later.

Access control, grounding, and an eval set
Three things separate a knowledge assistant you can put in front of staff or customers from a demo. The first is access control at the retrieval layer: the assistant must only retrieve from documents the current user is allowed to see, enforced when the query runs, not filtered after the fact. An assistant that can surface a document a user should never have seen is a data breach with a friendly interface.
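A minimal sketch of enforcing permissions when the query runs follows. It assumes each chunk carries the groups allowed to read its source document and that the caller supplies the user's groups. `Chunk`, `allowed_groups`, and `retrieve_for_user` are hypothetical names. With a real vector store, the same filter would typically go into the query itself as a metadata filter, so that it applies before ranking rather than after.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Chunk:
    doc_id: str
    text: str
    # Hypothetical ACL metadata copied from the source document at index time.
    allowed_groups: frozenset[str] = field(default_factory=frozenset)


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_for_user(
    query: str, chunks: list[Chunk], user_groups: set[str], k: int = 3
) -> list[Chunk]:
    # Filter first: a chunk the user may not read never enters the candidate
    # set, so it cannot be ranked, returned, or shown to the model.
    candidates = [c for c in chunks if c.allowed_groups & user_groups]
    q = words(query)
    ranked = sorted(candidates, key=lambda c: len(q & words(c.text)), reverse=True)
    return ranked[:k]


chunks = [
    Chunk("salary-bands.csv", "Salary bands for engineering levels", frozenset({"hr"})),
    Chunk("handbook.md", "Salary reviews happen every April", frozenset({"all-staff"})),
]
print([c.doc_id for c in retrieve_for_user("salary bands", chunks, {"all-staff"})])
# ['handbook.md']
```

A chunk with no `allowed_groups` is visible to nobody. This deny-by-default behavior is the safer failure mode when permission metadata is missing.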
The second is grounding and citation: every answer points back to the passages it came from, so a user can verify it and a reviewer can audit it. Assistants that answer without citations train people to trust them blindly, which is exactly backwards for anything load-bearing. The third is measurement: a graded evaluation set that scores answer accuracy and grounding on every change, so quality regressions are caught in development rather than by a customer in production.
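The measurement step can start smaller than it sounds. A minimal sketch follows, assuming the assistant is exposed as a function that returns its answer text and the ids of the documents it cited. `EvalCase`, `run_evals`, and `regressions` are hypothetical names. The substring check for accuracy is a crude stand-in for graded review by a person or a rubric.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    question: str
    must_contain: str           # key fact a correct answer should state
    expected_sources: set[str]  # doc ids a grounded answer should cite


# Hypothetical assistant interface: question -> (answer text, cited doc ids).
Assistant = Callable[[str], tuple[str, set[str]]]


def run_evals(assistant: Assistant, cases: list[EvalCase]) -> dict[str, float]:
    if not cases:
        raise ValueError("eval set is empty")
    correct = grounded = 0
    for case in cases:
        answer, cited = assistant(case.question)
        # Accuracy: the answer states the expected fact.
        if case.must_contain.lower() in answer.lower():
            correct += 1
        # Grounding: the answer cites at least one of the expected sources.
        if cited & case.expected_sources:
            grounded += 1
    return {"accuracy": correct / len(cases), "grounding": grounded / len(cases)}


def regressions(scores: dict[str, float], baseline: dict[str, float],
                tolerance: float = 0.0) -> list[str]:
    # Metrics that fell below the recorded baseline by more than the tolerance.
    return [m for m, base in baseline.items() if scores.get(m, 0.0) < base - tolerance]


cases = [
    EvalCase("How many days of annual leave?", "25 days", {"hr-policy.md"}),
    EvalCase("How often are laptops replaced?", "three years", {"it-policy.md"}),
]


def fake_assistant(question: str) -> tuple[str, set[str]]:
    # Stand-in that answers one question correctly and one wrongly, uncited.
    if "leave" in question:
        return "Employees get 25 days of annual leave [1].", {"hr-policy.md"}
    return "Laptops are replaced every year.", set()


scores = run_evals(fake_assistant, cases)
print(scores)  # {'accuracy': 0.5, 'grounding': 0.5}
print(regressions(scores, {"accuracy": 0.9, "grounding": 0.9}))  # ['accuracy', 'grounding']
```

Run this on every change and fail the build whenever `regressions` returns anything. That makes the eval set a gate rather than a report.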
Knowledge also changes, so retrieval has to be maintained: re-indexing as documents update, and monitoring retrieval quality for the slow degradation that erodes trust quarter over quarter. The build is the easy part; the operating posture is what keeps it accurate a year later.
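Keeping the index in step with the documents can be sketched as a diff between what the source system holds now and what was last indexed. The sketch assumes a content hash is stored per document. `plan_reindex` is a hypothetical name, and the actual re-embedding and deletion calls are left out.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(
    current_docs: dict[str, str], indexed_hashes: dict[str, str]
) -> tuple[list[str], list[str]]:
    """current_docs maps doc id -> latest text from the source system;
    indexed_hashes maps doc id -> hash of the text currently in the index.
    Returns (doc ids to re-chunk and re-embed, doc ids to delete)."""
    to_upsert = sorted(
        doc_id
        for doc_id, text in current_docs.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    )
    # Documents removed at the source must also leave the index, or their
    # stale chunks keep being retrieved.
    to_delete = sorted(set(indexed_hashes) - set(current_docs))
    return to_upsert, to_delete


indexed = {"a.md": content_hash("old text"), "b.md": content_hash("unchanged"),
           "c.md": content_hash("retired policy")}
current = {"a.md": "new text", "b.md": "unchanged", "d.md": "new policy"}
print(plan_reindex(current, indexed))  # (['a.md', 'd.md'], ['c.md'])
```

Only new or changed documents are re-embedded, which keeps a frequent sync cheap enough to run on a schedule.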

Three commitments on every knowledge-assistant build
We build the full pipeline and hand it over, with the access control and evals that make it safe to trust, and we operate it until your team can run it themselves.
Retrieval before generation
We invest where the failures actually are: chunking, embeddings, and retrieval scoring, measured on a real question set from your users. A good retrieval layer makes an ordinary model look excellent; a bad one makes the best model unreliable.
Access control is not optional
Permissions are enforced at query time, so the assistant can never surface a document the user is not allowed to see. This is part of the architecture from day one, not a filter added later.
Measured, then maintained
Every assistant ships with an eval set that scores accuracy and grounding on each change and with monitoring for retrieval drift in production, along with the data pipeline and eval baseline that keep it maintainable after handover.
An assistant people actually trust
A year in, the assistant is still accurate, still cites its sources, and has not quietly leaked a document or drifted into confident nonsense.
The honest test of a knowledge assistant is not the launch demo; it is whether people still use it after they have caught it being wrong. Trust is lost the first time an assistant invents a policy or surfaces something it should not have, and it is almost impossible to win back. That is why grounding, citation, and access control are not polish: they are the product.
The assistants that stay trusted are the ones that were measured from the start and maintained after launch. Retrieval quality is watched, the eval set grows as new failure modes are found, and the index keeps pace with the documents. None of that is glamorous, and all of it is what separates a system that compounds in value from one that is quietly abandoned in two quarters.
Done right, a knowledge assistant turns a company's scattered, under-used knowledge into something every employee can query in plain language, with answers they can check. That is a durable advantage, and it belongs to you: the pipeline, the evals, and the code.

AI for founders: questions we get asked.
Direct answers to the questions we get asked the most. If yours isn't covered, write to the team.