What is RAG in simple terms?

RAG stands for retrieval-augmented generation. When you ask a question, the system finds the most relevant passages in your own documents and gives them to the AI model as context, so the answer is based on your data rather than on what the model learned in training, and it can cite the source. It is the standard pattern for AI assistants over private or frequently changing content.

Should we use RAG or fine-tune a model?

Use RAG when answers must be grounded in current facts and cite a source. Use fine-tuning when you need to change the model's behavior, tone, or format, not its knowledge. They solve different problems and are often combined: a well-behaved model for style, RAG for facts. If the goal is accurate answers from your documents, use RAG.

How do you stop it from leaking documents people should not see?

Access control is enforced at the retrieval layer: when a user asks a question, the system only retrieves from documents that user is permitted to see, at query time. It is built into the architecture from day one rather than added as a filter, because a knowledge assistant that can surface the wrong document is a data breach.

How do you know the answers are accurate?

A graded evaluation set scores answer accuracy and grounding on every change during development, and monitoring watches retrieval quality in production so degradation is caught before users notice. Every answer also cites its sources, so individual answers can be verified by the person reading them.

How long does a RAG system take to build?

A focused knowledge assistant over a defined corpus typically takes a few weeks to reach a working, measured system, depending on the state of the data. Messy or scattered source documents add time at the data-preparation stage, which is usually where the real work is.

RAG for business: building knowledge assistants that actually work

The premise

Retrieval-augmented generation (RAG) is the pattern behind most useful AI assistants over private content: instead of relying on what a model learned in training, the system retrieves the relevant passages from your own documents and gives them to the model as grounding, so the answer is based on your data and can cite where it came from.

It is the right tool for a specific job: answering questions over a body of knowledge that changes and that the model was never trained on, such as your policies, tickets, contracts, and product docs. It is also widely misapplied: bolted onto problems that a plain prompt would solve, or reached for when the real need was to change the model's behavior rather than its facts.

This piece is the practical version: what RAG is, when it beats the alternatives, what makes the difference between a knowledge assistant people trust and one they quietly stop using, and how we build and operate them so they stay accurate after launch.

What it is

Grounding, not memorizing

A language model knows a lot in general and nothing about your business in particular. RAG closes that gap at query time.

When a user asks a question, a RAG system first retrieves the most relevant chunks from your indexed content, then passes them to the model alongside the question with an instruction to answer from the provided material and cite it. The model is no longer guessing from training data; it is reading your documents and summarizing them. That is what makes the answer current, specific, and checkable.
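The retrieve-then-answer flow can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the word-overlap cosine scorer stands in for a real embedding model, the sample documents and file names are invented, and the function names (`tokenize`, `retrieve`, `build_prompt`) are hypothetical. The call to the language model is left out; the sketch stops at the prompt the model would receive.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (toy scorer, not embeddings)."""
    q = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c["text"]))), reverse=True)
    return ranked[:k]

def build_prompt(question, passages):
    """Assemble the grounded prompt: instruction, numbered sources, then the question."""
    sources = "\n".join(
        f"[{i}] ({p['source']}) {p['text']}" for i, p in enumerate(passages, 1)
    )
    return (
        "Answer using only the sources below and cite them as [n]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )

# Invented sample corpus, already split into chunks.
chunks = [
    {"source": "leave-policy.md", "text": "Employees accrue 25 days of annual leave per year."},
    {"source": "expenses.md", "text": "Expense claims must be filed within 30 days."},
    {"source": "security.md", "text": "Laptops must use full disk encryption."},
]

question = "How many days of annual leave do employees get?"
top = retrieve(question, chunks, k=2)
prompt = build_prompt(question, top)
print(prompt)
```

Numbering the sources in the prompt is what lets the model cite `[1]` or `[2]`, and lets the interface map each citation back to a document.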

Both halves matter, but most failures are retrieval failures, not model failures. If the retrieval step surfaces the wrong passages, even the best model produces a confident, wrong, well-written answer. A knowledge assistant is only as good as what it retrieves, which is why the unglamorous parts (how documents are split, how they are embedded, how retrieval is scored) decide whether the thing works.
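How documents are split is the first of those unglamorous decisions. One common approach, shown here as a sketch under assumptions, is fixed-size word windows with overlap, so a sentence that straddles a boundary still appears whole in at least one chunk. The function name `chunk_words` and the sizes are illustrative; real builds often split on headings or paragraphs instead.

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into windows of `size` words, each sharing `overlap` words with the previous window."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

# A synthetic 120-word document: w0 w1 ... w119
doc = " ".join(f"w{i}" for i in range(120))
pieces = chunk_words(doc, size=50, overlap=10)
for p in pieces:
    words = p.split()
    print(words[0], "...", words[-1], f"({len(words)} words)")
```

Larger chunks carry more context per passage but dilute the match; smaller chunks match precisely but can lose the surrounding meaning. The right setting is an empirical question to settle against a real question set.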

RAG vs the alternatives

Retrieval, fine-tuning, or just a longer prompt

Reach for RAG when answers must be grounded in a corpus that is large, changing, or both, and when citing the source matters. Reach for fine-tuning when you need to change how the model behaves (its tone, its format, or a narrow skill), not what facts it knows; fine-tuning teaches style and patterns, not a moving body of knowledge. Reach for a plain prompt when the knowledge is small enough to paste into context, in which case retrieval is overhead you do not need.
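The "small enough to paste into context" test can be made concrete with a rough budget check. This is a sketch under stated assumptions: the four-characters-per-token estimate is a crude rule of thumb for English text rather than a real tokenizer, and the budget and reserve figures are placeholders to replace with the limits of whichever model is used.

```python
def estimate_tokens(text):
    """Very rough token estimate (about 4 characters per token); an assumption, not a tokenizer."""
    return len(text) // 4 + 1

def fits_in_prompt(documents, context_budget_tokens, reserve_tokens=2000):
    """True if the whole corpus, plus a reserve for the question and the answer, fits the budget."""
    total = sum(estimate_tokens(d) for d in documents)
    return total + reserve_tokens <= context_budget_tokens

# Placeholder corpora: three short documents versus one very large one.
small_corpus = ["x" * 4000] * 3
large_corpus = ["x" * 400_000]

print(fits_in_prompt(small_corpus, context_budget_tokens=8000))  # plain prompt is enough
print(fits_in_prompt(large_corpus, context_budget_tokens=8000))  # retrieval is warranted
```

Fitting in the budget is necessary but not sufficient: a corpus that changes often or must be filtered per user can still justify retrieval even when it is small.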

These are not mutually exclusive, and the strongest systems combine them: a fine-tuned or well-prompted model for behavior, RAG for facts. The error we see most is fine-tuning a model on a company's documents in the hope it will memorize them. It does not work reliably: the model blends facts together, cannot cite, and goes stale the moment a document changes. If the requirement is accurate answers from current documents, that is RAG, every time.

Getting this choice wrong is expensive in both directions. We have seen fine-tuning projects that should have been a two-week RAG build, and elaborate RAG pipelines for a knowledge base that fit in a single prompt. We name the reason for the choice before building, because the architecture is hard to walk back later.

Fig. · Retrieval, fine-tuning, or just a longer prompt

What makes it trustworthy

Access control, grounding, and an eval set

Three things separate a knowledge assistant you can put in front of staff or customers from a demo. The first is access control at the retrieval layer: the assistant must only retrieve from documents the current user is allowed to see, enforced when the query runs, not filtered after the fact. An assistant that can surface a document a user should never have seen is a data breach with a friendly interface.

The second is grounding and citation: every answer points back to the passages it came from, so a user can verify it and a reviewer can audit it. Assistants that answer without citations train people to trust them blindly, which is exactly backwards for anything load-bearing. The third is measurement: a graded evaluation set that scores answer accuracy and grounding on every change, so quality regressions are caught in development rather than by a customer in production.

Knowledge also changes, so retrieval has to be maintained: re-indexing as documents update, and monitoring retrieval quality for the slow degradation that erodes trust quarter over quarter. The build is the easy part; the operating posture is what keeps it accurate a year later.
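Keeping the index in step with the documents usually starts with knowing what changed. One simple way to do that, sketched here with hypothetical names (`content_hash`, `plan_reindex`), is to store a content hash per document at indexing time and compare it against the current content on each sync.

```python
import hashlib

def content_hash(text):
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_reindex(indexed_hashes, current_docs):
    """Compare stored hashes with current content.

    Returns (to_index, to_delete): ids that are new or changed and need
    embedding again, and ids that no longer exist and must leave the index.
    """
    to_index = sorted(
        doc_id for doc_id, text in current_docs.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    )
    to_delete = sorted(set(indexed_hashes) - set(current_docs))
    return to_index, to_delete

# State recorded at the last indexing run (invented example data).
indexed = {
    "pricing": content_hash("Plan A costs 10."),
    "handbook": content_hash("Be kind."),
    "old-faq": content_hash("Retired content."),
}
# What the source system holds now.
current = {
    "pricing": "Plan A costs 12.",   # changed
    "handbook": "Be kind.",          # unchanged
    "onboarding": "Welcome aboard.",  # new
}

to_index, to_delete = plan_reindex(indexed, current)
print(to_index, to_delete)
```

Deletions matter as much as updates: a retired document that stays in the index keeps getting retrieved and cited.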

Fig. · Access control, grounding, and an eval set

How SDEN builds RAG

Three commitments on every knowledge-assistant build

We build the full pipeline and hand it over, with the access control and evals that make it safe to trust, then operate it until your team can.

Retrieval before generation

We invest where the failures actually are: chunking, embeddings, and retrieval scoring, measured on a real question set from your users. A good retrieval layer makes an ordinary model look excellent; a bad one makes the best model unreliable.
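Measuring retrieval on a question set can be as simple as recall at k: for each question, what share of the passages known to be relevant appear in the top k results. The sketch below assumes a hand-labeled set of relevant document ids per question; the ids and the `runs` structure are invented for illustration, and recall at k is one reasonable metric among several rather than the only one in use.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant ids that appear in the top-k retrieved ids."""
    if not relevant_ids:
        raise ValueError("each question needs at least one labeled relevant id")
    top = set(retrieved_ids[:k])
    return len(top & set(relevant_ids)) / len(relevant_ids)

def mean_recall_at_k(runs, k):
    """Average recall@k over a labeled question set."""
    return sum(recall_at_k(r["retrieved"], r["relevant"], k) for r in runs) / len(runs)

# Invented results: what the retriever returned versus what a person labeled as relevant.
runs = [
    {"question": "What is the refund window?", "retrieved": ["d3", "d1", "d7"], "relevant": ["d1"]},
    {"question": "Who approves travel?", "retrieved": ["d2", "d9", "d4"], "relevant": ["d4", "d5"]},
]

print(mean_recall_at_k(runs, k=3))
print(mean_recall_at_k(runs, k=1))
```

Tracking this number while changing chunk size, embedding model, or scoring turns retrieval tuning from guesswork into a measured comparison.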

Access control is not optional

Permissions are enforced at query time, so the assistant can never surface a document the user is not allowed to see. This is part of the architecture from day one, not a filter added later.
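Enforcing permissions at query time means the permission check runs before ranking, so a restricted passage is never scored, never returned, and never reaches the model. The sketch below assumes each chunk carries the set of groups allowed to read it; the `Chunk` type, the group names, and the toy `overlap_score` function are all hypothetical stand-ins for a real index and identity system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups permitted to read the source document

def visible_chunks(chunks, user_groups):
    """Keep only chunks whose allowed groups include at least one of the user's groups."""
    groups = frozenset(user_groups)
    return [c for c in chunks if c.allowed_groups & groups]

def retrieve_for_user(question, chunks, user_groups, score, k=3):
    """Filter by permission first, then rank what remains."""
    candidates = visible_chunks(chunks, user_groups)
    return sorted(candidates, key=lambda c: score(question, c.text), reverse=True)[:k]

def overlap_score(question, text):
    """Toy relevance score: number of shared lowercase words."""
    return len(set(question.lower().split()) & set(text.lower().split()))

index = [
    Chunk("salary-bands", "salary bands for engineers by level", frozenset({"hr"})),
    Chunk("handbook", "engineers follow the staff handbook", frozenset({"staff", "hr"})),
]

question = "salary bands for engineers"
staff_hits = retrieve_for_user(question, index, {"staff"}, overlap_score)
hr_hits = retrieve_for_user(question, index, {"hr"}, overlap_score)
print([c.doc_id for c in staff_hits])
print([c.doc_id for c in hr_hits])
```

The staff user asks a question whose best match is the restricted document and still never sees it, because the document was removed from the candidate set before any scoring took place.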

Measured, then maintained

Every assistant ships with an eval set that scores accuracy and grounding on each change, and with monitoring for retrieval drift in production. The data pipeline and eval baseline are handed over with it, so the system stays maintainable after handover.
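Scoring accuracy and grounding against a baseline might look like the sketch below. It makes simplifying assumptions that should be read as placeholders: accuracy is approximated by checking that expected phrases appear in the answer, where a real eval set would use human or rubric-based grading, and grounding is approximated by requiring at least one citation and allowing only citations to passages that were actually retrieved. All names and sample data are invented.

```python
def grade(case, answer, cited_ids, retrieved_ids):
    """Grade one answer against one eval case."""
    text = answer.lower()
    accurate = all(p.lower() in text for p in case["expected_phrases"])
    grounded = bool(cited_ids) and set(cited_ids) <= set(retrieved_ids)
    return {"accurate": accurate, "grounded": grounded}

def score_run(results):
    """Aggregate per-case grades into run-level metrics."""
    n = len(results)
    return {
        "accuracy": sum(r["accurate"] for r in results) / n,
        "grounding": sum(r["grounded"] for r in results) / n,
    }

def regressions(baseline, current, tolerance=0.0):
    """Names of metrics that dropped below the baseline by more than the tolerance."""
    return [m for m in baseline if current[m] < baseline[m] - tolerance]

results = [
    grade({"expected_phrases": ["25 days"]},
          "Employees get 25 days of leave [1].",
          cited_ids=["leave-policy"], retrieved_ids=["leave-policy", "expenses"]),
    grade({"expected_phrases": ["30 days"]},
          "Claims are due within 60 days.",
          cited_ids=[], retrieved_ids=["expenses"]),
]

current = score_run(results)
baseline = {"accuracy": 1.0, "grounding": 0.5}
print(current, regressions(baseline, current))
```

Run on every change, a check like `regressions` can fail the build when a metric drops, which is how a quality regression gets caught in development rather than in production.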

What good looks like

An assistant people actually trust

A year in, the assistant is still accurate, still cites its sources, and has not quietly leaked a document or drifted into confident nonsense.

The honest test of a knowledge assistant is not the launch demo; it is whether people still use it after they have caught it being wrong. Trust is lost the first time an assistant invents a policy or surfaces something it should not have, and it is almost impossible to win back. That is why grounding, citation, and access control are not polish: they are the product.

The assistants that stay trusted are the ones that were measured from the start and maintained after launch. Retrieval quality is watched, the eval set grows as new failure modes are found, and the index keeps pace with the documents. None of that is glamorous, and all of it is what separates a system that compounds in value from one that is quietly abandoned in two quarters.

Done right, a knowledge assistant turns a company's scattered, under-used knowledge into something every employee can query in plain language, with answers they can check. That is a durable advantage, and it belongs to you: the pipeline, the evals, and the code.

Fig. · An assistant people actually trust


