Guides · 9 min read · 25 May 2026

RAG Explained: How AI Can Read Your Own Documents

Retrieval-Augmented Generation lets AI answer questions about your own files, databases, and knowledge bases. Here's how it actually works.

RAG is not fine-tuning. Fine-tuning bakes knowledge into the model's weights, which is expensive, slow to update, and demands ML expertise. RAG leaves the model unchanged and retrieves knowledge at query time, which is cheap, fast to update, and needs only software engineering.
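
To make the "fast to update" point concrete, here is a minimal sketch using only the standard library. It assumes a toy in-memory store and a crude word-overlap retriever (the `knowledge` list and the `tokenize` and `retrieve` helpers are hypothetical illustrations, not part of any library). A real system would use embeddings and a vector database.

```python
import re
from typing import Optional

# Hypothetical in-memory knowledge store standing in for a vector database.
knowledge: list[str] = [
    "Refunds are processed within 5 business days.",
]

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str) -> Optional[str]:
    """Return the stored passage sharing the most words with the question,
    or None when no passage shares any word."""
    question_words = tokenize(question)
    best, best_overlap = None, 0
    for passage in knowledge:
        overlap = len(question_words & tokenize(passage))
        if overlap > best_overlap:
            best, best_overlap = passage, overlap
    return best

question = "How long does express shipping take?"

before = retrieve(question)  # nothing relevant is stored yet

# "Updating the system" is a data write; no model is retrained.
knowledge.append("Express shipping delivers in 1-2 business days.")

after = retrieve(question)

print(before)  # None
print(after)   # Express shipping delivers in 1-2 business days.
```

The only thing that changed between the two calls is the stored data, which is the property that separates RAG from fine-tuning.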

python

# pip install openai chromadb anthropic

import openai
import chromadb
import anthropic

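# Both API clients read their keys from the environment:
# OPENAI_API_KEY and ANTHROPIC_API_KEY must be set before running.
openai_client = openai.OpenAI()
chroma_client = chromadb.Client()  # in-memory store; data is lost when the process exits
anthropic_client = anthropic.Anthropic()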

# ── STEP 1: Create vector store ───────────────────────────────────────────────
# get_or_create avoids an error if this script is re-run in the same session
collection = chroma_client.get_or_create_collection("docs")

# ── STEP 2: Embed and store document chunks ───────────────────────────────────
documents = [
    {"id": "refund-1", "text": "Refunds are processed within 5 business days. To request a refund, email support@example.com with your order number."},
    {"id": "shipping-1", "text": "Standard shipping takes 3-5 business days. Express shipping is available for an additional $15 and delivers in 1-2 business days."},
    {"id": "returns-1", "text": "Items can be returned within 30 days of purchase. Items must be unused and in original packaging. Shipping costs are non-refundable."},
    {"id": "warranty-1", "text": "All products come with a 1-year warranty against manufacturing defects. Warranty does not cover accidental damage."},
]

def embed(text: str) -> list[float]:
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Index all documents
for doc in documents:
    embedding = embed(doc["text"])
    collection.add(
        ids=[doc["id"]],
        embeddings=[embedding],
        documents=[doc["text"]]
    )

print(f"Indexed {len(documents)} document chunks")

# ── STEP 3: Query function ─────────────────────────────────────────────────────
def rag_query(question: str, top_k: int = 3) -> str:
    # Embed the question
    query_embedding = embed(question)

    # Retrieve most relevant chunks
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k
    )
    retrieved_chunks = results["documents"][0]

    # Build augmented prompt
    context = "\n\n".join(f"- {chunk}" for chunk in retrieved_chunks)

    # Generate answer from context
    response = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=500,
        system="Answer questions using ONLY the provided context. If the answer is not in the context, say 'I don't have that information.' Do not use outside knowledge.",
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}"
        }]
    )
    return response.content[0].text

# ── Test it ────────────────────────────────────────────────────────────────────
print(rag_query("How long do refunds take?"))
print(rag_query("Can I return something after 45 days?"))
print(rag_query("What is the CEO's name?"))  # Should say: I don't have that information
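
The retrieval step above hands the ranking to the vector store. As a rough illustration of what "most relevant" means, the sketch below ranks stored vectors by cosine similarity with a brute-force scan. The three-dimensional vectors are made-up toy values, not real embeddings, and `cosine_similarity` and `top_k` are hypothetical helpers. The metric and index a given vector store actually uses are configurable and may differ.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], stored: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        stored.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy 3-dimensional "embeddings" (assumed values for illustration only).
toy_vectors = {
    "refund-1": [0.9, 0.1, 0.0],
    "shipping-1": [0.1, 0.9, 0.0],
    "warranty-1": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]  # points mostly in the "refund" direction

print(top_k(query, toy_vectors, k=2))  # ['refund-1', 'shipping-1']
```

A brute-force scan like this is fine for a handful of chunks. Vector databases exist because it stops scaling once the collection grows large.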

Without the instruction to use ONLY the provided context, the model can blend retrieved content with its training knowledge. The result is confident-sounding answers that mix retrieved facts with hallucinated additions. Always constrain the model to the retrieved context explicitly.
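
One way to make the constraint hard to drop by accident is to keep it in a single constant and build every request through one function. The sketch below assumes that approach. `SYSTEM_PROMPT` and `build_request` are hypothetical names, and the returned dictionary simply mirrors the `system` and `messages` arguments used in the example above. It makes no API call.

```python
# The grounding instruction lives in one place so every request carries it.
SYSTEM_PROMPT = (
    "Answer questions using ONLY the provided context. "
    "If the answer is not in the context, say 'I don't have that information.' "
    "Do not use outside knowledge."
)

def build_request(question: str, chunks: list[str]) -> dict:
    """Assemble the system prompt and user message for a grounded answer."""
    context = "\n\n".join(f"- {chunk}" for chunk in chunks)
    return {
        "system": SYSTEM_PROMPT,
        "messages": [
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}",
            }
        ],
    }

request = build_request(
    "How long do refunds take?",
    ["Refunds are processed within 5 business days."],
)
print(request["messages"][0]["content"])
```

Because the request is plain data, a unit test can assert that the constraint is present before anything is sent to the model.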
