Guides · 9 min read · 25 May 2026
RAG Explained: How AI Can Read Your Own Documents
Retrieval-Augmented Generation lets AI answer questions about your own files, databases, and knowledge bases. Here's how it actually works.
RAG is not fine-tuning. Fine-tuning bakes knowledge into the model's weights, which is expensive, slow to update, and requires ML expertise. RAG leaves the model unchanged and retrieves relevant text at query time, which is comparatively cheap, fast to update (edit a document and the next query sees the change), and mostly a software-engineering problem.
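Before wiring in real APIs, the "retrieve at query time" idea can be sketched with the Python standard library alone. This is a minimal sketch under loud assumptions: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `ToyIndex` is a hypothetical in-memory index, not Chroma's API. It shows the property that makes RAG cheap to update: a new chunk becomes searchable the moment it is added, with no retraining.

```python
import math
import re
from collections import Counter

def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyIndex:
    """Hypothetical in-memory index: stores (id, text, vector), ranks by cosine similarity."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str, Counter]] = []

    def add(self, doc_id: str, text: str) -> None:
        # Updating knowledge = appending an entry; no model weights change.
        self.entries.append((doc_id, text, toy_embed(text)))

    def query(self, question: str, top_k: int = 1) -> list[str]:
        # Embed the question at query time and return the ids of the closest chunks.
        q = toy_embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[2]), reverse=True)
        return [doc_id for doc_id, _, _ in ranked[:top_k]]

index = ToyIndex()
index.add("refund-1", "Refunds are processed within 5 business days.")
index.add("shipping-1", "Standard shipping takes 3-5 business days.")
print(index.query("How long do refunds take?"))  # ['refund-1']

# New knowledge is available immediately, without retraining anything.
index.add("warranty-1", "All products come with a 1-year warranty.")
print(index.query("Is there a warranty?"))  # ['warranty-1']
```

Word counts only match exact tokens ("take" does not match "takes"), which is the gap a learned embedding model closes by placing paraphrases close together in vector space.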
The pipeline below indexes four short policy snippets in an in-memory Chroma collection, retrieves the chunks closest to each question, and asks Claude to answer from those chunks alone.

```python
# pip install openai chromadb anthropic
# The OpenAI and Anthropic clients read OPENAI_API_KEY and ANTHROPIC_API_KEY from the environment.
import openai
import chromadb
import anthropic

openai_client = openai.OpenAI()
chroma_client = chromadb.Client()  # in-memory store; contents vanish when the process exits
anthropic_client = anthropic.Anthropic()

# ── STEP 1: Create vector store ───────────────────────────────────────────────
# get_or_create_collection lets you re-run the script or notebook cell without an "already exists" error
collection = chroma_client.get_or_create_collection("docs")

# ── STEP 2: Embed and store document chunks ───────────────────────────────────
documents = [
    {"id": "refund-1", "text": "Refunds are processed within 5 business days. To request a refund, email support@example.com with your order number."},
    {"id": "shipping-1", "text": "Standard shipping takes 3-5 business days. Express shipping is available for an additional $15 and delivers in 1-2 business days."},
    {"id": "returns-1", "text": "Items can be returned within 30 days of purchase. Items must be unused and in original packaging. Shipping costs are non-refundable."},
    {"id": "warranty-1", "text": "All products come with a 1-year warranty against manufacturing defects. Warranty does not cover accidental damage."},
]

def embed(text: str) -> list[float]:
    """Turn text into a vector with OpenAI's embedding model."""
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding

# Index all documents: store each chunk's id, vector, and raw text together
for doc in documents:
    embedding = embed(doc["text"])
    collection.add(
        ids=[doc["id"]],
        embeddings=[embedding],
        documents=[doc["text"]],
    )
print(f"Indexed {len(documents)} document chunks")

# ── STEP 3: Query function ─────────────────────────────────────────────────────
def rag_query(question: str, top_k: int = 3) -> str:
    # Embed the question with the same model used for the documents
    query_embedding = embed(question)

    # Retrieve the top_k nearest chunks (results hold one list per query embedding)
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k,
    )
    retrieved_chunks = results["documents"][0]

    # Build augmented prompt: one bullet per retrieved chunk
    context = "\n\n".join(f"- {chunk}" for chunk in retrieved_chunks)

    # Generate an answer constrained to the retrieved context
    response = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=500,
        system="Answer questions using ONLY the provided context. If the answer is not in the context, say 'I don't have that information.' Do not use outside knowledge.",
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}",
        }],
    )
    return response.content[0].text

# ── Test it ────────────────────────────────────────────────────────────────────
print(rag_query("How long do refunds take?"))
print(rag_query("Can I return something after 45 days?"))
print(rag_query("What is the CEO's name?"))  # Should say: I don't have that information
```

Without an explicit instruction to "use ONLY the provided context," the model tends to blend retrieved content with its training knowledge, producing confident-sounding answers that mix real retrieved facts with hallucinated additions. Always constrain the model to the retrieved context explicitly.
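Because the grounding instruction is what keeps answers tied to the retrieved text, it can help to build the request in one small function that can be tested without calling any API. A minimal sketch, assuming you want the same system prompt and message layout that `rag_query` uses; `GROUNDING_INSTRUCTION` and `build_grounded_request` are hypothetical names introduced here for illustration.

```python
# Hypothetical constant: the same grounding instruction rag_query passes as `system`.
GROUNDING_INSTRUCTION = (
    "Answer questions using ONLY the provided context. "
    "If the answer is not in the context, say 'I don't have that information.' "
    "Do not use outside knowledge."
)

def build_grounded_request(question: str, chunks: list[str]) -> dict:
    """Hypothetical helper: return the `system` and `messages` arguments for a chat call,
    confining the model to the retrieved chunks."""
    # One bullet per retrieved chunk, separated by blank lines (same format as rag_query)
    context = "\n\n".join(f"- {chunk}" for chunk in chunks)
    return {
        "system": GROUNDING_INSTRUCTION,
        "messages": [{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}",
        }],
    }

request = build_grounded_request(
    "How long do refunds take?",
    ["Refunds are processed within 5 business days."],
)
print(request["messages"][0]["content"])
```

The returned dict can be unpacked into the existing call, e.g. `anthropic_client.messages.create(model="claude-haiku-4-5", max_tokens=500, **build_grounded_request(question, retrieved_chunks))`, so a unit test can confirm the instruction is always present before any tokens are spent.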