Guides | 9 min read | 25 May 2026

Context Windows Explained (And How to Stop Hitting the Limit)

Every AI model has a context window — a cap on how much it can see at once. Here's what that means in practice and how to work around it.

The context window includes BOTH your input AND the model's output. If you have a 128k token window and your input uses 120k tokens, the model only has 8k tokens available to generate a response.
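
The arithmetic above can be sketched as a small helper. This is only an illustration: the function name `output_budget` and its parameters are hypothetical, not part of any library.

```python
def output_budget(window: int, input_tokens: int) -> int:
    """Return how many tokens remain for the response.

    Hypothetical helper: input and output share one window, so the
    response can use only what the input leaves free.
    """
    if input_tokens > window:
        raise ValueError(
            f"Input ({input_tokens:,} tokens) exceeds the window ({window:,} tokens)"
        )
    return window - input_tokens


# The example from the text: a 128k window with 120k tokens of input.
print(output_budget(128_000, 120_000))  # 8000
```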

A large context window does not mean the model uses all of it well. Research shows that models recall information placed in the middle of a very long context less reliably than information near the beginning or end, a phenomenon called "lost in the middle." For tasks requiring precise recall, shorter, focused contexts often outperform stuffing everything into a 1M token window.

Silent truncation is the most dangerous failure mode. When a tool trims your input to fit the window, nothing tells you that the model cannot see the beginning of your document. It just answers as if that content does not exist, and its answer looks completely normal.

python

import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Estimate token count for a given text string."""
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text))

def will_fit(text: str, max_tokens: int = 100_000, model: str = "gpt-4o") -> bool:
    """Check if text fits within a token budget."""
    count = count_tokens(text, model)
    print(f"Token count: {count:,} / {max_tokens:,} ({count/max_tokens*100:.1f}%)")
    return count <= max_tokens

# Usage (install first: pip install tiktoken)
with open("contract.txt", encoding="utf-8") as f:
    content = f.read()

# Assuming a 128k window: budget 120k for the document,
# leaving 8k for the system prompt and the response
if will_fit(content, max_tokens=120_000):
    print("Fits: safe to send")
else:
    print("Too large: need to chunk")

For Claude specifically, use the Anthropic API's token counting endpoint: POST /v1/messages/count_tokens. It returns the input token count for Claude models without triggering a generation, which is more reliable than approximating with tiktoken, a tokenizer built for OpenAI models.
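
A minimal sketch of calling that endpoint through the official Python SDK, assuming the `anthropic` package is installed and `ANTHROPIC_API_KEY` is set. The helper name `count_claude_tokens` and the default model ID are illustrative choices; check them against the SDK version you have installed.

```python
def count_claude_tokens(client, text: str, model: str = "claude-sonnet-4-5") -> int:
    """Return the input token count reported by the count_tokens endpoint.

    `client` is expected to be an anthropic.Anthropic() instance. It is
    passed in (an assumption of this sketch) so the helper can be
    exercised without network access.
    """
    result = client.messages.count_tokens(
        model=model,
        messages=[{"role": "user", "content": text}],
    )
    return result.input_tokens


# Usage (requires `pip install anthropic` and ANTHROPIC_API_KEY):
#   import anthropic
#   client = anthropic.Anthropic()
#   print(count_claude_tokens(client, "Hello, Claude"))
```

Passing the client in rather than creating it inside the function keeps the helper easy to test and lets you reuse one client across calls.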

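When a document is too large to send in one request, one option is map-reduce summarisation: split it into chunks, summarise each chunk, then combine the summaries.

python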

import anthropic
from typing import List

client = anthropic.Anthropic()

def chunk_text(text: str, chunk_size: int = 3000) -> List[str]:
    """Split text into chunks of roughly chunk_size words at paragraph boundaries."""
    paragraphs = text.split('\n\n')
    chunks, current = [], []
    current_len = 0

    for para in paragraphs:
        para_len = len(para.split())
        if current_len + para_len > chunk_size and current:
            chunks.append('\n\n'.join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += para_len

    if current:
        chunks.append('\n\n'.join(current))
    return chunks

def map_reduce_summarise(text: str) -> str:
    """Summarise a long document using map-reduce."""
    chunks = chunk_text(text)
    print(f"Processing {len(chunks)} chunks...")

    # MAP: summarise each chunk
    chunk_summaries = []
    for i, chunk in enumerate(chunks):
        response = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=300,
            messages=[{
                "role": "user",
                "content": f"Summarise the key points from this section in 3-5 bullet points:\n\n{chunk}"
            }]
        )
        chunk_summaries.append(response.content[0].text)
        print(f"  Chunk {i+1}/{len(chunks)} done")

    # REDUCE: combine summaries into final summary
    combined = "\n\n".join(chunk_summaries)
    final = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1000,
        messages=[{
            "role": "user",
            "content": f"Combine these section summaries into a coherent overall summary:\n\n{combined}"
        }]
    )
    return final.content[0].text
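
One edge case in the chunking approach above: a single paragraph longer than `chunk_size` words still ends up as one oversized chunk. Here is a minimal sketch of a fallback, assuming a hard split on word boundaries is acceptable. `split_long_paragraph` is a hypothetical helper, not part of the original code.

```python
from typing import List


def split_long_paragraph(para: str, chunk_size: int = 3000) -> List[str]:
    """Split one paragraph into pieces of at most chunk_size words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = para.split()
    # Step through the word list in strides of chunk_size.
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


pieces = split_long_paragraph("word " * 7, chunk_size=3)
print([len(p.split()) for p in pieces])  # [3, 3, 1]
```

Splitting on words can cut a sentence in half, so it is best kept as a last resort for paragraphs that cannot fit any other way.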

For documents you work with regularly, create a "compressed version" — a well-structured summary you maintain over time. A 300-page internal handbook can often be compressed to a 3,000-word structured reference that answers 90% of questions with far fewer tokens.

