Guides · 7 min read · 25 May 2026

Temperature, Top-P, and Max Tokens Explained

Three settings that change everything about how an AI responds — yet most people leave them at defaults. Here's what they actually do.

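For extraction tasks (pulling JSON from text, classifying sentiment, extracting dates), set temperature to 0. This makes your pipeline as close to deterministic as the API allows, and therefore testable: the same input will almost always produce the same output. Providers do not guarantee fully identical results even at temperature 0, but repeated runs become far more consistent.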
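Under the hood, temperature divides the model's raw token scores (logits) before they are turned into probabilities: values below 1 sharpen the distribution toward the top token, values above 1 flatten it. Here is a minimal plain-Python sketch, assuming a toy three-token vocabulary with made-up logits. Real providers do this server-side, and treating 0 as a pure argmax (greedy decoding) is a simplification.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores (logits) into probabilities, scaled by temperature.

    temperature == 0 is treated as greedy decoding: all probability goes
    to the highest-scoring token (the first one wins on ties).
    """
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0, 0.3, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

As temperature drops, the top token's share grows until, at 0, it is the only candidate left. That is why low temperatures suit extraction and classification.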
Major AI providers such as OpenAI and Anthropic recommend adjusting either temperature or top-P, not both. When both are set to non-default values, they interact in ways that are hard to predict. Pick one mechanism and leave the other at its default.
In practice, most developers only need temperature. Top-P controls something slightly different (which low-probability tokens are excluded from sampling altogether), but for most tasks the difference is subtle. Learn temperature first, and add top-P only when you have a specific reason.
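Top-P (nucleus sampling) works on the probabilities after temperature has shaped them. It keeps the smallest set of most-likely tokens whose probabilities add up to at least P, renormalises, and samples only from that set. The sketch below is a plain-Python illustration with made-up logits, not any provider's actual implementation. It assumes temperature is applied first, as is common in open-source samplers. It also shows why combining the two is hard to reason about: with the same `top_p=0.8`, lowering temperature from 1.0 to 0.5 shrinks the candidate pool from three tokens to two.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax (temperature must be > 0 here)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus(probs, top_p):
    """Indices of the smallest most-likely set whose total probability >= top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

def renormalise(probs, kept):
    """Rescale the surviving tokens so their probabilities sum to 1."""
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

logits = [2.0, 1.5, 1.0, 0.0]  # hypothetical scores for four tokens
for t in (1.0, 0.5):
    probs = softmax_with_temperature(logits, t)
    kept = nucleus(probs, top_p=0.8)
    shares = {i: round(p, 3) for i, p in renormalise(probs, kept).items()}
    print(f"temperature={t}: nucleus={kept} renormalised={shares}")
```

Changing either knob moves the other's effect, which is the practical reason for tuning only one of them.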
Check the stop_reason field in API responses. If you see "max_tokens" frequently, your responses are being cut off before completion. Either increase max_tokens or instruct the model to be more concise.
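One way to act on `stop_reason` automatically is to retry with a larger budget when a response was cut off. The sketch below takes the create function as an argument. With the Anthropic SDK that would be `client.messages.create`, whose responses expose `stop_reason`. A fake stand-in is used so it runs offline. The helper name `create_with_token_budget` and the doubling-with-a-ceiling policy are illustrative assumptions, not an SDK feature.

```python
from types import SimpleNamespace

def create_with_token_budget(create, max_tokens, ceiling, **kwargs):
    """Call `create`, doubling max_tokens while the reply is truncated.

    Stops when stop_reason is no longer "max_tokens" or the ceiling is reached.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be positive")
    while True:
        response = create(max_tokens=max_tokens, **kwargs)
        if response.stop_reason != "max_tokens" or max_tokens >= ceiling:
            return response
        max_tokens = min(max_tokens * 2, ceiling)

# Hypothetical offline stand-in for client.messages.create: any budget
# under 2000 tokens "gets truncated". Pass the real method to use it live.
calls = []

def fake_create(max_tokens, **kwargs):
    calls.append(max_tokens)
    reason = "max_tokens" if max_tokens < 2000 else "end_turn"
    return SimpleNamespace(stop_reason=reason)

response = create_with_token_budget(
    fake_create, max_tokens=500, ceiling=4000,
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Analyse microservices vs monolith."}],
)
print(response.stop_reason, calls)  # → end_turn [500, 1000, 2000]
```

Each retry regenerates the whole response, so it costs more than asking the model to be concise. The ceiling keeps that cost bounded.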
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# ── Example 1: JSON extraction (near-deterministic) ──────────────────────────
# Use temperature 0 for consistent, testable extraction
extraction_response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=500,       # extraction outputs are short
    temperature=0,        # greedy decoding: highly consistent, not guaranteed identical
    system="Extract data as JSON. Output only the JSON object, no explanation.",
    messages=[{
        "role": "user",
        "content": 'Extract: name, email, company from: "Hi, I\'m Alex Chen from Stripe, alex@stripe.com"'
    }]
)
print(extraction_response.content[0].text)
# Typically returns: {"name": "Alex Chen", "email": "alex@stripe.com", "company": "Stripe"}

# ── Example 2: Creative writing (varied) ─────────────────────────────────────
# Use higher temperature for diversity
creative_response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=800,       # enough for a few paragraphs
    temperature=0.9,      # creative, varied
    # Do not set top_p when temperature is non-default
    messages=[{
        "role": "user",
        "content": "Write three different opening lines for a startup's about page. Each should feel distinct."
    }]
)
print(creative_response.content[0].text)

# ── Example 3: Structured analysis (balanced) ────────────────────────────────
analysis_response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=2000,      # room for detailed analysis
    temperature=0.3,      # mostly consistent, with slight variation in phrasing
    system="You are an analyst. Always structure responses with headers.",
    messages=[{
        "role": "user",
        "content": "Analyse the pros and cons of microservices vs monolith for a 5-person startup."
    }]
)
# A stop_reason of "max_tokens" means the answer was cut off mid-way
if analysis_response.stop_reason == "max_tokens":
    print("Truncated: raise max_tokens or ask for a more concise answer.")
print(analysis_response.content[0].text)
```
A production classification pipeline produced sporadic wrong answers that were hard to reproduce. Root cause: temperature was left at its default (1.0). When the model assigned 60% probability to label A and 40% to label B, sampling at temperature 1.0 meant that about 4 in 10 requests for that input came back as B. Setting temperature to 0 did not remove the underlying ambiguity, but it made the choice consistent: the model now picks the highest-probability label on essentially every run, so the pipeline became reproducible and debuggable.
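The arithmetic behind this story is easy to reproduce. Here is a minimal simulation in plain Python. It assumes the model's label distribution for the input is exactly 60/40 and that temperature rescales probabilities as p^(1/T), which is equivalent to dividing the logits by T. Real APIs do this server-side, so the sketch only illustrates the effect, and the function name `pick_label` is hypothetical.

```python
import random

def pick_label(probs, temperature, rng):
    """Choose a label from {label: probability}.

    temperature == 0: greedy, always the most likely label.
    Otherwise: rescale each p to p ** (1 / T), then sample proportionally.
    """
    if temperature == 0:
        return max(probs, key=probs.get)
    labels = list(probs)
    weights = [probs[label] ** (1 / temperature) for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]

probs = {"A": 0.6, "B": 0.4}  # the 60/40 case from the story above
rng = random.Random(42)       # fixed seed so the run is repeatable

results = {}
for t in (1.0, 0.5, 0):
    flips = sum(pick_label(probs, t, rng) == "B" for _ in range(1000))
    results[t] = flips
    print(f"temperature={t}: picked B in {flips} of 1000 calls")
```

At 1.0, roughly 400 of 1000 calls pick B. At 0.5, the share falls to about 31% (0.16 / 0.52), which is lower but still sporadic. Temperature 0 removes the sampling step entirely, so B never appears for this input.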