Deep Dives · 9 min read · 15 May 2026

Understanding Large Language Models

What actually happens inside a model like Claude or GPT? A plain-English explanation of how LLMs work — no maths required.

Large language models are the technology behind Claude, GPT-4, Gemini, and most of the AI tools you use today. Understanding how they actually work — even at a rough level — makes you a dramatically better AI user. It explains why they fail in certain ways, and how to work around those failures.

What an LLM is doing, at the core

At the most basic level, a language model predicts what text should come next. Given the words "The capital of France is", it predicts "Paris" comes next with very high probability. That sounds simple — and the mechanism is simple — but it produces remarkably sophisticated behaviour at scale.
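To make "predicts what comes next with very high probability" concrete, here is a minimal sketch. The candidate tokens and their scores are invented for illustration; a real model computes one score for every token in its vocabulary. The softmax function is the standard way to turn such scores into probabilities.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - highest) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores for the token that follows "The capital of France is".
# These numbers are made up; they are not taken from any real model.
toy_scores = {" Paris": 9.0, " Lyon": 4.0, " a": 3.0, " located": 2.5}

probs = softmax(toy_scores)
next_token = max(probs, key=probs.get)  # pick the most likely token
print(repr(next_token), round(probs[next_token], 3))
```

Picking the single most likely token is only one option. Real systems often sample from the distribution instead, which is one reason the same prompt can produce different answers.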

Tokens, not words

Models don't see words. They see tokens: chunks of characters that might be a word, part of a word, or punctuation. "Unbelievable" might be split into "Un", "believ", "able", although the exact split depends on the model's tokenizer. This matters because the model's context window (how much it can read at once) is measured in tokens, not words. As a rule of thumb, 1 token ≈ 0.75 words in English, or about 4 tokens for every 3 words.
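The rule of thumb can be turned into a quick estimator. This is a rough sketch only: it counts whitespace-separated words and applies the 0.75 ratio, so it will be off for code, non-English text, or unusual vocabulary. The function name is hypothetical, and an exact count needs the model's own tokenizer.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English text, not an exact figure

def estimate_tokens(text: str) -> int:
    """Estimate the token count of English text from its word count."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)

sample = "Large language models read tokens, not words."
print(len(sample.split()), "words ->", estimate_tokens(sample), "tokens (estimate)")
```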

Training: learning from the internet

Before you ever touch it, an LLM was trained on a vast corpus of text — books, websites, code, academic papers. During training, the model repeatedly tried to predict what came next, compared its prediction to the real text, and adjusted its internal parameters to be more accurate next time. Billions of adjustments over months of compute.
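The predict, compare, adjust cycle can be sketched at toy scale. Everything here is an assumption made for illustration: a three-token vocabulary, three adjustable numbers instead of billions of parameters, a fixed learning rate, and a single training example repeated twenty times. Real training applies the same idea across a huge corpus using backpropagation.

```python
import math

vocab = ["Paris", "London", "banana"]
logits = [0.0, 0.0, 0.0]        # the toy model's adjustable parameters
target = vocab.index("Paris")   # the token that really came next in the text
learning_rate = 0.5             # hypothetical step size

def softmax(xs):
    """Convert scores to probabilities that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

losses = []
for step in range(20):
    probs = softmax(logits)                    # 1. predict the next token
    losses.append(-math.log(probs[target]))    # 2. compare with the real text
    for i in range(len(logits)):               # 3. adjust the parameters
        # Gradient of cross-entropy loss with respect to each score
        grad = probs[i] - (1.0 if i == target else 0.0)
        logits[i] -= learning_rate * grad

print(f"loss at first step: {losses[0]:.3f}, at last step: {losses[-1]:.3f}")
```

The loss falls as the score for the correct token rises, which is the "more accurate next time" part of the description above.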

This is why LLMs know about history, can write code, and can follow context: they have seen enormous numbers of examples of each in their training data.

Fine-tuning and alignment

After initial training, models typically go through a second phase in which responses are rated for qualities such as helpfulness, harmlessness, and honesty, often by human raters. The model is then tuned to produce responses that rate well. This is why Claude feels like it's trying to be genuinely helpful rather than just predicting likely text: the alignment training shapes its behaviour toward useful responses.

What LLMs genuinely can't do

  • They don't know what's happening right now (knowledge cutoff)
  • They can't access the internet unless explicitly given that tool
  • They can 'hallucinate' — confidently state false information — because they optimise for likely-sounding text, not verified truth
  • They don't have persistent memory between conversations (unless built in separately)
  • They can't count or do arithmetic reliably — they pattern-match numbers rather than compute

Context windows

A context window is how much text the model can hold in its 'working memory' at once. Claude's context window is very large: 200,000 tokens or more, depending on the model version. At about 0.75 words per token, that is roughly 150,000 words, so you can paste in an entire book and ask questions about it. But everything outside the context window is invisible to the model. It has no long-term memory unless you build it in.
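A small sketch shows what "invisible to the model" means in practice. It assumes the application drops the oldest tokens when a conversation outgrows the window, which is one common strategy but not the only one. Splitting on whitespace stands in for a real tokenizer, and the function name is hypothetical.

```python
def fit_to_window(tokens: list[str], window: int) -> list[str]:
    """Keep only the most recent `window` tokens; older ones are dropped."""
    if window <= 0:
        return []
    return tokens[-window:]

# Whitespace splitting is a stand-in for a real tokenizer.
conversation = "my name is Ada . what is my name ?".split()

visible = fit_to_window(conversation, window=5)
print(visible)
print("Ada" in visible)
```

With a window of 5, the name given at the start of the conversation has already fallen out of view, so a model that sees only `visible` has no way to answer the question.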

Key insight: LLMs are pattern engines, not reasoning engines. They approximate reasoning very well, but understanding this distinction helps you know when to trust them and when to verify.

Why this matters for how you use AI

Knowing that models predict likely text explains why they hallucinate — the most likely-sounding answer isn't always the true one. It explains why giving more context improves output — you're giving the model better signal about what 'likely' should mean in your situation. And it explains why they're so good at creative and linguistic tasks but less reliable for pure fact retrieval.
