The librarian metaphor

“A very well-read librarian who can only quote what they've read.”

The word doesn't mean what you think

"Artificial intelligence" sounds like a thing. It isn't. It's a moving label we slap on whatever computers couldn't do yesterday. Chess used to be AI. Spell check used to be AI. The moment a trick becomes routine, we stop calling it AI and start calling it software. The historian Pamela McCorduck named this the AI effect, and it's the single most useful idea for keeping a clear head in this field.

What people mean *today* when they say AI is almost always one specific family: large statistical models trained on enormous piles of text, images, or code, that produce plausible continuations of an input. Everything else in this course is the unpacking of that one sentence.

The label “AI” keeps shrinking around whatever is currently hard.

Two very different families of AI

Historically there have been two competing visions of how to build intelligent machines. The first, symbolic AI, tried to encode the world as rules and logic, the way a lawyer reasons. The second, machine learning, gave up on rules and instead let computers extract patterns from data, the way a child learns what a dog looks like by seeing many dogs.
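To make the contrast concrete, here is a toy sketch of both styles on the dog example. Everything in it is invented for illustration: the two features (weight and snout length), the numbers, and the function names. The "learning" is the simplest method available, a nearest-centroid classifier that averages the labelled examples of each class into one prototype. That is not what modern systems use. The point is only where the decision logic comes from: typed in by a person, or computed from examples.

```python
from statistics import mean

# Hypothetical labelled examples: (weight_kg, snout_length_cm) -> label.
EXAMPLES = [
    ((30.0, 10.0), "dog"), ((8.0, 6.0), "dog"), ((25.0, 9.0), "dog"),
    ((4.0, 2.0), "cat"), ((5.0, 2.5), "cat"), ((3.5, 2.0), "cat"),
]

def symbolic_classify(weight_kg: float, snout_cm: float) -> str:
    # Symbolic style: a person writes the rule down by hand.
    return "dog" if snout_cm > 4.0 else "cat"

def train_centroids(examples):
    # Learning style: summarise each label by the average of its examples.
    centroids = {}
    for label in {lbl for _, lbl in examples}:
        points = [x for x, lbl in examples if lbl == label]
        centroids[label] = tuple(mean(dim) for dim in zip(*points))
    return centroids

def learned_classify(centroids, weight_kg: float, snout_cm: float) -> str:
    # Predict the label whose average example is closest (squared distance).
    def dist2(c):
        return (c[0] - weight_kg) ** 2 + (c[1] - snout_cm) ** 2
    return min(centroids, key=lambda label: dist2(centroids[label]))

centroids = train_centroids(EXAMPLES)
print(symbolic_classify(20.0, 8.0))            # → dog
print(learned_classify(centroids, 20.0, 8.0))  # → dog
```

Both give the same answer here, but they change differently. To alter the symbolic classifier, someone edits the rule. To alter the learned one, you change `EXAMPLES` and retrain; no one rewrites the decision logic.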

Symbolic AI dominated until the late 1990s. It is what beat Kasparov at chess. It is also what spent forty years trying, and failing, to write down what "a chair" is. Machine learning won the modern era because the world turned out to be too messy for rules. The systems you use today (ChatGPT, Claude, Midjourney, Whisper) are pure machine learning.

Where the "intelligence" actually lives

When you ask a model a question, no reasoning step happens in any human sense. The model is a giant function. You feed it a sequence of numbers (your prompt, encoded) and it returns numbers that score what could come next. Its answer is built one piece at a time: pick a likely next piece, append it, and run the function again. The numbers in the middle, billions of them, were set during training so that the function tends to output plausible continuations. That's the whole machine.
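Here is a deliberately tiny stand-in for that function: a bigram model whose only "parameters" are counts of which word followed which in a toy corpus. Real models are transformers with billions of learned weights rather than a count table, and they work on subword tokens rather than whole words. The corpus and the function names below are invented for illustration. What carries over is the loop: encode the prompt as numbers, score every possible next item, pick one, append it, and repeat. This sketch always takes the top score (greedy decoding). Real systems often sample from the scores instead.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; real models train on vastly more text.
CORPUS = "the cat sat on the mat . the dog sat on the rug .".split()

# Encoding: each distinct word gets an integer ID (a stand-in for tokens).
VOCAB = sorted(set(CORPUS))
TO_ID = {word: i for i, word in enumerate(VOCAB)}

def train(corpus_ids):
    # "Training" here is just counting which ID follows which.
    # These counts play the role of the model's parameters.
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus_ids, corpus_ids[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_scores(params, ids):
    # The "function": numbers in, one score per vocabulary entry out.
    # This toy only looks at the last ID; a transformer looks at all of them.
    last = ids[-1]
    return [params[last][j] for j in range(len(VOCAB))]

def generate(params, prompt: str, steps: int) -> str:
    ids = [TO_ID[w] for w in prompt.split()]
    for _ in range(steps):
        scores = next_token_scores(params, ids)
        # Greedy pick: the highest score (ties go to the lowest ID).
        ids.append(max(range(len(scores)), key=scores.__getitem__))
    return " ".join(VOCAB[i] for i in ids)

params = train([TO_ID[w] for w in CORPUS])
print(generate(params, "the dog", 4))  # → the dog sat on the cat
```

Every adjacent pair in "the dog sat on the cat" appears in the corpus, but the sentence as a whole does not. The model stitches together pieces it has seen into something plausible, which is not the same as something true.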

Calling this "intelligence" is fair only in a narrow sense. The model has compressed an enormous amount of human writing into a shape that lets it produce more of it. It has no goals, no memory of yesterday, no internal monologue between your messages. Each call starts from a blank slate plus your prompt. The illusion of a persistent mind comes from us, not from the model.
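The "blank slate" point is easiest to see from the client side. In the sketch below, `fake_model` is a hypothetical stand-in for a model call, not a real API. It is a pure function of the text it receives. The `Chat` wrapper (also hypothetical) creates the appearance of memory by pasting the whole transcript back into every call. Chat-style APIs generally work this way, with the client resending the conversation each turn. Product-specific details such as truncating or summarising long histories are not modelled here.

```python
def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a model: output depends only on the input text.
    # It "knows" the user's name only if the name appears in this prompt.
    for line in reversed(prompt.splitlines()):
        if line.startswith("User: My name is "):
            name = line.removeprefix("User: My name is ").rstrip(".")
            return f"Your name is {name}."
    return "I don't know your name."

class Chat:
    """The 'memory' lives here, in the client, not in the model."""

    def __init__(self, model):
        self.model = model
        self.transcript: list[str] = []

    def send(self, message: str) -> str:
        self.transcript.append(f"User: {message}")
        # Every call re-sends the whole conversation so far.
        reply = self.model("\n".join(self.transcript))
        self.transcript.append(f"Assistant: {reply}")
        return reply

chat = Chat(fake_model)
chat.send("My name is Ada.")
print(chat.send("What is my name?"))         # → Your name is Ada.
print(fake_model("User: What is my name?"))  # → I don't know your name.
```

The same function gives different answers only because it was given different text. Drop the transcript and the "memory" is gone.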

Why now?

The mathematics behind today's models is mostly from the 1980s. What changed is the scale: enough text on the internet to train on, enough GPUs to run the training, and one architecture, the transformer, that scales gracefully when you throw more of both at it. The 2017 "Attention Is All You Need" paper didn't invent ideas so much as combine them in a way that finally rewarded scale. We'll meet attention in chapter 5.

Three numbers help calibrate. GPT-2 (2019) had 1.5 billion parameters and felt like a toy. GPT-3 (2020) had 175 billion and felt eerie. Frontier models in 2025 are estimated in the trillions, trained on something close to the meaningful portion of all public human writing. The qualitative leap between those steps was not the algorithm. It was the scale.
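To make the jump between those numbers tangible, here is back-of-the-envelope arithmetic on the two published parameter counts. It assumes each parameter is stored as a 16-bit (2-byte) number. That is a common choice for running models, but it is an assumption here, and the sketch ignores all other memory a running model needs. The trillion-scale frontier figures are left out because they are estimates.

```python
# Parameter counts quoted in the text.
PARAMS = {"GPT-2 (2019)": 1.5e9, "GPT-3 (2020)": 175e9}

BYTES_PER_PARAM = 2  # assumption: 16-bit weights; 32-bit would double everything

def weights_gb(n_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    # Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes).
    return n_params * bytes_per_param / 1e9

for name, n in PARAMS.items():
    print(f"{name}: {n / 1e9:g} billion parameters, about {weights_gb(n):,.0f} GB of weights")

ratio = PARAMS["GPT-3 (2020)"] / PARAMS["GPT-2 (2019)"]
print(f"GPT-3 has about {ratio:.0f}x as many parameters as GPT-2")
```

Under these assumptions, GPT-2's weights come to about 3 GB, small enough for a laptop's memory. GPT-3's come to about 350 GB, more than any single consumer graphics card holds.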

What this course is for

By the end of chapter 7 you should be able to read an AI announcement, a job posting, or a research paper headline and form your own opinion about what is actually going on under the hood. You won't have built a model, but you will know what a parameter, a token, an embedding, and an attention head are, why they matter, and where the load-bearing claims of the field are weak.

If you've never written code: you'll be fine. The dive-deepers are optional.
If you're a student: the dives have the math and the original papers.
If you're a working professional: chapter 7 is for you, but earn it by reading 1 to 6 first.

In one line each

"AI" is a moving label. What you use today is one specific family: large statistical models that produce plausible continuations.
There are no rules inside. There are billions of numbers that were tuned so the output is plausible.
Modern AI works because of scale (data + compute) plus one architecture (the transformer), not a new theory of mind.
Capabilities are real but narrow. "AI can do X" almost always means "a specific model on a specific setup did X."

Where to go next

Chapter 2: How models learn