How AI Models Actually Work

Beginner

After this you can predict when a model will be confidently wrong, because you understand what it is actually doing when it answers, which is guessing the next word rather than looking anything up.

Understand

Underneath the chat interface there is one operation, repeated. The model reads everything in front of it and predicts a likely next chunk of text, leaning toward the more probable continuations. It appends that chunk, then reads the whole thing again and predicts the next one. Chunk by chunk, that loop is the entire act of generation. There is no separate step where it decides what is true, checks a source, or runs a calculation. It is a continuation machine, and the answer you read is a continuation that looked probable given the words already on the page.

The prediction loop: generation is one step run over and over (read everything, predict the next token, append it, read again), not a lookup followed by a write.
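
The loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real model is built: the probability table `NEXT_TOKEN_PROBS` is invented, and the hypothetical `predict_next` looks only at the last token, where a real model conditions on the whole context. Only the shape of the loop carries over.

```python
import random

# Hypothetical toy "model": for each previous token, a made-up probability
# table over next tokens. A real model conditions on the whole context
# with a neural network; only the shape of the loop is the point here.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 1.0},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"sat": 0.5, "ran": 0.5},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}


def predict_next(context, rng):
    """Sample one next token, weighted toward the more probable options."""
    probs = NEXT_TOKEN_PROBS[context[-1]]  # toy: looks only at the last token
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def generate(rng, max_tokens=10):
    """Run the loop: predict, append, read again, until <end> or the cap."""
    context = ["<start>"]
    for _ in range(max_tokens):
        token = predict_next(context, rng)  # read the context, predict
        if token == "<end>":
            break
        context.append(token)  # append, then loop and read again
    return context[1:]


print(" ".join(generate(random.Random(0))))
```

Nothing in `generate` checks whether the sentence is true. It only keeps asking which token tends to come next.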

"Most likely" is the part that catches people out, because likely is not the same as correct. The model was trained on an enormous amount of human writing, and it learned the statistical center of all of it. Ask a vague question and you get the average answer, the one that sits closest to everything ever written on the topic. This is why ungrounded AI output reads so generic. It is not the model being lazy. Leaning toward the most probable next token pulls the output toward the most average one, and average is the default it falls back to whenever you have not given it something sharper to predict from.
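
A small sketch makes the gap between likely and correct concrete. The counts below are invented for illustration and are not measurements of any real training set. They stand in for a case where a common mistake was written down more often than the right answer.

```python
from collections import Counter

# Invented toy "training data": continuations seen after the same prompt.
# The counts are made up to illustrate the point, not measured anywhere.
seen_continuations = (
    ["Sydney"] * 6      # a common mistake, repeated often in this toy data
    + ["Canberra"] * 4  # the correct answer, written less often here
)

counts = Counter(seen_continuations)
total = sum(counts.values())
probabilities = {word: n / total for word, n in counts.items()}

# Pick the single most probable continuation.
most_likely = max(probabilities, key=probabilities.get)

print("The capital of Australia is", most_likely)
print(probabilities)
```

The selection step has no notion of truth. It returns whichever continuation was most frequent, which in this toy data is the wrong one.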

The second surprise is that the model does not remember you. Each request starts from nothing. Everything it appears to "know" about your conversation is text that gets resent, in full, on every single turn. The chat interface hides this by quietly re-attaching the earlier messages each time you hit enter. A person you talk to carries the conversation forward in their own head. The model does not. It is handed a transcript, reads it cold, predicts a reply, and forgets all of it the instant the reply is done.

It starts fresh every time: a human carries memory between turns; the model carries nothing. It only sees the transcript you resend on this exact call.
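
Here is a minimal sketch of what the interface is doing on your behalf. `fake_model` and `chat_turn` are hypothetical stand-ins written for this example, not any vendor's API. The only thing to notice is that the full history is passed in on every call.

```python
def fake_model(transcript):
    """Hypothetical stand-in for a model call: it sees only `transcript`."""
    seen = " ".join(m["content"] for m in transcript)
    if "my name is Ada" in seen:
        return "Your name is Ada."
    return "I don't have your name in this conversation."


def chat_turn(history, user_text):
    """What a chat interface does: re-attach the history, then call the model."""
    history = history + [{"role": "user", "content": user_text}]
    reply = fake_model(history)  # the full transcript goes in on every call
    return history + [{"role": "assistant", "content": reply}]


history = chat_turn([], "Hi, my name is Ada.")
history = chat_turn(history, "What is my name?")
with_history = history[-1]["content"]

# Same question sent without the earlier messages: nothing carries over.
without_history = fake_model([{"role": "user", "content": "What is my name?"}])

print(with_history)
print(without_history)
```

The "memory" lives entirely in the `history` list that the caller keeps and resends. Drop the list and the name is gone.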

Put those two facts together and hallucination stops looking like a glitch. When you ask for a fact the model was never reliably trained on, it does not stop and say it does not know. It does what it always does. It predicts the most plausible-looking continuation. A made-up citation, a confident wrong date, an invented function name. These come out fluent and self-assured for the same reason a correct answer does, because both are just probable strings of text. The model has no internal signal that separates "I am recalling something solid" from "I am filling a gap with something that fits the pattern." Confidence in the wording tells you nothing about whether the content is true. Fluent and wrong is not a contradiction. It is the normal output of a system optimized to sound right.

The place this bites hardest is numbers, because there a wrong answer hides perfectly. The model does not run arithmetic. It predicts a number string that looks like the kind of number that belongs there. Part of why is mechanical. The model reads text in chunks called tokens, and a figure like 1,520 often gets split into pieces such as 1, 52, 0, so it is predicting plausible next pieces rather than carrying a one across a column. On a benchmark of four-digit multiplication this kind of model scored around 2% accuracy, and the failures were the dangerous kind, off by a digit rather than absurd. A result that is wrong by an order of magnitude you would catch. A result that is wrong by a little looks exactly like a real answer.
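
To make the splitting concrete, here is a toy tokenizer. The vocabulary `TOY_VOCAB` is made up so that the split matches the example above (with the thousands separator left out for simplicity). Real tokenizers use their own vocabularies and may split the same figure differently. The second half shows the contrast: exact arithmetic is a single deterministic operation, and a figure with one digit changed looks just as believable on the page.

```python
def toy_tokenize(text, vocab):
    """Greedy longest-match split using a hypothetical vocabulary."""
    pieces, i = [], 0
    while i < len(text):
        # Try the longest piece first; fall back to a single character.
        for size in range(len(text) - i, 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                pieces.append(piece)
                i += size
                break
    return pieces


# Made-up vocabulary chosen so the split matches the example in the text.
# Real tokenizers have their own vocabularies and may split differently.
TOY_VOCAB = {"52", "0", "1", "2", "5"}

print(toy_tokenize("1520", TOY_VOCAB))

# Exact arithmetic, by contrast, is one deterministic operation.
exact = 4738 * 6291
near_miss = exact + 1000  # a plausible-looking figure with one digit changed
print(exact, near_miss, exact == near_miss)
```

Reading the two printed numbers side by side, nothing about the near miss marks it as wrong. Only running the multiplication does.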

Where it breaks

The model is at its most dangerous precisely when it sounds most competent, so the failure conditions are worth naming directly.

Computation is the first place it fails. Sums, averages, multi-step math, counting items in a list, comparing figures. The output will be formatted like a calculation and arrived at like a guess. Treat any unverified number from a raw model as a draft, never a result.

Rare or recent facts are the next weak spot, and obscure-but-plausible ones are the worst of all. A real but little-known statute, a niche API, a person's exact title. The model will not refuse. It will generate something that fits the shape of the answer, and you will not be able to tell from the text that it was invented.

The third failure runs across turns, disguised by the interface. Because the model only knows what is in the transcript it was sent, anything dropped from that transcript is simply gone. If a long conversation gets trimmed to fit, an instruction you gave forty messages ago may no longer be in front of the model at all. It will not tell you it lost the thread. It will answer as if the instruction never existed.
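
A sketch of how an instruction can fall out of view. The trimming rule here is an assumption made for brevity: `trim_to_budget` is a hypothetical helper that keeps the most recent messages by count, whereas real interfaces typically budget by tokens and may trim in other ways.

```python
def trim_to_budget(transcript, max_messages):
    """Hypothetical trimming rule: keep only the most recent messages.

    Real interfaces trim by token count and may use other strategies;
    a simple message count keeps the sketch short.
    """
    return transcript[-max_messages:]


# An instruction given at the very start of a long conversation.
transcript = [{"role": "user", "content": "Always answer in British English."}]
for n in range(1, 41):
    transcript.append({"role": "user", "content": f"Question {n}"})
    transcript.append({"role": "assistant", "content": f"Answer {n}"})

# What actually gets sent to the model on the next call.
sent = trim_to_budget(transcript, max_messages=20)

instruction_still_visible = any(
    "British English" in m["content"] for m in sent
)
print(len(transcript), len(sent), instruction_still_visible)
```

The model is never told that anything was cut. From its side, `sent` is the whole conversation.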

One honest limit on all of this. None of it means the model is usually wrong. On common, well-represented material it is often right, and right cheaply. The point is narrower and more useful. The model gives you no reliable internal signal for which answers are the solid ones, so that judgment stays with you.

Do it now

Before you trust any factual or numeric answer from a raw model, run it through this check. Paste it as a follow-up to an answer you are about to rely on, or keep it as a habit you apply by hand.

Paste this

You just gave me an answer I'm about to use. Before I trust it:

1. Which parts are recalled facts vs. things you generated to fit the pattern?
   Flag anything you are not confident is true.
2. Are there any numbers here you computed yourself rather than just
   producing a plausible figure? Mark each number: COMPUTED or ESTIMATED.
3. If this needs real arithmetic or a real lookup, say so and tell me
   what tool or source would actually settle it — don't guess.

Do not defend the original answer. If part of it was a guess, say it was a guess.

The check works because it forces the model to sort its own output into recalled-versus-generated and computed-versus-estimated, the exact distinction it papers over by default. It will not make a wrong answer right. What it does is surface the soft spots so you know which lines to verify yourself. For the numbers it flags as ESTIMATED, the real fix is to make the model compute instead of predict. Ask it to write the calculation as code you can run, or pull the figure from a spreadsheet or database. A model that runs a calculation is a different thing from a model that predicts what the calculation probably says.
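
One way to turn "compute instead of predict" into a habit is to recompute any claimed figure from the underlying values and compare. The sketch below assumes you have the raw numbers in hand. The helper `check_claimed_average`, the sample values, and the 0.5% tolerance are all illustrative choices, not a standard.

```python
from statistics import mean


def check_claimed_average(values, claimed, tolerance=0.005):
    """Compute the real mean and compare it with a model's claimed figure.

    `tolerance` is a relative margin (0.5% here), an arbitrary choice.
    """
    actual = mean(values)
    matches = abs(actual - claimed) <= tolerance * abs(actual)
    return actual, matches


# Hypothetical figures, invented for illustration.
lifetime_values = [1200.0, 3400.0, 5600.0, 7800.0]
actual, matches = check_claimed_average(lifetime_values, claimed=12847.0)
print(actual, matches)
```

The comparison is deliberately boring. The arithmetic runs in code you can read, so the result does not depend on how confident the original answer sounded.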

Worked example

Illustrative. A constructed exchange showing the failure mode and the check, not a transcript of a real session.

An analyst needs average customer lifetime value for a slide. They paste a table of customer records into a chat model and ask for the average.

You: [pastes ~400 rows of customer revenue and tenure] What's the average customer lifetime value across these customers?

Assistant: The average customer lifetime value across these customers is $12,847.

It is instant, formatted, and confident. It looks computed. It goes in the exec deck. The next day someone works the figure out by hand from the same table and gets $4,291. The model never divided anything. It read a table of numbers and predicted a number that looked like a believable average for data of that shape. The output carried no hint that it was a guess, because to the model it was the same operation as every other answer it gives.

Run the same request through the check and the failure surfaces before the slide does:

You: [the check block above, pasted under the $12,847 answer]

Assistant: Honest answer: $12,847 was ESTIMATED, not COMPUTED. I did not sum the revenue column and divide by the customer count. I produced a figure that looked plausible for this data. To settle it, run this on the table: df['lifetime_value'].mean() or compute it in your spreadsheet. Do not use the $12,847 figure.

Why the number was wrong: the same question takes two different paths. Prediction produces a plausible-looking figure; computation produces the real one. A raw model is on the prediction path by default.

Both answers came from the same model reading the same table. The difference was the path. One predicted a number that looked right, the other ran the arithmetic and got the number that was right. The model gives no sign of which path it took, so that check is yours to run.