Analytics Interpretation Into Narrative

Intermediate

After this you can turn a metric into a plain-language read a stakeholder will act on, while keeping the model off the numbers — it writes the story, deterministic tools compute the figures, and an assumptions table gates the read before any conclusion leaves your hands.

Understand

The appeal is obvious. You hand a model a CSV and a question, and seconds later you have a fluent paragraph explaining what happened to conversion last month, why retention dipped, which cohort is carrying revenue. The paragraph reads like something a competent analyst would write. That is exactly the problem, because reading like a competent analyst and being right are two different properties, and the model only optimizes the first one.

Start from what the model is actually doing when you ask it to "analyze this data." It is not running a calculator. It is predicting the most probable next tokens given your numbers and your question, which means it generates text that looks like analysis rather than computing the analysis. Most of the time the gap is invisible, because a plausible story about a real trend is often roughly correct. Then one day it is not, and nothing in the output tells you which day that is. An analyst asked a model for average customer lifetime value, got "$12,847," and put it in an exec deck. Recomputed by hand the next morning, the real number was $4,291. The model did not flag uncertainty. It delivered the wrong figure with the same composure it would have delivered the right one.

Who owns what: the division of labor. The model owns the narrative, deterministic tools own every number, and an assumptions table sits as a gate between the computed read and any conclusion that ships.

This division of labor is the whole technique. The model is genuinely good at one half of the job — taking a set of verified figures and writing the clear, audience-shaped explanation of what they mean. It is unreliable at the other half — producing the figures themselves. So you split the work along that seam. Computation goes to something deterministic: a SQL query you can read, Python you execute in a sandbox, a spreadsheet formula. Narrative goes to the model, fed only the numbers the tools already produced, with an instruction to invent nothing. The reframe that makes this click is from the marketing-analytics post-mortems: when AI replaced a manual stack, the result was "garbage in, compelling-looking garbage out — now with AI confidence." The confidence is the new failure surface. A junior analyst's wrong number looks tentative; the model's wrong number looks finished.
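The split can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the sample rows, the column names (`reached_checkout`, `purchased`), and the helper names `compute_figures` and `build_narrative_prompt` are all assumptions made up for the example. The point is only the shape: one function produces every number, and the prompt handed to the model contains those numbers and nothing it could recompute.

```python
import csv
import io

# Hypothetical sample data; the column names are assumptions for illustration.
SAMPLE_CSV = """user_id,reached_checkout,purchased
u1,1,1
u2,1,0
u3,1,1
u4,1,0
"""


def compute_figures(csv_text: str) -> dict:
    """Deterministic step: every number is produced here, never by the model."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    reached = sum(1 for r in rows if r["reached_checkout"] == "1")
    purchased = sum(1 for r in rows if r["purchased"] == "1")
    rate = round(100 * purchased / reached, 1) if reached else None
    return {
        "users_reached_checkout": reached,
        "users_purchased": purchased,
        "conversion_among_reached_pct": rate,
    }


def build_narrative_prompt(figures: dict, stakeholder: str) -> str:
    """Narrative step: the model sees only verified figures, plus an
    instruction to invent nothing. Missing values are passed as UNKNOWN."""
    lines = [
        f"- {name}: {'UNKNOWN' if value is None else value}"
        for name, value in figures.items()
    ]
    return (
        f"Write a plain-language read for {stakeholder}.\n"
        "Use only the figures below. Do not compute, estimate, or invent any number.\n"
        + "\n".join(lines)
    )


figures = compute_figures(SAMPLE_CSV)
prompt = build_narrative_prompt(figures, "VP Growth")
```

Note that the raw CSV never reaches the prompt. If the model cannot see the rows, it cannot quietly recompute a figure from them.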

What actually changed with AI is subtler than "now analysis is easy." Before models, producing an analysis was the bottleneck. You needed SQL, a cohort table, the experiment set up correctly — and that friction was a filter. It was slow, but the slowness meant a human who understood the data was forced to touch every number on the way to a conclusion. The model removes that friction, and people read the removal as pure upside. It is not. The friction was doing protective work. With it gone, the bottleneck moves downstream to vetting, and the operator's value moves with it. Your job is no longer to produce the read. It is to be the one person in the loop who can look at a confident paragraph and ask whether the number underneath it is computed on the right base.

The bottleneck moved: the inversion. Producing the analysis used to be the slow, expensive step (and that friction filtered out bad reads), so removing it relocates both the bottleneck and the operator's value to vetting.

Where it breaks

The signature failure is not a wild hallucination you would catch on sight. It is the wrong denominator, and it is dangerous precisely because the resulting narrative is internally consistent and confidently wrong. Hand a model a CSV of users who saw the new checkout flow and ask for the conversion rate, and it reports "45% conversion" — computed over the rows it can see, which silently exclude everyone who churned before reaching the flow, everyone who saw the old flow, and everyone lost to a routing error. The model does not know what it cannot see, and it does not ask. The number is arithmetically fine for the data in front of it and wrong for the question you meant. The same trap shows up in a GROUP BY that divides by total user count instead of the cohort-specific count: the query runs, returns rows, and nothing signals that the logic is off. There is no error message for asking a precise question of the wrong base.
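The GROUP BY version of the trap is easy to reproduce. The sketch below uses Python's built-in `sqlite3` module with an invented ten-row `users` table (the table, its columns, and the cohort names are assumptions for illustration). Both queries run cleanly and return one row per cohort; only the base differs.

```python
import sqlite3

# In-memory database with made-up rows: 4 users on the new flow, 6 on the old.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, cohort TEXT, converted INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "new_flow", 1), (2, "new_flow", 1), (3, "new_flow", 0), (4, "new_flow", 0),
        (5, "old_flow", 1), (6, "old_flow", 0), (7, "old_flow", 0), (8, "old_flow", 0),
        (9, "old_flow", 0), (10, "old_flow", 0),
    ],
)

# Wrong base: each cohort's conversions divided by the TOTAL user count.
wrong = dict(conn.execute(
    "SELECT cohort, 100.0 * SUM(converted) / (SELECT COUNT(*) FROM users) "
    "FROM users GROUP BY cohort"
).fetchall())

# Right base: each cohort's conversions divided by that cohort's own count.
right = dict(conn.execute(
    "SELECT cohort, 100.0 * SUM(converted) / COUNT(*) "
    "FROM users GROUP BY cohort"
).fetchall())

print(wrong["new_flow"], right["new_flow"])  # 20.0 vs 50.0 for the same cohort
```

Neither query raises, warns, or returns an empty result. The only way to tell them apart is to read the denominator.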

This break has a boundary worth naming, because the technique is not universal. The split helps when the risk is a number — a rate, a total, an average, anything with a denominator or a unit that can be silently wrong. It does nothing for the harder failure underneath: a wrong assumption the tools will execute faithfully. If you tell the system a monthly rate is annual, or point it at the wrong attribution model, the SQL computes the wrong thing precisely, and the precision makes the error feel more authoritative, not less. Tools fix arithmetic. They do not fix a flawed model of the business. That is why the assumptions table is the load-bearing part and the code is merely the easy part — the table is where a human states the denominator, the units, the rate type, the cohort frame, and the timing, in writing, before reading the conclusion as true.
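The monthly-versus-annual case shows how faithfully a tool executes a wrong assumption. In this sketch the 2% churn rate and the function name `retained_after_12_months` are hypothetical, chosen only to make the gap visible. The arithmetic is correct in both calls; what changes is the label a human attached to the rate.

```python
def retained_after_12_months(churn_rate: float, rate_type: str) -> float:
    """Fraction of customers still retained after 12 months.

    The code trusts rate_type completely. It has no way to know
    whether the label matches how the rate was actually measured.
    """
    if rate_type == "monthly":
        # A monthly rate compounds over twelve periods.
        return (1 - churn_rate) ** 12
    if rate_type == "annual":
        # An annual rate applies once over the year.
        return 1 - churn_rate
    raise ValueError(f"unknown rate type: {rate_type!r}")


# Hypothetical figure: 2% churn. Same number, two labels.
as_monthly = retained_after_12_months(0.02, "monthly")  # roughly 0.785
as_annual = retained_after_12_months(0.02, "annual")    # 0.98
```

The function cannot catch the mislabel. The rate-type column of the assumptions table is the only place where it gets caught.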

Do it now

Paste this before any analytics-to-narrative task. It forces the two rules that keep the read honest: numbers come from executed tools, never from prose, and no conclusion ships without an assumptions table you have read.

Paste this

Role: you are interpreting analytics into a stakeholder-ready narrative.

Hard rules:
1. Do NOT compute any number in prose. For every figure, write or call code
   (SQL / Python) that produces it, and show that code. If a number cannot be
   computed from the data I gave you, write UNKNOWN — never estimate it.
2. Before any conclusion, output an ASSUMPTIONS TABLE with one row per metric:
   | metric | exact denominator (population it's over) | units | rate type (monthly/annual/etc) | cohort frame | timing |
3. State what this dataset can NOT see (who is filtered out, who never reached
   this step). If the denominator might exclude relevant users, say so and stop.

Then, and only then, write the narrative — grounded only in the computed
figures, in plain language, for: <the stakeholder + the decision they face>.

Data: <paste the data, or the schema + a tool the model can call>
Question: <the one read you want>

The table is doing the real work. Most wrong reads do not survive being forced to name the denominator out loud, because the act of writing "population this is over: users who reached checkout" is the moment you notice it should have been "all sessions that started checkout." Rule 1 keeps the model from quietly carrying the one and getting it wrong; rule 3 surfaces the filtered-population trap before it becomes a confident paragraph. If the model cannot fill a row, that is the signal to stop, not to let it guess a plausible value to keep the narrative flowing.
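If you want the stop condition enforced rather than remembered, the table can be checked mechanically before any narrative is requested. The following is one possible sketch; the `AssumptionRow` class and the `unfilled_fields` and `gate` helpers are hypothetical names, with fields mirroring the columns in rule 2. It checks only that every cell is filled. Whether the filled-in denominator is the right one is still a human read.

```python
from dataclasses import dataclass, fields


@dataclass
class AssumptionRow:
    """One row of the assumptions table, one field per column in rule 2."""
    metric: str
    denominator: str
    units: str
    rate_type: str
    cohort_frame: str
    timing: str


def unfilled_fields(row: AssumptionRow) -> list[str]:
    """Names of columns that are blank or explicitly marked UNKNOWN."""
    missing = []
    for f in fields(row):
        value = getattr(row, f.name).strip()
        if not value or value.upper() == "UNKNOWN":
            missing.append(f.name)
    return missing


def gate(rows: list[AssumptionRow]) -> bool:
    """True only if there is at least one row and every cell is filled."""
    return bool(rows) and all(not unfilled_fields(r) for r in rows)


complete = AssumptionRow(
    "conversion", "users who reached checkout", "%",
    "per-session", "only users who reached checkout", "last 30d",
)
incomplete = AssumptionRow(
    "flow conversion", "UNKNOWN", "%",
    "per-session", "everyone routed into the flow", "last 30d",
)

print(gate([complete]))              # True: narrative may proceed to human review
print(gate([complete, incomplete]))  # False: stop, do not let the model guess
```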

Worked example


Illustrative. A constructed read to show the gate catching a wrong denominator, not a real analysis.

A growth lead exports a CSV of users who reached the redesigned checkout and asks for the headline. The ungated move is to hand over the file and take the paragraph:

You: [pastes checkout_redesign.csv] What's our conversion story for the new flow?

Assistant: Strong result. The redesigned checkout is converting at 45%, well above the ~30% baseline — a clear win worth rolling out to all traffic.

It reads like a finished analysis, so it is tempting to paste it into the rollout deck. Run the same request through the gate and the table fills before the narrative:

You: [the prompt block above, same CSV, stakeholder = VP Growth deciding on full rollout]

Assistant: Assumptions table:

| metric | denominator | units | rate type | cohort frame | timing |
| conversion | 1,840 users in this file | % | per-session | only users who reached checkout | last 30d |

What this data cannot see: users who abandoned before reaching checkout, users routed to the old flow, and sessions lost to errors. The 45% is computed over arrivals at checkout, not over everyone who started a purchase — so it is not comparable to the ~30% baseline, which is measured over all purchase-starts.

Conversion among users who reached the new checkout: 45.2% (832 / 1,840) — but I cannot compute true flow conversion from this file alone. I'd need the count of everyone routed into the flow, including those who never arrived. UNKNOWN until that denominator is supplied. I'd hold the rollout claim until then.


Same model, same file, opposite outcome. The first answer compared a checkout-arrivals rate against an all-starts baseline and called it a win — the exact denominator mismatch that turns "45%" into a confident wrong story. The gate did not need a better model; it needed the model forced to name its denominator and admit what the file excludes. This is the same shape as the $12,847 lifetime-value figure that went into an exec deck before anyone recomputed it to $4,291: the number looked authoritative, the narrative was fluent, and only a vetting step stood between the read and a decision made on it.
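The figures in the gated answer are the kind a tool should produce. A minimal sketch follows, using the counts from the illustrative example above; the helper `rate_pct` and the variable names are assumptions. A missing denominator yields `None`, which the narrative reports as UNKNOWN instead of substituting the nearest available base.

```python
from typing import Optional


def rate_pct(numerator: int, denominator: Optional[int]) -> Optional[float]:
    """Percentage rounded to one decimal, or None when the base is missing or zero."""
    if not denominator:
        return None
    return round(100 * numerator / denominator, 1)


# Counts from the constructed example above.
converted = 832
reached_checkout = 1840
routed_into_flow = None  # not present in the exported file

among_reached = rate_pct(converted, reached_checkout)  # 45.2
true_flow = rate_pct(converted, routed_into_flow)      # None, reported as UNKNOWN

# The two rates are comparable only if they are measured over the same base.
baseline_base = "all purchase-starts"
this_read_base = "arrivals at checkout"
comparable = baseline_base == this_read_base  # False: no win can be claimed yet
```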

What the model sees vs the true base: the wrong-denominator failure. The model computes over the filtered CSV it was handed, blind to the larger population, so "45%" is right for its rows and wrong for the question.

The forecasting version of this is worth seeing once, because it shows the fix scaling past a single read. When teams point AI at revenue or pipeline numbers, the failures cluster into three shapes — fabricated figures, text-to-SQL translation errors, and data drift — and all three trace to the same root: data definitions that are not consistent across the system, so "active user" or "net revenue" means something different depending on which table the query lands in. The durable fix is upstream of any prompt. A governed semantic layer defines each metric once, authoritatively, so the model's query resolves against a fixed meaning instead of guessing from a raw schema. The per-read assumptions table and the org-level semantic layer are the same discipline at two scales: pin the denominator and the definition before you trust the narrative built on them.
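The "define each metric once" idea can be shown with a toy registry. This is a stand-in to convey the shape, not how any particular semantic-layer product works; the `METRICS` dictionary, its keys, and the `resolve` function are all hypothetical. Every consumer resolves a metric name against the single definition, and an undefined metric fails loudly instead of being improvised from whatever columns are nearby.

```python
from typing import Optional

# Single source of truth: each metric names its numerator and denominator once.
METRICS = {
    "conversion_rate": {
        "numerator": "purchases",
        "denominator": "purchase_starts",
        "units": "%",
    },
}


def resolve(metric_name: str, counts: dict) -> Optional[float]:
    """Compute a metric strictly from its registered definition.

    Raises KeyError if the metric is not defined or a required count
    is missing, so nothing downstream can guess a meaning.
    """
    definition = METRICS[metric_name]
    numerator = counts[definition["numerator"]]
    denominator = counts[definition["denominator"]]
    if denominator == 0:
        return None
    return round(100 * numerator / denominator, 1)


# Made-up counts for illustration.
value = resolve("conversion_rate", {"purchases": 30, "purchase_starts": 100})
```

Asking for `resolve("active_users", ...)` raises `KeyError` here, which is the intended behavior: a metric with no agreed definition should not produce a number at all.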

Three failure shapes, one root: AI forecasting failures (fabricated numbers, text-to-SQL errors, and data drift) all trace to inconsistent data definitions, which a governed semantic layer fixes by defining each metric once.