Shape the Output, Not Just the Content

Intermediate

After this, you can specify the *shape* of a model's output so its results drop straight into the next step of your workflow, without forcing the model into a structure that quietly wrecks its reasoning.

Understand

You already know how to ask for the right content. The next thing that decides whether a model is useful inside a pipeline is the shape of what comes back. A wall of prose is fine for a human reader and useless to the script that has to read field three of every response. So you start specifying shape: a JSON object, a table with these columns, a ranked list, one verdict per row. The moment your output feeds a second step instead of your own eyes, shape stops being cosmetic and becomes the interface.

The trap is that the obvious way to get shape is the worst way to get a good answer. Tell a reasoning model "return only valid JSON, no other text" and you have asked it to do two jobs in the same breath: think the problem through and emit a grammar-clean object, with no room to do the first before the second. Forcing strict JSON mode or constrained decoding can measurably degrade reasoning on tasks that need the model to work through steps, because the decoder is cut off from the very token paths it would use to reason out loud (Tam et al. 2024, "Let Me Speak Freely?"). The braces and keys interrupt the thinking. You wanted structure and you paid for it in correctness, and because the output still looks clean, you do not notice the tax until the values start being subtly wrong.

The fix is to stop fusing the two jobs. Let the model reason in free text first, then extract the structure in a second pass: either a second call or a clearly delimited later section of the same response. Reasoning happens where reasoning is cheap, in open prose. Shaping happens after the answer exists, where there is nothing left to degrade. You pay one extra step and you keep both properties instead of trading one for the other.

Figure: Single-pass vs reason-then-extract. Why folding reasoning and formatting into one call degrades the answer, and how splitting them keeps both.
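The two-call variant of reason-then-extract can be sketched as below. This is a minimal sketch, not a definitive implementation: `call_model` is a hypothetical stand-in for whatever client you use, and `fake_model` exists only so the example runs offline. The prompt wording is also an assumption.

```python
import json
from typing import Callable

# Hypothetical type: any function that sends a prompt to a model and returns its text.
ModelFn = Callable[[str], str]


def reason_then_extract(task: str, schema_hint: str, call_model: ModelFn) -> dict:
    """Two calls: free-text reasoning first, structure second."""
    # Pass 1: no format constraints, so the model can work through the problem.
    reasoning = call_model(
        f"{task}\n\nThink this through in plain prose. Do not produce JSON."
    )
    # Pass 2: transcription of a conclusion that already exists.
    extraction = call_model(
        "Copy the conclusion of the analysis below into a single JSON object "
        f"matching this schema. Do not re-derive it.\n\nSchema: {schema_hint}\n\n"
        f"Analysis:\n{reasoning}"
    )
    return json.loads(extraction)


def fake_model(prompt: str) -> str:
    # Stand-in for a real model client (assumption, for offline demonstration only).
    if "Do not produce JSON" in prompt:
        return "Repeat double-charge plus cancel intent: this is churn risk."
    return '{"category": "churn_risk", "priority": "high"}'


result = reason_then_extract(
    "Classify this ticket: charged twice, considering cancelling, third time this month",
    '{"category": "...", "priority": "..."}',
    fake_model,
)
print(result["category"])  # churn_risk
```

Only the second call needs a strict format, so that is the one place where JSON mode or constrained decoding would apply if you use them.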

Underneath this sits a distinction worth holding separately in your head, because conflating the three is how confident-wrong data ships at machine speed. Format compliance, reasoning quality, and semantic correctness are three different properties and none of them implies the next. Format compliance asks only whether the bytes are valid JSON matching the schema, and constrained decoding can guarantee it. Reasoning quality asks whether the model thought well to get there. Semantic correctness asks whether the values are actually true. A constrained decoder will happily emit a perfectly valid object full of hallucinated numbers and valid-but-wrong enum choices ("Schema validity is not semantic correctness," dottxt). The schema locks the shape and says nothing about the truth. So a passing validator buys you exactly one of the three properties, and teams that ship JSON mode and call it done are shipping fiction in well-formed wrappers.

Figure: Three properties, three guarantees. What each property promises and the gap each one leaves open, so a green validator never gets mistaken for a correct answer.
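To make the gap between format compliance and semantic correctness concrete, here is a small sketch. The invoice-style fields, the allowed currencies, and the reference values are all invented for illustration. Reasoning quality is left out because it cannot be checked from the final object alone.

```python
import json


def shape_ok(raw: str) -> bool:
    """Format compliance only: parses, and has the expected keys and types."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(obj, dict)
        and isinstance(obj.get("total"), (int, float))
        and obj.get("currency") in {"USD", "EUR", "GBP"}
    )


def matches_source(raw: str, source_total: float, source_currency: str) -> bool:
    """Semantic correctness: values agree with a trusted reference."""
    obj = json.loads(raw)
    return obj["total"] == source_total and obj["currency"] == source_currency


# Hypothetical model output: well-formed, legal enum value, wrong on both fields.
raw = '{"total": 1200.0, "currency": "EUR"}'

print(shape_ok(raw))                      # True
print(matches_source(raw, 120.0, "USD"))  # False
```

The first check is the only one a schema validator or constrained decoder can give you. The second needs something outside the model's output to compare against.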

Where it breaks

The two-pass reason-then-extract move is overhead, and overhead you do not need is just slower. The reasoning-degradation trap only bites when the task requires genuine reasoning and structure at the same time. For a flat extraction, "pull these three fields from this text," there is no chain of thought to cut off, so a single structured-output call is correct and splitting it into two is wasted latency and tokens. Match the pattern to the task. Reason-then-extract earns its second call on judgment-heavy work and loses on lookups.

The other failure is treating shape as a correctness guarantee, which it never is. A schema that validates tells you the consumer will not choke on malformed input. It tells you nothing about whether the values are real. Skip the content validation that checks ranges, enum membership against ground truth, and cross-field consistency, and you have built a pipeline that moves hallucinations downstream faster and more reliably than prose ever could, because now they parse. The shape that makes data easy to consume also makes wrong data easy to consume.

Do it now

When a task needs the model to actually think and you also need machine-readable output, stop asking for both at once. Paste this as a two-pass instruction. The model reasons in the open, then structures what it already worked out.

Paste this

TASK: <the judgment-heavy thing you need decided or analyzed>

Work in two passes. Do not skip to the structured output.

PASS 1 — Reason in plain prose. Think the problem through out loud:
the relevant factors, the tradeoffs, the edge cases, your conclusion
and why. Do not produce any JSON yet.

PASS 2 — Below a line that says "=== STRUCTURED ===", and ONLY there,
emit a single JSON object matching this schema. Copy your conclusion
from Pass 1; do not re-derive it:
{
  "verdict": "<one of: approve | reject | needs_review>",
  "confidence": "<one of: low | medium | high>",
  "key_reasons": ["<short>", "<short>"],
  "flags": ["<anything a human should double-check>"]
}

After the JSON, stop.
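On the consuming side, a response to the prompt above has to be split at the delimiter before anything is parsed. A minimal sketch, assuming the model followed the instruction and the delimiter appears only once; `split_two_pass` is a hypothetical helper name, and the sample response is made up:

```python
import json

DELIMITER = "=== STRUCTURED ==="


def split_two_pass(response: str) -> tuple[str, dict]:
    """Return (reasoning_text, parsed_object) from a two-pass response."""
    reasoning, sep, structured = response.partition(DELIMITER)
    if not sep:
        raise ValueError("delimiter missing: the model skipped Pass 2")
    # Decode only the first JSON value after the delimiter, so trailing
    # text after the object does not break parsing.
    obj, _ = json.JSONDecoder().raw_decode(structured.lstrip())
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object after the delimiter")
    return reasoning.strip(), obj


response = """The terms look acceptable, but the address fails verification.
=== STRUCTURED ===
{"verdict": "needs_review", "confidence": "medium", "key_reasons": ["address mismatch"], "flags": ["verify address"]}
"""

reasoning, obj = split_two_pass(response)
print(obj["verdict"])  # needs_review
```

Keeping the Pass 1 text alongside the object is useful for logging: when a value later turns out to be wrong, the reasoning shows where the judgment went off.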

Then validate the values, not just the shape. A schema check passing means the JSON parsed, nothing more. Before you trust a field downstream, paste this guard-line into the step that consumes it.

Paste this

Reject any value that parses but cannot be true: enum fields must
match the allowed set exactly, numbers must fall in their real range,
and cross-field claims must agree (e.g. verdict=approve must not carry
a high-severity flag). A valid shape is not a correct answer.
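In code, the same guard might look like the sketch below, written against the verdict schema from the two-pass prompt. The schema does not define flag severity, so the `HIGH:` prefix convention is an assumption made up for this example. The schema has no numeric fields, so the range check is omitted.

```python
ALLOWED_VERDICTS = {"approve", "reject", "needs_review"}
ALLOWED_CONFIDENCE = {"low", "medium", "high"}
# Assumption: a flag starting with this prefix marks high severity.
# The schema does not define severity, so this convention is hypothetical.
HIGH_SEVERITY_PREFIX = "HIGH:"


def in_allowed(value, allowed: set) -> bool:
    # Exact match only: "Approve" or " approve" is rejected.
    return isinstance(value, str) and value in allowed


def value_errors(obj: dict) -> list[str]:
    """Return reasons to reject an object that already parsed."""
    errors = []
    verdict = obj.get("verdict")
    confidence = obj.get("confidence")
    if not in_allowed(verdict, ALLOWED_VERDICTS):
        errors.append(f"verdict not in allowed set: {verdict!r}")
    if not in_allowed(confidence, ALLOWED_CONFIDENCE):
        errors.append(f"confidence not in allowed set: {confidence!r}")

    flags = obj.get("flags", [])
    if not isinstance(flags, list) or not all(isinstance(f, str) for f in flags):
        errors.append("flags must be a list of strings")
        flags = []

    # Cross-field consistency: an approval must not carry a high-severity flag.
    if verdict == "approve" and any(f.startswith(HIGH_SEVERITY_PREFIX) for f in flags):
        errors.append("approve verdict carries a high-severity flag")
    return errors


conflict = {
    "verdict": "approve",
    "confidence": "high",
    "key_reasons": ["terms accepted"],
    "flags": ["HIGH: possible fraud"],
}
print(value_errors(conflict))  # ['approve verdict carries a high-severity flag']
```

Returning a list of reasons rather than a boolean lets the consuming step log why an object was rejected, or send it to a human with the reason attached.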

Worked example

Illustrative. A constructed run to show the pattern, not a real deployment.

A support pipeline routes incoming tickets. Each ticket needs a judgment call (is this billing, a bug, or churn risk?) and the router downstream needs a clean enum to dispatch on. The tempting prompt fuses both jobs:

Prompt: Classify this ticket. Return only {"category": "...", "priority": "..."} and nothing else.

Output: {"category": "billing", "priority": "high"}

It parses. The router is happy. But the ticket actually read "charged twice, considering cancelling, third time this month". It was a churn-risk escalation the model never got room to reason about, because the instant it saw "return only JSON" it jumped to the shape and pattern-matched on the word "charged." Valid object, wrong call, and nothing in the output reveals the miss.

Run it two-pass and the reasoning surfaces before the shape locks:

Prompt: [the two-pass block above, with the schema set to category, priority, key_reasons, and flags, and churn_risk allowed as a category value]

Pass 1: Customer was double-charged, which reads billing on the surface. But "considering cancelling" plus "third time this month" is a retention signal, not a routine billing fix. The repeat pattern raises priority. Primary category is churn risk with a billing root cause.

=== STRUCTURED ===
{"category": "churn_risk", "priority": "high", "key_reasons": ["repeat double-charge", "explicit cancel intent"], "flags": ["route to retention, not billing queue"]}

The free-text pass let the judgment happen, and the structured pass then captured it cleanly for the router. The consuming step runs the value guard, sees category is a legal enum and the flags array is non-empty, and routes to retention instead of dispatching a churn case into the billing queue. Had the shape been forced first, that reasoning would never have surfaced to be captured at all.