The Agent Loop: From Answering to Acting
After this you can tell when a task actually needs an agent, build the loop around the model rather than expecting the model to be the agent, and stop the loop before it runs away.
A chatbot answers. You ask, it produces one output, and the exchange is over. An agent does something different in kind, not degree. You give it a goal and a set of tools, it picks an action and runs it against the world, it reads what came back, and it decides what to do next. Then it does that again. The whole thing is a loop that keeps going until the model judges the goal met or some condition outside the model forces it to stop. The single output becomes a chain of model calls, each one fed the accumulated results of the calls before it.
The mistake almost everyone makes early is to point at the model and call it the agent. The model is not the agent. The model is one component the agent calls, repeatedly, the way a function gets called inside a program. What turns a model into an agent is the harness, everything wrapped around it. There is the loop itself, which decides whether to call the model again. There is the tool registry, the set of functions the model is allowed to invoke and the descriptions that tell it when each one applies. There are the stop conditions that end the loop. And there is the context policy that decides what the model sees on each turn, because the window does not hold the whole history for free. The agent is the harness, and the model is the part of the harness that makes the decisions.
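The four harness parts named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production design: `Harness`, `scripted_model`, the `("call", name, arg)` / `("done", answer, "")` decision format, and the `lookup` tool are all hypothetical names invented for the example, and the model is a scripted stand-in rather than a real API call.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical decision format: ("call", tool_name, argument) or ("done", answer, "").
Decision = tuple[str, str, str]
ModelFn = Callable[[list[str]], Decision]


@dataclass
class Harness:
    model: ModelFn                            # the component that makes decisions
    tools: dict[str, Callable[[str], str]]    # tool registry
    max_iterations: int = 10                  # stop condition outside the model
    context_window: int = 6                   # context policy: keep the last N results
    history: list[str] = field(default_factory=list)

    def visible_context(self, goal: str) -> list[str]:
        # The goal is always shown; older history is trimmed to the window.
        return [goal] + self.history[-self.context_window:]

    def run(self, goal: str) -> str:
        for _ in range(self.max_iterations):
            kind, name, arg = self.model(self.visible_context(goal))
            if kind == "done":
                return name  # for "done", the second field carries the answer
            tool = self.tools.get(name)
            result = tool(arg) if tool else f"unknown tool: {name}"
            self.history.append(f"{name}({arg}) -> {result}")
        return "stopped: iteration cap reached"


def scripted_model(context: list[str]) -> Decision:
    # Toy stand-in for a model: look something up once, then finish.
    if len(context) == 1:
        return ("call", "lookup", "capital of France")
    return ("done", context[-1], "")


agent = Harness(model=scripted_model, tools={"lookup": lambda q: "Paris"})
answer = agent.run("Find the capital of France")
```

Note that the loop, the registry, the cap, and the trimming all live in `Harness`. Swapping `scripted_model` for a real model call would change none of them.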
Once you see the harness as the agent, the real decision moves earlier. The question is not how to build a good agent. It is whether this task wants a loop at all. Most do not. A loop earns its cost only when the path cannot be known in advance, when the number of steps depends on what the input turns out to be and the model has to react to each intermediate result to decide what comes next. A coding task fits, because you cannot script which file to open next without knowing what the last file contained. A great deal of what gets called agentic does not fit. If you can draw the flowchart, build the flowchart. A fixed sequence of steps is a workflow, it is cheaper, and it fails in places you can point to. The loop is for when the flowchart has branches you cannot enumerate.
Where it breaks
Two failures dominate, and they sit on opposite sides of the same decision.
The first is the runaway loop. The agent never decides it is done. It re-reads the same file, re-runs the same search, oscillates between two actions, or keeps almost finishing without finishing. The usual cause is a goal the model cannot tell it has satisfied, or tool results that do not actually change the state it is reasoning over, so each turn looks like the last and the model has no signal to stop. This is also where the loop's economics turn against you, because reliability compounds the wrong way. If each step in a chain succeeds ninety-five percent of the time and the steps fail independently, a twenty-step task succeeds with probability 0.95 to the twentieth power, about thirty-six percent. Per-step success rates multiply, so every step you add lowers the odds for the whole chain. The defense is a hard cap on iterations, the one guardrail every production loop has and the one beginners leave out. Pair it with a token or cost budget, loop detection that breaks when recent actions start repeating, and a periodic are-you-done check. The cap is not a nicety. It is the difference between an agent that stops and one you find still running an hour later having spent real money going in a circle.
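The arithmetic and the guardrails in this paragraph can be sketched together. This is a rough illustration, assuming independent steps and assuming each action is recorded as a string such as a tool name plus its arguments. `chain_success`, `LoopGuard`, and the window of four are hypothetical choices for the example, not a standard API.

```python
from collections import deque
from typing import Optional


def chain_success(per_step: float, steps: int) -> float:
    """Probability that all `steps` independent steps succeed."""
    return per_step ** steps


class LoopGuard:
    """Returns a reason to stop, or None if the loop may continue."""

    def __init__(self, max_iterations: int, max_tokens: int, window: int = 4):
        self.max_iterations = max_iterations
        self.max_tokens = max_tokens
        self.recent = deque(maxlen=window)  # the most recent actions only
        self.iterations = 0
        self.tokens = 0

    def check(self, action: str, tokens_used: int) -> Optional[str]:
        self.iterations += 1
        self.tokens += tokens_used
        self.recent.append(action)
        if self.iterations >= self.max_iterations:
            return "iteration cap"
        if self.tokens >= self.max_tokens:
            return "token budget"
        # Assumed heuristic: a full window holding at most two distinct actions
        # means the agent is repeating itself or oscillating between two moves.
        if len(self.recent) == self.recent.maxlen and len(set(self.recent)) <= 2:
            return "repeated or oscillating actions"
        return None


p = chain_success(0.95, 20)  # about 0.36
```

The harness would call `check` once per turn and end the run on any non-`None` result. The repetition heuristic is deliberately crude and can fire on legitimate work, so the window size is something to tune against real traces.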
The second failure is quieter and more common. You reach for an agent when the task never needed a loop. Sean Goedecke describes someone who built an agent to summarize a document and shipped what was really a single API call wearing a costume, ten times slower and occasionally strange because the loop gave the model room to misbehave on a job that had no second step. This is the usual error, and it usually overshoots by about two rungs. Summarizing one document is a single call. Most teams climb straight to an autonomous agent for it because agents are the exciting thing. The honest default runs the other way. Start at the lowest rung that could work, a plain prompt, then a prompt with one tool, then a fixed workflow, and only move up when the rung below demonstrably fails on a real input. Every rung you climb trades reliability and cost for flexibility, so climb only when the task forces you to.
Before you build a loop, run the task through this. Paste it into a fresh chat with your own task filled in, and let the answer decide the rung for you.
My task: <describe the whole job in one or two sentences>
Answer these, then recommend the lowest rung that works:
1. Can I draw the full flowchart of steps in advance? (yes / no)
2. Does the number of steps depend on what intermediate results come back? (yes / no)
3. Does the model need to react to one step's output to decide the next? (yes / no)
4. If a step is wrong, is the damage cheap to catch and undo? (yes / no)
Rule: if Q1 is yes, build the flowchart (a workflow or a single call), not an agent.
Only recommend an agent loop when Q2 and Q3 are both yes — the path genuinely
cannot be known ahead of time. If you recommend a loop, state the single stop
condition that ends it and a hard max-iteration cap. Do not pad a one-shot task
into a loop because a loop sounds more capable.
The block forces the decision that the excitement of agents tends to skip. The first question is the filter that catches most over-escalations on its own. The last line is the one that keeps a loop honest, because an agent you build without naming its stop condition and its cap is the runaway failure waiting to happen.
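The rule in the prompt is mechanical enough to write down as a function, which makes the precedence of the questions explicit. This is only a sketch: `recommend_rung` and its return strings are hypothetical, and the prompt does not say how question four should change the answer, so treating a "no" there as a call for human review is an assumption of this example.

```python
def recommend_rung(q1_flowchart_known: bool,
                   q2_steps_depend_on_results: bool,
                   q3_must_react_to_outputs: bool,
                   q4_errors_cheap_to_undo: bool) -> str:
    # Q1 is the filter: a drawable flowchart never needs a loop.
    if q1_flowchart_known:
        return "workflow or single call"
    # A loop is justified only when Q2 and Q3 are both yes.
    if q2_steps_depend_on_results and q3_must_react_to_outputs:
        rung = "agent loop with a named stop condition and a hard iteration cap"
        if not q4_errors_cheap_to_undo:
            # Assumption, not stated in the prompt: expensive mistakes get a reviewer.
            rung += ", plus human review before irreversible actions"
        return rung
    # Q1 is no but the loop criteria are not met: do not escalate.
    return "not a loop: start at the lowest rung and re-examine the task"


summary_task = recommend_rung(True, False, False, True)
research_task = recommend_rung(False, True, True, True)
```

Here the one-document summary lands on "workflow or single call", while a research task whose path depends on intermediate results gets a loop with its stop condition and cap attached.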
Worked example
Illustrative. A constructed exchange to show the pattern, not a real session.
Take a partnerships task that genuinely needs a loop. You want a shortlist of partner candidates that fit a profile, with a personalized first-draft email for each, and you cannot script it ahead of time because which candidates clear the bar, and which contact at each one is right, depends on what the research turns up. So the path is unknown, and a loop earns its place.
You: Find three partner candidates matching this profile, score them on real evidence, pick the right contact at each, and draft an opener. Goal: three evidence-backed drafts. Stop when you have three, or after twelve actions.
Agent: Searching for candidates against the profile... found six. Scoring each on its public footprint... two score low on fit, dropping them. Four remain. For candidate 1, the right contact for this ask is the head of partnerships, not the founder. Finding the address... drafting an opener grounded in their recent ecosystem move... saving state to file. Candidate 2 next...
The loop is visible in the trace. Search is the act, the six results are the observation, dropping the two weak fits is the decision, and the whole thing repeats per candidate.
The state-to-file step matters more than it looks. By writing each finished draft out as it goes rather than holding everything in the window, the agent can resume from where it stopped instead of losing the run if it hits a cap or a crash mid-way. That is the same instinct as the iteration cap, applied to memory instead of time.
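A state-to-file step of this kind might look like the following. It is a minimal sketch, assuming drafts are keyed by candidate name and stored as JSON. The file layout, `run_with_checkpoints`, and the `draft_fn` callback are all hypothetical, and the example does not claim to show how the agent in the exchange above stored its state.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_state(path: Path) -> dict:
    # Resume from the file if a previous run left one behind.
    if path.exists():
        return json.loads(path.read_text())
    return {"drafts": {}}


def save_state(path: Path, state: dict) -> None:
    # Write to a temporary file, then rename, so a crash mid-write
    # cannot leave a half-written state file.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    tmp.replace(path)


def run_with_checkpoints(candidates: list[str], path: Path,
                         draft_fn: Callable[[str], str], max_actions: int) -> dict:
    state = load_state(path)
    actions = 0
    for name in candidates:
        if name in state["drafts"]:
            continue              # finished in an earlier run, skip it
        if actions >= max_actions:
            break                 # cap reached; saved work survives
        state["drafts"][name] = draft_fn(name)
        actions += 1
        save_state(path, state)   # checkpoint after every finished draft
    return state


with tempfile.TemporaryDirectory() as d:
    state_file = Path(d) / "state.json"
    names = ["alpha", "beta", "gamma"]
    first = run_with_checkpoints(names, state_file, lambda n: f"Hi {n}", max_actions=2)
    second = run_with_checkpoints(names, state_file, lambda n: f"Hi {n}", max_actions=2)
```

The first run stops at the cap with two drafts saved. The second run picks up the file, skips the finished candidates, and drafts only the third.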
Notice what would have happened if this were summarizing one document instead. There is no second step that depends on the first, nothing the model has to react to, so the loop would only add latency and a chance to wander. What earns the loop here has nothing to do with the partnerships subject. Questions two and three both came back yes, and the stop condition was nameable before the first action ran. That structure is what justifies a loop, and it is exactly what a one-document summary lacks.