Agents need time

Thomas Guthrie · 23 February 2026

Every interaction with a language model is a burst. A question arrives, tokens stream back, and then nothing. The model does not experience the gap between your questions, because for the model there is no gap. Each exchange is the first and last moment of its existence.

Agentic systems stretch the burst, but they are still bursts: a task starts, the agent runs for minutes or hours, the task ends. The agent is a flare. It lights, burns through its work, and goes dark.

Real work is not shaped like this. Real work is long. A sales pipeline takes months. A codebase is maintained for years. A relationship with a customer is, if things go well, never finished at all. The defining property of real work is not difficulty. It is duration. The work continues, and whatever holds it has to continue too.

This sounds like a small observation. It changes everything about the problem.

Over minutes, memory is a convenience. Over months, memory is the work. An agent handling a support ticket needs the ticket. An agent that owns support needs to remember that this customer was angry in March, that the workaround has broken twice before, that engineering promised a fix in the next release and did not ship it. Strip the memory away and you do not get a worse version of the same agent. You get a stranger, permanently, every morning.

Over minutes, errors are events. Over months, errors are compound interest. A system that is right 99 percent of the time looks superb in a demo and is a catastrophe across ten thousand unsupervised decisions, unless something exists to catch the drift. This is why long-horizon autonomy is mostly an error-correction problem rather than a performance problem. The question is never whether the agent will go wrong. It is whether the structure around the agent makes errors converge or lets them compound.

Over minutes, the world holds still. Over months, it does not. Prices change, people quit, policies are rewritten, and the thing that was true in January is false in June. A long-running agent is not executing a plan. It is maintaining a model of a moving world and noticing when that model breaks. That is a different activity from completing tasks, and a system has to be built for it on purpose.

I think the field underestimates how much of intelligence exists only across time. Judgment is pattern recognition over consequences you have personally absorbed. Reliability is an error rate integrated over a long window. Even identity, the plain fact of being the same agent tomorrow that you were today, turns out to be an architectural achievement rather than something you get for free.

This is why every research direction at Artemis Labs has time inside it. Long-horizon planning, persistent memory, self-correction: these are not separate problems on a list. They are what intelligence looks like once you stop measuring it in seconds.

The systems we want to exist are not flares. They are fires that stay lit.