Marketing Science · April 2026 · 6 min read

Why Measurement Needs to Be Agentic

Marketing measurement has a Sorites problem. For twenty years we have pretended otherwise. Agentic measurement is the honest response.

Adam Krass · Acera Labs

Eubulides of Miletus posed a question that kept Greek philosophers arguing for centuries: when does a heap of sand stop being a heap?

Remove one grain. Still a heap. Remove another. Still a heap. Keep removing, grain by grain, and at some point the heap is gone. But no single grain made the difference. The transition happened somewhere in the middle, indeterminately, without announcement.

Eubulides did not propose a resolution. He was pointing at a structure in reality: some things do not have sharp edges. Some categories are constitutively vague. And the interesting question is not where the line is, but what it means that there is no line.

Marketing measurement has a Sorites problem. But we have spent twenty years pretending otherwise.


The lie we tell ourselves

The standard story goes like this: gather your data, clean it up, run the model, read the coefficients, allocate your budget. Five steps. Clean. Repeatable. Auditable.

Except the data was never clean. The coefficients were always uncertain. The allocation was always a guess dressed in confidence intervals that nobody interrogated. And the model you ran last quarter is now operating on a media landscape that shifted while you were building the dashboard.

The measurement bar has not stood still. It has been rising for five years and it is rising faster now. What counted as a serious MMM in 2020 would not survive a 2026 peer review. On the five-level maturity scale, Level 2 is the floor. Level 3 is average. Level 4, which most firms still describe as aspirational, is table stakes for anyone competing on media efficiency at scale.

And Level 5 does not yet have a widely accepted definition. Which is itself a signal.


Where the grains go missing

There are three places the quality erodes, each quiet enough that no single failure triggers an audit.

The first is data preparation. Every channel carries its own physics. Sponsorship spend is invoiced at signing but earns across the contract period, so attributing it to the payment month overstates spend concentration and distorts your decay curves. Influencer spend moves differently again: some of it is in-market for seventy-two hours, some of it compounds over three months depending on deal structure. A rulebook that treats all spend as event-date spend is wrong for every channel that is not paid search.

Most models run on data that has not been cleaned to channel-specific best practice. They are accurate given their inputs. The inputs are wrong.
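
To make the point concrete, here is a minimal sketch of channel-aware spend spreading. It assumes a pandas DataFrame with hypothetical channel, invoice_date, and amount columns, and the earn windows are illustrative placeholders, not recommendations; the real numbers come from deal structure.

```python
import pandas as pd

# Hypothetical earn windows, in days: how long an invoiced amount is in market.
# Illustrative values only.
EARN_WINDOW_DAYS = {
    "sponsorship": 180,   # invoiced at signing, earns across the contract
    "influencer": 90,     # some deals compound over months
    "paid_search": 1,     # spend and exposure coincide
}

def spread_spend(invoices: pd.DataFrame) -> pd.DataFrame:
    """Convert invoice-dated spend into daily in-market spend per channel.

    Expects columns: channel, invoice_date, amount.
    """
    rows = []
    for _, inv in invoices.iterrows():
        window = EARN_WINDOW_DAYS.get(inv["channel"], 1)
        daily = inv["amount"] / window
        for day in pd.date_range(inv["invoice_date"], periods=window, freq="D"):
            rows.append({"channel": inv["channel"], "date": day, "spend": daily})
    return (pd.DataFrame(rows)
              .groupby(["channel", "date"], as_index=False)["spend"].sum())

invoices = pd.DataFrame({
    "channel": ["sponsorship", "paid_search"],
    "invoice_date": pd.to_datetime(["2026-01-01", "2026-01-01"]),
    "amount": [180_000.0, 5_000.0],
})
daily = spread_spend(invoices)  # sponsorship becomes 1,000 per day for 180 days
```

The event-date version of this table would book the full 180,000 in January and nothing after, which is exactly the spend concentration that distorts the decay curves.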

The second is priors. Bayesian MMM requires priors over parameters: saturation curves, carryover decay, contribution ranges. If you do not specify meaningful priors, the model learns from your data alone. If your data covers six months, the model has six months of signal to work with. If your prior says "TV carryover decay is somewhere between zero and one," that is not a prior, that is an admission of ignorance wearing a mathematical hat.

Informed priors make models faster to converge, more robust to sparse data, and more defensible in front of a CFO who asks why the recommendation changed when you added one quarter of data. But informed priors require a knowledge base: what have other models learned, in comparable categories, over comparable periods? That knowledge base does not exist for most practitioners. So each engagement starts from scratch, diffuse, and earns its conclusions slowly.
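
To see the difference, the sketch below compares a flat prior on a carryover decay rate with an informed one concentrated on a hypothetical 0.5 to 0.8 range borrowed from comparable category models. The range is an assumption made for the example, not a published benchmark.

```python
from scipy import stats

# "Somewhere between zero and one": a flat prior on the decay rate.
diffuse_decay = stats.beta(1, 1)

# An informed prior, assuming comparable category models have landed
# roughly in the 0.5-0.8 range (an assumption for this example).
informed_decay = stats.beta(13, 7)   # mean 0.65, most mass between 0.5 and 0.8

for name, prior in [("diffuse", diffuse_decay), ("informed", informed_decay)]:
    lo, hi = prior.ppf(0.05), prior.ppf(0.95)
    print(f"{name}: 90% of prior mass between {lo:.2f} and {hi:.2f}")
```

With six months of data, the informed prior does most of the work; with several years, the likelihood dominates and the two models converge.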

The third is contextual control. Economic headwinds, category demand shifts, competitor activity, seasonal anomalies: a model that does not control for these variables attributes their variance to your media. If your category surged because a competitor pulled out of the market and your model did not know that, your channels look better than they are. When the competitor returns, your channels will look worse, and you will cut spend into the wrong period.

Pulling economic indicators, category proxies, and macro controls is possible. It is also time-consuming, requires domain judgment about which controls belong, and needs to be repeated every time the model is updated. Most teams skip it. The model absorbs the noise as signal.
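
A small simulation shows the mechanism. In the sketch below, media spend rises during a hypothetical category surge (a competitor exiting the market); omitting the control attributes the surge to media and inflates the estimated effect, while adding it pulls the coefficient back toward its true value. All numbers are synthetic and illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 104  # two years of weekly data, all of it synthetic

surge = np.where(np.arange(n) > 60, 1.0, 0.0)      # competitor exits mid-series
media = rng.gamma(2.0, 50.0, n) + 60 * surge        # spend follows the surge
sales = 100 + 0.8 * media + 400 * surge + rng.normal(0, 30, n)

# Without the macro control, the surge is attributed to media.
naive = sm.OLS(sales, sm.add_constant(media)).fit()

# With the control, the media coefficient falls back toward its true value (0.8).
controlled = sm.OLS(sales, sm.add_constant(np.column_stack([media, surge]))).fit()

print("media effect, no control:  ", round(naive.params[1], 3))
print("media effect, with control:", round(controlled.params[1], 3))
```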


Why humans alone cannot fix this

The gap between the measurement we have and the measurement we need is not primarily a talent problem. It is a throughput problem.

A skilled measurement practitioner can correct data cleaning rules, update priors from industry knowledge, and source contextual controls. Given enough time. Time is what they do not have. The same practitioner is answering questions from the marketing team, managing the data pipeline, explaining the model to the CFO, and preparing the quarterly presentation. The measurement work that would improve quality most is the work that keeps getting deferred.

This is not a criticism of practitioners. It is a structural observation. Manual-heavy processes do not scale to the depth of rigour that 2026 measurement requires. The gap is not closeable by working harder.

What changes with agentic systems is not the quality ceiling. A human expert still defines what good looks like. What changes is the throughput. An agent that knows your channel-by-channel cleaning rules can apply them every time, without drift, without forgetting, without prioritisation pressure. An agent that has access to a priors library from comparable engagements can start every model warm rather than diffuse. An agent that knows which public data sources to pull can enrich every model update without being asked.

The practitioner's judgment remains essential. The practitioner's time is freed from execution.
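
One way to picture the division of labour: the practitioner writes the rules and the rationale once, and the agent applies them on every refresh. The sketch below is a hypothetical shape for that registry, with invented class and field names, not a description of any particular product.

```python
from dataclasses import dataclass, field
from typing import Callable
import pandas as pd

@dataclass
class ChannelRule:
    """A cleaning rule the practitioner writes once; the agent applies it every run."""
    channel: str
    transform: Callable[[pd.DataFrame], pd.DataFrame]
    rationale: str                      # written to the audit trail with each run

@dataclass
class MeasurementAgent:
    rules: list[ChannelRule] = field(default_factory=list)
    priors_library: dict[str, dict] = field(default_factory=dict)
    audit_trail: list[str] = field(default_factory=list)

    def prepare(self, raw: dict[str, pd.DataFrame]) -> dict[str, pd.DataFrame]:
        """Apply every registered rule, every time, and record why."""
        cleaned = {}
        for rule in self.rules:
            cleaned[rule.channel] = rule.transform(raw[rule.channel])
            self.audit_trail.append(f"{rule.channel}: {rule.rationale}")
        return cleaned

    def warm_priors(self, category: str) -> dict:
        """Start a new model from the library rather than from diffuse priors."""
        return self.priors_library.get(category, {})
```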


The Sorites structure of quality

Here is the uncomfortable version of the argument: measurement quality is a heap. You cannot point to the moment it becomes inadequate. Each individual shortcut seems defensible. Channel-specific cleaning rules are nice to have but the broad rules are usually fine. Diffuse priors are suboptimal but the data will wash them out eventually. Skipping macro controls is a known limitation and the team accounts for it in interpretation.

None of these individual compromises is catastrophic. Together, they constitute a measurement practice that looks like rigour from the outside and behaves like guesswork from the inside.

And the stakes are rising. Media budgets are large. Efficiency differences between a well-specified model and a poorly specified one are measurable in the millions. The CFO is starting to ask harder questions. The marketing team is under pressure to defend every dollar. "The model says so" is no longer sufficient. "The model was trained on properly cleaned data, informed by category priors, controlled for macro factors, and here is the audit trail" is what a 2026 board needs.

That standard requires agentic infrastructure. Not because humans cannot do it, but because humans cannot do it at the speed and depth the standard now requires, on top of everything else they are already doing.


What serious looks like

Level 4 measurement runs continuously, not quarterly. It updates as new data arrives. It flags anomalies before they become surprises. It carries an audit trail that explains every cleaning decision and every modelling choice. It produces outputs the decision-maker can interrogate, not just accept.

Level 5 adds something more: it learns. Not just within an engagement, but across engagements. Each model run contributes to a knowledge base about how media behaves in this category, at this scale, in this macro context. The next run starts from a richer prior. The one after that, richer still.
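
Mechanically, "starts from a richer prior" can be as simple as summarising last run's posterior draws and storing the result in the knowledge base. Here is a sketch, with illustrative numbers and a hypothetical helper, using moment matching on a (0, 1) parameter such as a decay rate.

```python
import numpy as np

def beta_from_draws(draws: np.ndarray) -> tuple[float, float]:
    """Moment-match a Beta distribution to posterior draws of a (0, 1) parameter,
    such as a carryover decay rate, so the next engagement starts warm."""
    m, v = draws.mean(), draws.var()
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common

# Posterior draws from this quarter's model (illustrative numbers only).
decay_draws = np.random.default_rng(1).beta(13, 7, size=4000)
alpha, beta = beta_from_draws(decay_draws)
# Store (alpha, beta) in the knowledge base, keyed by category and channel;
# the next model run reads it back as its starting prior.
```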

This is what continuous measurement looks like when the infrastructure catches up to the aspiration. The heap does not disappear. It grows more precise, grain by grain, until the question of where the line is stops mattering, because the model is already operating at the threshold where good decisions are reliably good.

Acera Labs is building toward this standard. The newsletter is the place to follow the work as it develops.


If this framing resonates, the Sift newsletter covers the five levels of measurement maturity in detail. Subscribe below. No pitch, no cadence, just the work.

