AI's Drag-n-Drop Problem
Automation of Front End AI generation has an unsolved/unsolvable problem
Between Mousedown and Mouseup
Ask anyone running an agentic coding team in 2026 what's actually shippable end-to-end, and you'll hear a lot of qualifiers. The agents can scaffold a backend, wire up auth, write tests, refactor mercilessly. But somewhere on the way to a finished product, the pace slows and almost always at the same place: the front-end. Not the markup, not the styles, not even most of the interaction logic. The problem is motion. An AI agent can read a DOM tree all day. It can take a screenshot and tell you what's on the screen. It can drive a browser, clicking and typing its way through your app. What it can't do is watch. And the moment you ask it to debug a drag-and-drop, where the entire bug lives in the half-second between mousedown and mouseup, you discover just how much of front-end work is actually a visual feedback loop the agent isn't part of.
Snapshots, Not Motion
A screenshot is a frame. The DOM is a frame. The accessibility tree is a frame. Every channel an agent has to perceive a running app is, by construction, point-in-time. Ask it to look at your drag-and-drop bug, and what it does is grab a state (before the drag, after the drag, maybe a few in between) and reason about the differences. That reasoning is fine if your bug is "the wrong item ended up in the wrong place." But that's not where drag bugs live. Drag bugs live in motion: a ghost preview that drifts off-cursor, a drop indicator that lags one slot behind, an easing curve that looks fine in the dev tools and feels broken in your hand, a list that reorders visually but doesn't update state or updates state but doesn't reorder visually. None of that survives the trip to a still frame. You can take a hundred frames and the bug will hide between any two of them, because the bug is the trip between them.
A human developer watching the same drag clocks it in one pass. They don't even think. Their visual system has done the diff before their forebrain catches up. The agent, looking at the same app, has nothing equivalent. It has stills, and stills don't move.