Local Qwen duo beats vision for web tasks
REDDIT // 26d ago // BENCHMARK RESULT

A Reddit demo shows a local planner-executor setup (Qwen 8B + Qwen 4B) completing browser shopping flows by replanning one action at a time from compact semantic DOM snapshots instead of screenshots. The author reports completing a full cart flow on unfamiliar sites in about 15K total tokens, and cites modal detection and dismissal as a major reliability boost.
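
To make "compact semantic DOM snapshot" concrete, here is a minimal sketch of what such a snapshot could look like. It assumes a pipe-delimited table with one row per interactive element. The `Element` type, the column choice, and the `render_snapshot` helper are all hypothetical illustrations, not the format used in the demo.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical record for one interactive element on the page."""
    id: int
    role: str
    text: str


def render_snapshot(elements, max_text=40):
    """Render elements as a compact table: one header row, one row per element."""
    rows = ["id|role|text"]
    for el in elements:
        # Collapse runs of whitespace and truncate long labels to save tokens.
        text = " ".join(el.text.split())[:max_text]
        rows.append(f"{el.id}|{el.role}|{text}")
    return "\n".join(rows)


elements = [
    Element(1, "searchbox", "Search products"),
    Element(2, "button", "Add to cart"),
    Element(3, "link", "  View   cart "),
]
print(render_snapshot(elements))
```

A table like this costs a handful of tokens per element, which is plausibly where much of the saving over screenshot input comes from.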

// ANALYSIS

Stepwise replanning looks like what makes small local browser agents practical: it trades brittle long-horizon guessing for tight state-feedback loops.

  • Replanning per DOM snapshot reduces cascading failures when real page state diverges from an initial plan.
  • Semantic tables shift the executor into a low-entropy “pick an element ID” task that smaller models can handle.
  • Modal/overlay cleanup is doing hidden heavy lifting and should be treated as a core control loop, not a side heuristic.
  • The token gap versus vision-heavy flows suggests a clear cost/latency advantage for local-first automation stacks.
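
The control loop these points describe can be sketched as follows. This is a simplified illustration under stated assumptions, not the demo's code: `run_task`, `find_modal`, the `in_modal` flag, and the `FakeShop` page are hypothetical, and the planner and executor are rule-based stubs standing in for the two Qwen models.

```python
from typing import Optional

DISMISS_LABELS = {"close", "no thanks", "dismiss", "accept"}


def find_modal(snapshot) -> Optional[int]:
    """Return the id of a dismiss control if an overlay is present (simple heuristic)."""
    for el in snapshot:
        if el.get("in_modal") and el["text"].lower() in DISMISS_LABELS:
            return el["id"]
    return None


def run_task(goal, observe, act, plan_next, pick_element, max_steps=20):
    """Replan one action per fresh snapshot; clear overlays before planning."""
    history = []
    for _ in range(max_steps):
        snapshot = observe()                      # fresh state on every step
        modal_id = find_modal(snapshot)
        if modal_id is not None:                  # overlay cleanup is part of the core loop
            act("click", modal_id)
            history.append(("dismiss_modal", modal_id))
            continue
        step = plan_next(goal, snapshot, history)  # planner: exactly one next action
        if step == "DONE":
            return history
        element_id = pick_element(step, snapshot)  # executor: choose an element id
        act("click", element_id)
        history.append((step, element_id))
    raise RuntimeError("step budget exhausted")


class FakeShop:
    """Toy page: a newsletter modal blocks the 'Add to cart' button until dismissed."""

    def __init__(self):
        self.modal_open = True
        self.cart = 0

    def observe(self):
        els = [{"id": 1, "role": "button", "text": "Add to cart", "in_modal": False}]
        if self.modal_open:
            els.append({"id": 9, "role": "button", "text": "No thanks", "in_modal": True})
        return els

    def act(self, action, element_id):
        if element_id == 9:
            self.modal_open = False
        elif element_id == 1 and not self.modal_open:
            self.cart += 1


def plan_next(goal, snapshot, history):
    # Stub planner: finish once an add-to-cart step has been recorded.
    done = any(step == "add to cart" for step, _ in history)
    return "DONE" if done else "add to cart"


def pick_element(step, snapshot):
    # Stub executor: first element whose label contains the step text.
    return next(el["id"] for el in snapshot if step in el["text"].lower())


shop = FakeShop()
history = run_task("buy item", shop.observe, shop.act, plan_next, pick_element)
print(history)
```

Because the planner only ever commits to one action, a surprise such as the modal costs one extra loop iteration rather than invalidating a multi-step plan.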

// TAGS

predicate-sdk-playground, qwen, agent, automation, computer-use, open-source, self-hosted

DISCOVERED

26d ago

2026-03-17

PUBLISHED

26d ago

2026-03-17

RELEVANCE

8/10

AUTHOR

Aggressive_Bed7113