OPEN_SOURCE
REDDIT // 26d ago // BENCHMARK RESULT
Local Qwen duo beats vision for web tasks
A Reddit demo shows a local planner-executor setup (Qwen 8B + Qwen 4B) completing browser shopping flows by replanning one action at a time from compact semantic DOM snapshots instead of screenshots. The reported result is a full cart flow on unfamiliar sites with about 15K total tokens, with modal detection/dismissal cited as a major reliability boost.
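As a rough illustration of what a "compact semantic DOM snapshot" could look like, here is a minimal sketch. The `Element` fields, the pipe-delimited layout, and the `render_snapshot` helper are all assumptions for illustration; the post does not specify the actual format used in the demo.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical record for one interactive page element."""
    id: int
    role: str
    text: str


def render_snapshot(elements, max_text=40):
    """Render elements as a compact pipe-delimited table (assumed layout)."""
    lines = ["id|role|text"]
    for el in elements:
        # Collapse whitespace and truncate so each row stays cheap in tokens.
        text = " ".join(el.text.split())[:max_text]
        lines.append(f"{el.id}|{el.role}|{text}")
    return "\n".join(lines)


elements = [
    Element(1, "searchbox", "Search products"),
    Element(2, "button", "Add to cart"),
    Element(3, "link", "  View   cart "),
]
snapshot = render_snapshot(elements)
print(snapshot)
```

A table like this gives the model a short list of numbered candidates rather than pixels, which is the kind of input the post credits for the low token count.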
// ANALYSIS
Stepwise replanning looks like the practical unlock for small local browser agents, because it trades brittle long-horizon guessing for tight state-feedback loops.
- Replanning per DOM snapshot reduces cascading failures when real page state diverges from an initial plan.
- Semantic tables shift the executor into a low-entropy “pick an element ID” task that smaller models can handle.
- Modal/overlay cleanup is doing hidden heavy lifting and should be treated as a core control loop, not a side heuristic.
- The token gap versus vision-heavy flows suggests a clear cost/latency advantage for local-first automation stacks.
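The points above can be sketched as a single control loop: clear overlays, take a fresh snapshot, ask the planner for one action, let the executor pick an element ID, then repeat. Everything here is hypothetical scaffolding (`FakePage`, `plan_next`, `pick_element`, and `run_task` are stand-ins, with plain functions in place of the two Qwen models); it is not the demo's code.

```python
class FakePage:
    """Toy stand-in for a browser page; not a real browser API."""

    def __init__(self):
        self.cart = 0
        self.modal = True  # an overlay blocks the page at first

    def modal_open(self):
        return self.modal

    def dismiss_modal(self):
        self.modal = False

    def snapshot(self):
        # Element ID -> label, standing in for a semantic DOM table.
        return {1: "Add to cart", 2: f"Cart ({self.cart})"}

    def click(self, element_id):
        if element_id == 1:
            self.cart += 1


def plan_next(snapshot, history):
    """Stub planner: emit one action, or 'done' once the cart is non-empty."""
    return "done" if snapshot[2] != "Cart (0)" else "add item to cart"


def pick_element(snapshot, step):
    """Stub executor: return the ID of the element matching the step."""
    return next(i for i, label in snapshot.items() if label == "Add to cart")


def run_task(page, plan_next, pick_element, max_steps=10):
    history = []
    for _ in range(max_steps):
        # Overlay cleanup runs on every iteration, before planning.
        while page.modal_open():
            page.dismiss_modal()
            history.append("dismiss_modal")
        snapshot = page.snapshot()            # fresh state each step
        step = plan_next(snapshot, history)   # replan one action at a time
        if step == "done":
            return history
        element_id = pick_element(snapshot, step)
        page.click(element_id)
        history.append(f"{step} -> {element_id}")
    raise RuntimeError("step budget exhausted")


history = run_task(FakePage(), plan_next, pick_element)
print(history)
```

Because the planner only ever commits to the next action, a surprise such as a popup costs one loop iteration instead of invalidating a whole precomputed plan.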
// TAGS
predicate-sdk-playground · qwen · agent · automation · computer-use · open-source · self-hosted
DISCOVERED
26d ago
2026-03-17
PUBLISHED
26d ago
2026-03-17
RELEVANCE
8/10
AUTHOR
Aggressive_Bed7113