Resurf ships reproducible browser-agent testbed
Resurf is a deterministic, open-source test framework for AI browser agents built around synthetic sites, failure injection, and auditable success checks. It aims to replace flaky live-web evals and judge-only scoring with something teams can actually reproduce.
This is the right kind of boring infrastructure: browser-agent evals need controlled environments more than they need another flashy benchmark.
- `shop_v1` gives a realistic commerce flow with auth, checkout, returns, and ambiguous UI, so agents are tested on multi-step behavior instead of toy pages.
- Failure-mode injection for latency, payment declines, 3DS challenges, 5xx errors, and session expiry is the main differentiator; that is how you measure recovery, not just happy-path navigation.
- DB-state predicates are a cleaner success signal than LLM-based judging, which should make regressions easier to reproduce and debug.
- Support for `browser-use`, `stagehand`, and a vision-only baseline makes it useful for teams already experimenting with browser agents.
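The DB-state predicate idea can be illustrated with a minimal sketch (all names here are hypothetical illustrations, not Resurf's actual API): instead of asking an LLM judge whether a checkout "looked successful," assert directly against the synthetic site's database after the agent run.

```python
# Hypothetical sketch of a DB-state success predicate, using an in-memory
# SQLite database as a stand-in for the synthetic shop's backing store.
import sqlite3

def setup_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the synthetic shop's database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, user TEXT, status TEXT)"
    )
    return conn

def checkout_succeeded(conn: sqlite3.Connection, user: str) -> bool:
    """Success predicate: the user has at least one order in the 'paid' state."""
    row = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE user = ? AND status = 'paid'",
        (user,),
    ).fetchone()
    return row[0] > 0

conn = setup_db()
# Simulate the side effect a correct agent run would leave behind.
conn.execute("INSERT INTO orders (user, status) VALUES ('alice', 'paid')")
print(checkout_succeeded(conn, "alice"))  # True: order row exists in 'paid' state
print(checkout_succeeded(conn, "bob"))    # False: no matching row
```

Because the verdict is a pure function of database state, the same agent trace always scores the same way, which is exactly what makes regressions reproducible in a way judge-only scoring is not.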
Discovered: 2026-05-07 · Published: 2026-05-07 · Author: Visual-Librarian6601