Flue adds vitest-evals agent testing

// 1h ago · PRODUCT UPDATE

Flue, the programmable TypeScript framework for building autonomous AI agents, has introduced support for agent and workflow evaluations by integrating with Sentry's vitest-evals tool. The integration allows developers to create test harnesses that run evaluations in isolated instances, track cost and model usage, support model-based judges, and automate CI/CD checks.
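To make the idea of isolated runs with usage tracking concrete, here is a minimal sketch. It does not use the Flue or vitest-evals APIs: every name in it (`Agent`, `FakeAgent`, `runIsolated`, `totalTokens`) is hypothetical, and the agent is synchronous only to keep the example short, whereas a real agent call would be asynchronous.

```typescript
// Hypothetical types for illustration; none of these names come from Flue or vitest-evals.
interface Usage { inputTokens: number; outputTokens: number; }
interface RunResult { output: string; toolCalls: string[]; usage: Usage; }
interface Agent { run(input: string): RunResult; }

// Stand-in agent with per-instance memory, so state leaking between cases is visible.
class FakeAgent implements Agent {
  private history: string[] = [];
  run(input: string): RunResult {
    this.history.push(input);
    return {
      output: `seen ${this.history.length}: ${input}`,
      toolCalls: ["search"],
      usage: { inputTokens: input.length, outputTokens: 5 },
    };
  }
}

interface EvalCase { name: string; input: string; check: (r: RunResult) => boolean; }
interface EvalReport { name: string; passed: boolean; result: RunResult; }

// Runs each case against a fresh agent built by `createAgent`,
// recording the output, tool calls, and token usage alongside the verdict.
function runIsolated(createAgent: () => Agent, cases: EvalCase[]): EvalReport[] {
  return cases.map((c) => {
    const agent = createAgent(); // new instance per case: no shared history
    const result = agent.run(c.input);
    return { name: c.name, passed: c.check(result), result };
  });
}

// Sums input and output tokens across all reports.
function totalTokens(reports: EvalReport[]): number {
  return reports.reduce(
    (sum, r) => sum + r.result.usage.inputTokens + r.result.usage.outputTokens,
    0,
  );
}
```

Passing a factory rather than an agent instance is the design choice that matters here: if the same instance were reused, the second case would see the first case's history and its result would depend on run order.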

// ANALYSIS

Integrating with Sentry's vitest-evals rather than building a custom evaluation tool is a sensible choice, because it builds on tooling the TypeScript ecosystem already uses. Since evaluations run inside standard Vitest suites, developers do not have to learn a new testing framework, which lowers the barrier to testing complex agents.

* Standardized testing: Using vitest-evals and Vitest brings agent testing into the standard JS/TS developer workflow.

* Isolated runs: Initializing a fresh agent instance per test case prevents state leakage between cases and makes results easier to reproduce.

* CI/CD friendly: Exiting with a non-zero code on failed assertions lets a pipeline block a deployment when an agent regresses.

* Comprehensive tracking: The harness captures not just output correctness, but also cost, tool calls, and model usage, which are key metrics for production agents.
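The CI/CD and tracking points above can be combined into a single gate. The sketch below is an assumption-laden illustration, not how Flue or vitest-evals compute their exit status: `CaseReport`, `Gate`, `summarize`, and `exitCodeFor` are hypothetical names, and the cost budget is an added example of using tracked metrics that the announcement does not describe.

```typescript
// Hypothetical report shape; a real harness would supply its own.
interface CaseReport { name: string; passed: boolean; costUsd: number; toolCalls: number; }

// Hypothetical thresholds a team might enforce in CI.
interface Gate { minPassRate: number; maxTotalCostUsd: number; }

interface Summary { passRate: number; totalCostUsd: number; totalToolCalls: number; }

// Aggregates per-case reports into suite-level metrics.
function summarize(reports: CaseReport[]): Summary {
  const passedCount = reports.filter((r) => r.passed).length;
  return {
    passRate: reports.length === 0 ? 0 : passedCount / reports.length,
    totalCostUsd: reports.reduce((sum, r) => sum + r.costUsd, 0),
    totalToolCalls: reports.reduce((sum, r) => sum + r.toolCalls, 0),
  };
}

// Returns 0 when the suite meets the gate and 1 otherwise.
// A CI entry point would hand this value to the process exit status.
function exitCodeFor(reports: CaseReport[], gate: Gate): number {
  const s = summarize(reports);
  const ok = s.passRate >= gate.minPassRate && s.totalCostUsd <= gate.maxTotalCostUsd;
  return ok ? 0 : 1;
}
```

Treating an empty suite as a pass rate of 0 is a deliberate choice in this sketch: a misconfigured run that collects no cases fails the gate instead of passing silently.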

// TAGS

flue, agent, testing, evaluation, typescript, open-source, devtool, sentry, vitest

DISCOVERED

1h ago

2026-06-19

PUBLISHED

1h ago

2026-06-19

RELEVANCE

7/10

AUTHOR

FredKSchott
