Flue adds vitest-evals agent testing
Flue, the programmable TypeScript framework for building autonomous AI agents, now supports agent and workflow evaluations through an integration with Sentry's vitest-evals tool. The integration lets developers build test harnesses that run each evaluation against an isolated agent instance, track cost and model usage, score outputs with model-based judges, and run as automated checks in CI/CD pipelines.
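The isolation guarantee can be illustrated with a small TypeScript sketch. It assumes a hypothetical `Agent` interface, a hypothetical `runIsolated` helper, and a caller-supplied `createAgent` factory; none of these are Flue's or vitest-evals' actual API. The agent call is kept synchronous for brevity, whereas a real model call would be async. The pattern is what matters: build a new agent for every case and record the model, cost, and tool-call count alongside the output.

```typescript
// Hypothetical shapes for illustration; not Flue's or vitest-evals' types.
interface Usage {
  model: string;
  costUsd: number;
  toolCalls: number;
}

interface AgentResult {
  output: string;
  usage: Usage;
}

interface Agent {
  run(input: string): AgentResult;
}

interface CaseRecord {
  input: string;
  output: string;
  usage: Usage;
}

// Runs every input against a brand-new agent from `createAgent`, so state
// written by one case cannot leak into the next.
function runIsolated(inputs: string[], createAgent: () => Agent): CaseRecord[] {
  return inputs.map((input) => {
    const agent = createAgent(); // fresh instance per case
    const { output, usage } = agent.run(input);
    return { input, output, usage };
  });
}

// Toy agent that remembers every input it has seen, standing in for
// conversation memory or cached tool results.
function createToyAgent(): Agent {
  const memory: string[] = [];
  return {
    run(input: string): AgentResult {
      memory.push(input);
      return {
        output: `${input.toUpperCase()} (seen ${memory.length})`,
        usage: { model: "toy-model", costUsd: 0.001, toolCalls: 1 },
      };
    },
  };
}

const inputs = ["hello", "world"];
const isolated = runIsolated(inputs, createToyAgent);
console.log(isolated.map((r) => r.output).join(", ")); // HELLO (seen 1), WORLD (seen 1)

// Contrast: handing every case the same instance leaks state between cases.
const shared = createToyAgent();
const leaky = runIsolated(inputs, () => shared);
console.log(leaky.map((r) => r.output).join(", ")); // HELLO (seen 1), WORLD (seen 2)
```

The shared-instance run shows why per-case construction matters: the second case's output reflects state left behind by the first, so its result would change with test order.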
Integrating with Sentry's vitest-evals rather than building a custom evaluation tool is a smart move that builds on tooling TypeScript developers already use. Because evaluations run inside standard Vitest suites, developers do not have to learn a new testing framework, which lowers the barrier to testing complex agents.
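Inside such a suite, each case's output still needs a score. A model-based judge typically prompts a model to grade an output, maps the reply to a number, and compares it against a pass threshold. The sketch below assumes that pattern; `Judge`, `makeModelJudge`, and `judgeCases` are hypothetical names, and the injected `complete` function standing in for an LLM call is stubbed so the example runs offline. It does not reproduce vitest-evals' own scorer API.

```typescript
// A judge maps (input, output, expected) to a score in [0, 1].
// Hypothetical type; not vitest-evals' scorer signature.
type Judge = (input: string, output: string, expected: string) => number;

interface JudgedCase {
  input: string;
  output: string;
  expected: string;
}

interface Verdict {
  input: string;
  score: number;
  passed: boolean;
}

// Wraps an LLM call (`complete`, injected by the caller) as a judge that asks the
// model for a 0-10 grade and normalizes it to [0, 1]. Unparseable replies score 0.
function makeModelJudge(complete: (prompt: string) => string): Judge {
  return (input, output, expected) => {
    const prompt =
      `Question: ${input}\nReference answer: ${expected}\nCandidate answer: ${output}\n` +
      "Grade the candidate from 0 to 10. Reply with the number only.";
    const grade = Number.parseFloat(complete(prompt));
    return Number.isFinite(grade) ? Math.min(Math.max(grade / 10, 0), 1) : 0;
  };
}

// Averages all judges' scores per case and passes cases at or above `threshold`.
function judgeCases(cases: JudgedCase[], judges: Judge[], threshold: number): Verdict[] {
  return cases.map((c) => {
    const scores = judges.map((judge) => judge(c.input, c.output, c.expected));
    const score = scores.reduce((sum, s) => sum + s, 0) / scores.length;
    return { input: c.input, score, passed: score >= threshold };
  });
}

// Offline stub standing in for a real model: grades "Saturn" low, everything else high.
const fakeModel = (prompt: string): string =>
  prompt.includes("Candidate answer: Saturn") ? "1" : "9";

const verdicts = judgeCases(
  [
    { input: "Capital of France?", output: "Paris", expected: "Paris" },
    { input: "2 + 2?", output: "The answer is 4.", expected: "4" },
    { input: "Largest planet?", output: "Saturn", expected: "Jupiter" },
  ],
  [makeModelJudge(fakeModel)],
  0.7,
);
// Capital of France? pass; 2 + 2? pass; Largest planet? fail
console.log(verdicts.map((v) => `${v.input} ${v.passed ? "pass" : "fail"}`).join("; "));
```

Keeping the model call injectable lets the same judge run against a cheap stub in unit tests and a real model in the full evaluation run. Note that the second case passes even though it is not an exact string match, which is the usual reason to reach for a model-based judge.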
* Standardized testing: Using vitest-evals and Vitest brings agent testing into the standard JS/TS developer workflow.
* Isolated runs: Initializing a fresh agent instance for each test case prevents state from leaking between cases, so each result is independent of the cases that ran before it, a prerequisite for reproducible testing.
* CI/CD friendly: The test run exits with a non-zero code when an assertion fails, so a failing evaluation can block a deployment and agents can be tested continuously before they ship.
* Comprehensive tracking: The harness captures not just output correctness, but also cost, tool calls, and model usage, which are key metrics for production agents.
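The tracking and CI points above can be sketched as a summary step that rolls per-case results into run-level totals and an exit code. The `CaseResult` shape, the `summarize` function, and the model names are hypothetical. In practice Vitest's own exit status is what fails the CI job, so this only illustrates what such a gate computes.

```typescript
// Hypothetical per-case result; field names are illustrative only.
interface CaseResult {
  name: string;
  passed: boolean;
  costUsd: number;
  toolCalls: number;
  model: string;
}

interface RunSummary {
  failed: string[];
  totalCostUsd: number;
  totalToolCalls: number;
  models: string[];
  exitCode: number;
}

// Rolls per-case results into run-level metrics plus the exit code a CI job
// would act on: 0 only when every case passed, 1 otherwise.
function summarize(results: CaseResult[]): RunSummary {
  const failed = results.filter((r) => !r.passed).map((r) => r.name);
  return {
    failed,
    totalCostUsd: results.reduce((sum, r) => sum + r.costUsd, 0),
    totalToolCalls: results.reduce((sum, r) => sum + r.toolCalls, 0),
    models: Array.from(new Set(results.map((r) => r.model))).sort(),
    exitCode: failed.length === 0 ? 0 : 1,
  };
}

const summary = summarize([
  { name: "books a flight", passed: true, costUsd: 0.012, toolCalls: 3, model: "model-a" },
  { name: "cancels a booking", passed: false, costUsd: 0.008, toolCalls: 2, model: "model-b" },
  { name: "answers FAQ", passed: true, costUsd: 0.002, toolCalls: 0, model: "model-a" },
]);
// failed: cancels a booking; exit code 1
console.log(`failed: ${summary.failed.join(", ")}; exit code ${summary.exitCode}`);
// A Node entry point could finish with: process.exitCode = summary.exitCode;
```

Reporting cost and tool calls next to pass/fail makes regressions visible even when every assertion still passes, for example a prompt change that doubles spend.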
DISCOVERED
2026-06-19
PUBLISHED
2026-06-19
AUTHOR
FredKSchott