Tejal Patwardhan discusses AI evaluation progress

// NEWS 45d ago

In this episode of the OpenAI Podcast, Andrew Mayne speaks with OpenAI's frontier evals lead Tejal Patwardhan about measuring and forecasting model progress. They discuss why traditional static benchmarks are failing and why frontier AI needs new evaluation frameworks.

// ANALYSIS

Static benchmarks are increasingly gamed or saturated, which makes them less useful for tracking real frontier progress, so AI measurement needs to shift toward interactive, complex environments. Effective evaluations must also serve as forecasting tools that measure model capabilities and safety risks before deployment.

// TAGS

openai, podcast, evaluation, ai-benchmarks, model-evaluation, tejal-patwardhan, andrew-mayne

DISCOVERED

45d ago

2026-06-16

PUBLISHED

45d ago

2026-06-16

RELEVANCE

6/10

AUTHOR

OpenAI

// KEEP READING

More AI developer news from the feed

UPDATE 1h ago

Vercel AI Gateway adds unified fast mode

Vercel has updated AI Gateway so that fast mode now requires only a single setting. Once speed is set to fast, requests automatically use the fast routing tier when it is available and fall back to standard speed when it is not, so developers no longer need to write manual failover logic.

UPDATE 1h ago

Vercel AI Gateway adds spend budgets, alerts

Vercel announced advanced spend budgets for its AI Gateway, enabling developers and organizations to control and monitor their AI infrastructure expenses. Using the Vercel CLI, teams can establish granular spend caps with configurable monthly refresh periods across team, project, and individual API key scopes to prevent runaway LLM costs.

NEWS 1h ago

Grok 4.5 jailbreak costs $58 vs frontier rivals

Recent red-teaming evaluations suggest that AI model safety is becoming a measurable economic metric. Automated search uncovered a universal jailbreak for Grok 4.5 at a compute cost of approximately $58, while the same adversarial discovery process applied to frontier models like GPT-5.6 Sol and Fable 5 produced zero successful universal jailbreaks, even after more than $14,200 was spent per model.
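
To put the two figures on the same scale, the gap can be worked out as a simple ratio. This is a minimal sketch that uses only the two dollar amounts quoted above; the function and constant names are illustrative, not part of any published evaluation. Because the $14,200 figure is a floor ("more than") and no jailbreak was found at that spend, the result is a lower bound on the gap, not a measured cost.

```python
def spend_ratio_lower_bound(unbroken_spend_usd: float, jailbreak_cost_usd: float) -> float:
    """Return how many times larger the unsuccessful spend is than the jailbreak cost.

    The unsuccessful spend found no jailbreak, so the true cost for that model
    is unknown; this ratio is only a lower bound.
    """
    if jailbreak_cost_usd <= 0:
        raise ValueError("jailbreak_cost_usd must be positive")
    return unbroken_spend_usd / jailbreak_cost_usd


# Figures quoted in the summary above (both approximate).
GROK_JAILBREAK_COST_USD = 58.0   # approximate cost to find a universal jailbreak
FRONTIER_SPEND_USD = 14_200.0    # spend per model with zero universal jailbreaks

ratio = spend_ratio_lower_bound(FRONTIER_SPEND_USD, GROK_JAILBREAK_COST_USD)
print(f"At least {ratio:.0f}x more was spent per frontier model without success")
```

On these numbers the search budget for each frontier model was roughly 245 times the cost of the Grok 4.5 jailbreak.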