Dirac tops TerminalBench on Gemini 3 Flash

// BENCHMARK RESULT

Dirac, an open-source coding agent, claims a 65.2% score on TerminalBench 2.0 using Gemini-3-flash-preview. That is well above Google's official 47.8% and narrowly ahead of Junie CLI's 64.3%. The author says the run used the fully open-source repo and no cheating mechanisms.

// ANALYSIS

Dirac’s result is a reminder that benchmark outcomes are often as much about harness quality as model choice. If the run holds up, it strengthens the case that context curation, edit precision, and tool orchestration can swing agent performance materially.

– The reported 65.2% TerminalBench 2.0 score would put an open-source agent ahead of both Google's own submission and the current closed-source leader cited in the post.
– The author explicitly says no `agents/skills.md` files were inserted, no resource or timeout changes were made, and the exact GitHub codebase was used for the run.
– Dirac's positioning around hash-anchored edits, AST-aware manipulation, and token efficiency fits the kind of workflow TerminalBench is meant to stress.
– The post also highlights a real benchmark problem: if the community doubts compliance, the score matters less than the reproducibility story around it.
– Until the leaderboard accepts the submission, this reads as a strong but still provisional signal that agent scaffolding can be a competitive advantage.

// TAGS

dirac, cli, open-source, ai-coding, agent, benchmark

DISCOVERED

2026-04-27

PUBLISHED

2026-04-27

RELEVANCE

9/10

AUTHOR

GodelNumbering
