Claude Opus 4.7 hits inconsistency on MineBench

// BENCHMARK RESULT · 90d ago

New benchmark results for Claude Opus 4.7 on MineBench reveal a regression in consistency and spatial logic compared to Opus 4.6. Despite higher scores on standard text benchmarks, the model's 3D voxel builds favor scenery over structural precision, raising questions about its creative reasoning.

// ANALYSIS

Claude Opus 4.7’s performance on MineBench is a reminder that state-of-the-art benchmark scores do not always translate to real-world creative utility. At an average inference time of 43 minutes and a cost nearing $275 per build, the model is significantly slower and more expensive than its predecessor. Its tendency to prioritize scenery over the core build prompt suggests a shift in focus, possibly tied to its new adaptive thinking mode. The model may be optimized for academic evaluations, but it struggles with the tool-heavy, multi-step logic that complex voxel art requires.

// TAGS

claude-opus-4-7, benchmark, llm, spatial-intelligence, reasoning

DISCOVERED

90d ago

2026-04-18

PUBLISHED

90d ago

2026-04-17

RELEVANCE

8/10

AUTHOR

ENT_Alam
