Vision AI stumbles on object counting

// BENCHMARK RESULT, 90d ago

A Reddit user tested Copilot and Gemini on a dense image-counting task, asking each to count the cases in a photo. The thread became a reminder that multimodal chatbots describe images well but still struggle to count objects precisely without task-specific tooling.

// ANALYSIS

This is less a shocking AI failure than a useful boundary marker: general-purpose vision-language models are not reliable measurement instruments.

- Dense, overlapping objects remain a weak spot for chat-first multimodal systems
- Prompt correction can improve answers, but it does not guarantee exact counting
- The better engineering answer is segmentation, detection, or classical CV plus verification
- For developers, this is a reminder to wrap LLM vision with purpose-built tools when precision matters
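
As a rough illustration of the "classical CV plus verification" idea, the sketch below counts connected regions in a binary mask and checks a chatbot's reported count against that number. It is a toy under stated assumptions, not the method used in the thread: the mask is assumed to be already thresholded or segmented, the names `count_blobs`, `check_model_count`, and `MASK` are hypothetical, and the model's count is passed in as a plain integer rather than fetched from any real API.

```python
from collections import deque


def count_blobs(mask):
    """Count 4-connected regions of truthy cells in a 2D grid (list of lists)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Unvisited foreground cell: a new region starts here.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def check_model_count(model_count, mask, tolerance=0):
    """Compare a model-reported count (assumed input) with the mask-based count."""
    cv_count = count_blobs(mask)
    return {
        "model": model_count,
        "cv": cv_count,
        "agree": abs(model_count - cv_count) <= tolerance,
    }


# Hypothetical 4x5 mask with four separate objects.
MASK = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
]

if __name__ == "__main__":
    print(count_blobs(MASK))           # 4
    print(check_model_count(5, MASK))  # the model's answer of 5 is flagged
```

Connected-component counting merges objects that touch or overlap, so on a dense photo it would undercount. That is where the segmentation or detection step named above would have to come first. The part that carries over is the wrapper: treat the model's number as a claim to check, not a measurement.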

// TAGS

copilot, gemini, multimodal, llm, benchmark, computer-use

DISCOVERED

90d ago

2026-04-22

PUBLISHED

90d ago

2026-04-22

RELEVANCE

5/10

AUTHOR

YERAFIREARMS
