OPEN_SOURCE ↗
REDDIT · 1h ago · BENCHMARK RESULT
Pelican Test Expands Into Video
The post proposes a video version of the long-running Pelican Test: give a multimodal model a short clip and ask it to write JavaScript that reproduces the animation as closely as possible. It compares outputs from Gemini 3.1 Pro, K2.5, Qwen 3.6 Plus, and Gemma 4 31B to show how well current vision-language models handle spatial reasoning and visual reconstruction.
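To make the test concrete, here is a minimal sketch of the kind of answer the benchmark expects a model to emit. It assumes a hypothetical clip showing a line of text sliding in from the left while a circle bounces; all names, keyframes, and timings are illustrative, not from the post. Separating the pure animation state from the canvas drawing keeps the reconstruction inspectable frame by frame.

```javascript
// Linear interpolation between two keyframe values.
function lerp(a, b, t) {
  return a + (b - a) * t;
}

// Pure animation model: given elapsed seconds, return the scene state.
// This is the part where spatial fidelity (text placement, motion) lives.
function sceneAt(t) {
  const slide = Math.min(t / 1.0, 1);             // text slides in over 1 s
  const bounce = Math.abs(Math.sin(t * Math.PI)); // circle bounces once per second
  return {
    textX: lerp(-200, 40, slide), // off-screen left → resting x = 40
    textY: 60,                    // line stays on its baseline
    circleY: lerp(220, 120, bounce),
  };
}

// Drawing layer (browser only): replay the state onto a <canvas> context.
function draw(ctx, t) {
  const s = sceneAt(t);
  ctx.clearRect(0, 0, 320, 240);
  ctx.fillText("Hello", s.textX, s.textY);
  ctx.beginPath();
  ctx.arc(160, s.circleY, 12, 0, 2 * Math.PI);
  ctx.fill();
}
```

In a browser, `draw` would be driven by `requestAnimationFrame`; a scorer can instead sample `sceneAt` at fixed timestamps and compare positions against the source clip.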
// ANALYSIS
This is a decent, if hacky, benchmark because it punishes shallow captioning and rewards actual video understanding plus layout-aware code generation.
- It shifts the test from static SVG composition to temporal reconstruction, which is harder and more revealing for multimodal models.
- The real signal here is spatial fidelity: can the model preserve text placement, motion, edits, and transitions without hand-holding?
- The prompt is still informal and noisy, so it’s better as a vibes benchmark than a rigorous eval suite.
- Interesting that the author highlights line positioning; that usually exposes whether the model is actually parsing structure or just pattern-matching aesthetics.
- If this catches on, expect people to use it as a quick litmus test for video-capable models, especially in local/VLLM circles.
// TAGS
pelican-test · multimodal · benchmark · reasoning · video-gen · ai-coding
DISCOVERED
1h ago
2026-04-17
PUBLISHED
5h ago
2026-04-17
RELEVANCE
8/10
AUTHOR
TheRealMasonMac