Pelican Test Expands Into Video
OPEN_SOURCE ↗
REDDIT // 1h ago · BENCHMARK RESULT


The post proposes a video version of the long-running Pelican Test: give a multimodal model a short clip and ask it to write JavaScript that reproduces the animation as closely as possible. It compares outputs from Gemini 3.1 Pro, K2.5, Qwen 3.6 Plus, and Gemma 4 31B to show how well current video-capable models handle spatial reasoning and visual reconstruction.
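To make the task concrete, a model's reconstruction might look something like the sketch below. This is a hypothetical example, not an output from the post: it assumes a clip in which a labeled box slides left to right over two seconds. The position helper is kept pure so the timing logic can be checked outside a browser; the canvas driver is browser-only.

```javascript
// Hypothetical reconstruction of a clip where a labeled box
// slides from x=0 to x=300 over 2000 ms. All names and timings
// here are illustrative assumptions, not from the original post.

// Pure timing helper: the box's x position at time t (ms),
// linearly interpolated and clamped to the animation's duration.
function xAt(t, duration = 2000, startX = 0, endX = 300) {
  const p = Math.min(Math.max(t / duration, 0), 1); // clamp progress to [0, 1]
  return startX + (endX - startX) * p;
}

// Browser-only driver: redraws the box (and its label, since the
// benchmark scores text placement) on a <canvas> each frame.
function animate(canvas) {
  const ctx = canvas.getContext("2d");
  const t0 = performance.now();
  function frame(now) {
    const x = xAt(now - t0);
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    ctx.fillText("pelican", x, 45); // label tracks the box
    ctx.fillRect(x, 50, 40, 40);
    if (now - t0 < 2000) requestAnimationFrame(frame);
  }
  requestAnimationFrame(frame);
}
```

Scoring a submission like this against the source clip is exactly where the "spatial fidelity" signal comes from: positions, labels, and timing either match the video or they don't.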

// ANALYSIS

This is a decent, if hacky, benchmark: it punishes shallow captioning and rewards genuine video understanding plus layout-aware code generation.

  • It shifts the test from static SVG composition to temporal reconstruction, which is harder and more revealing for multimodal models.
  • The real signal here is spatial fidelity: can the model preserve text placement, motion, edits, and transitions without hand-holding?
  • The prompt is still informal and noisy, so it’s better as a vibes benchmark than a rigorous eval suite.
  • Interesting that the author highlights line positioning; that usually exposes whether the model is actually parsing structure or just pattern-matching aesthetics.
  • If this catches on, expect people to use it as a quick litmus test for video-capable models, especially in local/VLLM circles.
// TAGS
pelican-test · multimodal · benchmark · reasoning · video-gen · ai-coding

DISCOVERED

1h ago

2026-04-17

PUBLISHED

5h ago

2026-04-17

RELEVANCE

8 / 10

AUTHOR

TheRealMasonMac