Pelican Test Expands Into Video

// 90d ago · BENCHMARK RESULT

The post proposes a video version of the long-running Pelican Test: give a multimodal model a short clip and ask it to write JavaScript that reproduces the animation as closely as possible. It compares outputs from Gemini 3.1 Pro, K2.5, Qwen 3.6 Plus, and Gemma 4 31B to show how well current vision-language models (VLMs) handle spatial reasoning and visual reconstruction.
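To make the task concrete, here is a minimal sketch of the kind of JavaScript a model might produce when asked to reproduce a clip. It is not taken from the post: the `keyframes` data and the `positionAt` helper are hypothetical, and the sketch assumes the clip shows a single text element moving between a few positions. It covers only the timing logic, so it runs without a browser.

```javascript
// Hypothetical keyframes for one text element:
// time in seconds, position in pixels. Values are illustrative only.
const keyframes = [
  { t: 0, x: 0, y: 100 },
  { t: 1, x: 200, y: 100 },
  { t: 2, x: 200, y: 300 },
];

// Linearly interpolate the element's position at time t.
// Times outside the keyframe range clamp to the first or last keyframe.
function positionAt(frames, t) {
  const first = frames[0];
  const last = frames[frames.length - 1];
  if (t <= first.t) return { x: first.x, y: first.y };
  if (t >= last.t) return { x: last.x, y: last.y };
  for (let i = 1; i < frames.length; i++) {
    const a = frames[i - 1];
    const b = frames[i];
    if (t <= b.t) {
      const u = (t - a.t) / (b.t - a.t); // progress within this segment, 0..1
      return { x: a.x + u * (b.x - a.x), y: a.y + u * (b.y - a.y) };
    }
  }
  return { x: last.x, y: last.y };
}

console.log(positionAt(keyframes, 0.5)); // { x: 100, y: 100 }
```

In a browser, a `requestAnimationFrame` loop could call `positionAt` once per frame and apply the result to the element's CSS transform. Getting the keyframes right is where the model's reading of the video would actually be tested.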

// ANALYSIS

This is a decent informal benchmark: a model cannot pass by captioning the clip, because the task rewards actual video understanding plus layout-aware code generation.

– It shifts the test from static SVG composition to temporal reconstruction, which is harder and more revealing for multimodal models.
– The real signal here is spatial fidelity: can the model preserve text placement, motion, edits, and transitions without hand-holding?
– The prompt is still informal and noisy, so it’s better as a vibes benchmark than a rigorous eval suite.
– Interesting that the author highlights line positioning; that usually exposes whether the model is actually parsing structure or just pattern-matching aesthetics.
– If this catches on, expect people to use it as a quick litmus test for video-capable models, especially in local-model and VLM circles.
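The spatial-fidelity point above could be made measurable. The sketch below is an assumption, not the author's method: the post does not describe any scoring. The hypothetical `meanPositionError` function compares an element's position in the reference clip with its position in the model's reproduction, sampled at the same timestamps.

```javascript
// Mean Euclidean distance, in pixels, between reference and reproduced
// positions sampled at matching timestamps. Lower is better; 0 means the
// two tracks are identical. The metric itself is an illustrative assumption.
function meanPositionError(reference, reproduced) {
  if (reference.length === 0 || reference.length !== reproduced.length) {
    throw new Error("tracks must be non-empty and the same length");
  }
  let total = 0;
  for (let i = 0; i < reference.length; i++) {
    const dx = reference[i].x - reproduced[i].x;
    const dy = reference[i].y - reproduced[i].y;
    total += Math.hypot(dx, dy);
  }
  return total / reference.length;
}

// Illustrative tracks: the reproduction is off by (3, 4) pixels in the middle sample.
const reference = [{ x: 0, y: 0 }, { x: 10, y: 0 }, { x: 20, y: 0 }];
const reproduced = [{ x: 0, y: 0 }, { x: 13, y: 4 }, { x: 20, y: 0 }];

console.log(meanPositionError(reference, reproduced)); // 5 px of error over 3 samples
```

A single number like this ignores transitions and edits, so it would complement side-by-side visual comparison rather than replace it.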

// TAGS

pelican-test, multimodal, benchmark, reasoning, video-gen, ai-coding

DISCOVERED

90d ago

2026-04-17

PUBLISHED

90d ago

2026-04-17

RELEVANCE

8/10

AUTHOR

TheRealMasonMac
