Sewell benchmarks LLMs on Figma clone
Builder.io founder Steve Sewell tested top AI models on their ability to build a pixel-perfect, production-grade Figma editor clone in a single shot. The models were evaluated inside the Agent-Native repository using the out-of-the-box Pi coding agent as a test harness.
Traditional benchmarks fail to measure how models handle real-world UI design conventions and editor logic. Testing models on their ability to build a functional, pixel-perfect Figma clone under strict repository constraints provides a much-needed reality check for frontend AI agents.
- **Visual vs. Logic Gap**: Building a Figma clone requires both pixel-perfect canvas rendering and complex state management, exposing models that write neat styling but fail on interactive state.
- **Out-of-the-Box Limitations**: Using the Pi coding agent without custom prompts or system configurations ensures the benchmark measures raw model capabilities rather than customized engineering workarounds.
- **Repository Constraints**: Forcing models to adhere to existing conventions inside the Agent-Native repo tests context retrieval and code adaptation, not just code generation.
- **Evaluation Difficulty**: Rating UI quality and interactive performance remains highly subjective, highlighting the need for automated visual regression testing in AI evaluations.
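The visual regression testing mentioned in the last point can be illustrated with a minimal sketch. This is not the method used in Sewell's benchmark; it is a generic pixel-diff check under stated assumptions: screenshots are already decoded into equal-sized grids of RGB tuples, and the names `diff_ratio` and `passes`, along with the `tolerance` and `max_ratio` thresholds, are hypothetical choices made for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each channel 0-255
Image = List[List[Pixel]]      # rows of pixels; assumed already decoded


def diff_ratio(baseline: Image, candidate: Image, tolerance: int = 0) -> float:
    """Return the fraction of pixels that differ between two images.

    A pixel counts as changed when any channel differs by more than
    `tolerance`, which absorbs small anti-aliasing differences.
    """
    same_shape = len(baseline) == len(candidate) and all(
        len(a) == len(b) for a, b in zip(baseline, candidate)
    )
    if not same_shape:
        raise ValueError("images must have identical dimensions")

    total = sum(len(row) for row in baseline)
    if total == 0:
        return 0.0

    changed = 0
    for row_a, row_b in zip(baseline, candidate):
        for pixel_a, pixel_b in zip(row_a, row_b):
            channel_gap = max(abs(a - b) for a, b in zip(pixel_a, pixel_b))
            if channel_gap > tolerance:
                changed += 1
    return changed / total


def passes(baseline: Image, candidate: Image,
           max_ratio: float = 0.01, tolerance: int = 8) -> bool:
    """Hypothetical pass/fail gate: at most `max_ratio` of pixels may change."""
    return diff_ratio(baseline, candidate, tolerance) <= max_ratio
```

A check like this only scores static rendering against a reference screenshot. It says nothing about interactive state, so it would address the visual half of the gap described above and leave editor logic to separate tests.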
DISCOVERED
2026-06-25
PUBLISHED
2026-06-24
AUTHOR
Steve8708