Vercel adds GLM 5.2 Fast via Wafer
Vercel AI Gateway now exclusively hosts GLM 5.2 Fast, leveraging Wafer's optimized inference stack to hit 170+ tokens per second. The integration brings Z.ai's open-weight, 1M-context coding model directly to developers building high-throughput agentic workflows.
Pairing Z.ai's coding-first model with Wafer's inference speed on Vercel makes it significantly easier to build responsive AI agents.
- Wafer's optimization pushes throughput to 170-250+ tokens per second (TPS), crucial for real-time coding assistants.
- A 1-million-token context window at these speeds unlocks practical whole-repo reasoning without severe UX degradation.
- Native Vercel AI SDK integration removes the friction of configuring and managing custom fast-inference infrastructure.
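To put the quoted throughput range in concrete terms, here is a rough back-of-the-envelope sketch. It assumes a steady decode rate and a 2,000-token response (an illustrative size, not a figure from the announcement), and it ignores time-to-first-token, prompt processing, and network latency. The helper name `streamSeconds` is hypothetical.

```typescript
// Hypothetical helper: seconds needed to stream `tokens` output tokens
// at a steady decode rate of `tokensPerSecond`.
// Assumption: ignores time-to-first-token, prompt processing, and network latency.
function streamSeconds(tokens: number, tokensPerSecond: number): number {
  if (tokens < 0 || tokensPerSecond <= 0) {
    throw new RangeError("tokens must be >= 0 and tokensPerSecond must be > 0");
  }
  return tokens / tokensPerSecond;
}

// Assumed response size for illustration (e.g. a generated patch).
const responseTokens = 2000;

// Compare the low and high ends of the quoted 170-250 TPS range.
for (const tps of [170, 250]) {
  console.log(`${tps} TPS: ${streamSeconds(responseTokens, tps).toFixed(1)} s`);
}
// 170 TPS: 11.8 s
// 250 TPS: 8.0 s
```

Under these assumptions, the gap between the two ends of the range is a few seconds per response, which adds up quickly in agent loops that chain many model calls.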
DISCOVERED: 2026-06-25
PUBLISHED: 2026-06-24
AUTHOR: vercel_dev