Zed ships Zeta2.1, cuts prediction latency
Zeta2.1 is Zed’s updated edit prediction model, using a new Multi-Region prompt format that emits about 3x fewer output tokens. Predictions are 28% faster at p50, Zed runs 30% fewer servers for the same traffic, and Zeta2.1 is now the default edit prediction model in Zed.
Hot take: this is less a flashy model breakthrough than a strong systems win, and that is exactly the kind of improvement that compounds in a latency-sensitive feature like edit prediction.
- The main value is efficiency: fewer output tokens directly reduce inference cost and response time.
- A 28% faster p50 is meaningful for a keystroke-level product where users feel every millisecond.
- Running 30% fewer servers for the same traffic is a strong signal that the release improves both UX and infra economics.
- The “Multi-Region” format suggests the team found a better way to structure predictions, not just a better base model.
- Keeping Zeta2.1 open-weight preserves the self-host and inspection angle that differentiates Zed’s model story.
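
As a rough sanity check on how the reported numbers fit together, here is a back-of-the-envelope sketch. It is not Zed's methodology. It assumes a simple linear latency model (a fixed cost plus a per-output-token cost), reads "28% faster" as a 28% drop in p50 latency, and uses hypothetical function names introduced only for illustration.

```python
def decode_share(token_reduction_factor: float, latency_drop: float) -> float:
    """Fraction of baseline latency spent generating output tokens.

    Assumed model (not from the source): latency = fixed + per_token * output_tokens.
    Cutting output tokens by `token_reduction_factor` removes
    share * (1 - 1 / factor) of the baseline latency, so we solve for share.
    """
    return latency_drop / (1.0 - 1.0 / token_reduction_factor)


def throughput_gain(server_reduction: float) -> float:
    """Per-server throughput multiplier when the same traffic needs fewer servers."""
    return 1.0 / (1.0 - server_reduction)


# Figures from the announcement: ~3x fewer output tokens, 28% faster p50, 30% fewer servers.
share = decode_share(token_reduction_factor=3.0, latency_drop=0.28)
gain = throughput_gain(server_reduction=0.30)

print(f"implied decode share of baseline p50: {share:.2f}")
print(f"implied per-server throughput gain: {gain:.2f}x")
```

Under these assumptions, output-token generation would account for roughly 42% of the old p50 latency, with the rest being fixed cost such as prompt processing and network time, and each server would handle about 1.43x the traffic it did before. The real breakdown depends on details the announcement summary does not give.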
DISCOVERED
2026-05-11
PUBLISHED
2026-05-11
AUTHOR
zeddotdev