GitHub Copilot Boosts VS Code Token Efficiency
The GitHub Copilot team has introduced key harness-level optimizations in VS Code to reduce token consumption by up to 18% and lower latency for agentic workflows. These updates include extended prompt caching, deferred tool schema loading, client-side embedding-based tool search, and persistent WebSockets.
The shift toward usage-based billing makes developer client optimizations like local embedding-guided tool search just as crucial as the underlying foundation model improvements.
- –Extended Caching: Enabling 24-hour prompt cache retention prevents cold-start latency and reduces costs after user breaks.
- –Deferring Tools: Marking tools with `defer_loading` keeps large JSON parameter schemas out of the context window until the model explicitly requests them.
- –Persistent WebSockets: Replacing repeated HTTP connections with WebSockets dramatically improves latency across multi-turn agent sessions.
- –Client-Side Embedding Search: Offloading tool search to local embeddings allows intent-based matching and dynamic MCP tool discovery without server roundtrips.
- –Specialized Subagents: Delegation of tasks like workspace search or summarizing to cheaper, specialized models reduces the main agent's context overhead.
DISCOVERED
2h ago
2026-06-17
PUBLISHED
2h ago
2026-06-17
RELEVANCE
AUTHOR
code