GitHub Copilot Boosts VS Code Token Efficiency

// 45d ago · PRODUCT UPDATE

The GitHub Copilot team has introduced harness-level optimizations in VS Code that reduce token consumption by up to 18% and lower latency for agentic workflows. The updates include extended prompt caching, deferred tool schema loading, client-side embedding-based tool search, and persistent WebSockets.
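To make deferred tool schema loading concrete, here is a minimal sketch. It assumes a tool registry in which deferred tools carry a `defer_loading` flag (the flag name comes from the analysis in this item). The interfaces, the function names `advertiseTools` and `loadDeferredSchema`, and the example tools are all hypothetical, not the actual Copilot or VS Code API. Serialized character count stands in for token count as a rough proxy.

```typescript
// Hypothetical shapes for illustration; not the actual Copilot or VS Code API.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // full JSON parameter schema
  defer_loading?: boolean;             // when true, keep the schema out of the prompt
}

interface ToolStub {
  name: string;
  description: string;
}

// Build what is sent up front: full definitions for eager tools,
// name and description only for deferred ones.
function advertiseTools(registry: ToolDefinition[]): Array<ToolDefinition | ToolStub> {
  return registry.map((tool): ToolDefinition | ToolStub =>
    tool.defer_loading ? { name: tool.name, description: tool.description } : tool
  );
}

// Called only when the model explicitly asks for a deferred tool.
function loadDeferredSchema(registry: ToolDefinition[], toolName: string): Record<string, unknown> {
  for (const tool of registry) {
    if (tool.name === toolName) return tool.parameters;
  }
  throw new Error("Unknown tool: " + toolName);
}

// Rough size proxy (assumption): characters of serialized JSON, not real tokens.
function payloadChars(payload: unknown): number {
  return JSON.stringify(payload).length;
}

const registry: ToolDefinition[] = [
  {
    name: "read_file",
    description: "Read a file from the workspace.",
    parameters: { type: "object", properties: { path: { type: "string" } }, required: ["path"] },
  },
  {
    name: "create_pull_request",
    description: "Create a pull request for the current branch.",
    defer_loading: true,
    parameters: {
      type: "object",
      properties: {
        title: { type: "string", description: "Pull request title" },
        body: { type: "string", description: "Markdown description" },
        base: { type: "string", description: "Branch to merge into" },
        draft: { type: "boolean", description: "Open as a draft" },
        reviewers: { type: "array", items: { type: "string" } },
      },
      required: ["title", "base"],
    },
  },
];

const advertised = advertiseTools(registry);
const savedChars = payloadChars(registry) - payloadChars(advertised);
```

In this sketch the saving grows with the number and size of deferred schemas, while each deferred tool costs one extra lookup the first time the model requests it.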

// ANALYSIS

With the shift toward usage-based billing, client-side optimizations such as local embedding-guided tool search become as important as improvements to the underlying foundation model.

–Extended Caching: Enabling 24-hour prompt cache retention avoids cold-start latency and reduces costs when a user resumes a session after a break.
–Deferring Tools: Marking tools with `defer_loading` keeps large JSON parameter schemas out of the context window until the model explicitly requests them.
–Persistent WebSockets: Replacing repeated HTTP connections with a persistent WebSocket reduces latency across multi-turn agent sessions.
–Client-Side Embedding Search: Running tool search against local embeddings allows intent-based matching and dynamic MCP tool discovery without server round trips.
–Specialized Subagents: Delegating tasks such as workspace search or summarization to cheaper, specialized models reduces the main agent's context overhead.
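The client-side embedding search described above can be sketched as ranking tools by cosine similarity between a query vector and precomputed tool-description vectors, all on the client. Everything here is an assumption for illustration: the `ToolSummary` and `IndexedTool` shapes, the function names, and especially `makeToyEmbedder`, a bag-of-words stand-in that only matches shared words. A real client would plug a learned embedding model into the `Embed` slot to get intent-based matching.

```typescript
// Hypothetical shapes for illustration; not the actual Copilot or MCP API.
interface ToolSummary { name: string; description: string; }
interface IndexedTool { name: string; vector: number[]; }
type Embed = (text: string) => number[];

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
}

// Stand-in embedder (assumption): word counts over a vocabulary built from the corpus.
// It matches shared words only; swap in a real embedding model for semantic matching.
function makeToyEmbedder(corpus: string[]): Embed {
  const vocab: string[] = [];
  for (const doc of corpus) {
    for (const word of tokenize(doc)) {
      if (vocab.indexOf(word) === -1) vocab.push(word);
    }
  }
  return (text: string): number[] => {
    const vec = vocab.map(() => 0);
    for (const word of tokenize(text)) {
      const i = vocab.indexOf(word);
      if (i !== -1) vec[i] += 1; // words outside the vocabulary are ignored
    }
    return vec;
  };
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embed each tool description once, up front, on the client.
function buildIndex(catalog: ToolSummary[], embed: Embed): IndexedTool[] {
  return catalog.map((tool) => ({ name: tool.name, vector: embed(tool.description) }));
}

// Rank tools against the query locally; no server round trip is involved.
function searchTools(
  query: string,
  index: IndexedTool[],
  embed: Embed,
  topK: number
): Array<{ name: string; score: number }> {
  const queryVector = embed(query);
  return index
    .map((entry) => ({ name: entry.name, score: cosine(queryVector, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const catalog: ToolSummary[] = [
  { name: "read_file", description: "Read the contents of a file in the workspace" },
  { name: "create_pull_request", description: "Create a pull request on GitHub for the current branch" },
  { name: "run_tests", description: "Run the unit tests for the project" },
];

const embed = makeToyEmbedder(catalog.map((tool) => tool.description));
const index = buildIndex(catalog, embed);
const results = searchTools("open a pull request for my branch", index, embed, 2);
// results[0].name is "create_pull_request"
```

Because the index is just a list of vectors, newly discovered tools (for example from an MCP server) can be appended with one more `embed` call per tool, without rebuilding anything else in this sketch.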

// TAGS

github-copilot token-efficiency prompt-caching websockets mcp agent vs-code ai-coding

DISCOVERED

45d ago

2026-06-17

PUBLISHED

45d ago

2026-06-17

RELEVANCE

8/10

AUTHOR

code
