Qwen Code Burns 32K Tokens on a Simple `git status`
A LocalLLaMA user describes trying to run Qwen Code on modest hardware with local Qwen3.5 models, then adding RAG and MCP to cut context size. Even a simple `git status` still produces a 32K-token session, exposing how much hidden agent scaffolding these coding tools carry.
This looks less like a broken prompt strategy and more like a classic agent-overhead problem: Qwen Code ships with a heavy hidden system prompt, tool schemas, repo state, and conversation memory, so even a trivial request inherits a large context footprint. The `32,162` input-token figure is also misleadingly scary: `31,806` of those tokens were served from cache, so the fresh payload was tiny.

MCP and RAG only save tokens if they replace broad context; they add cost when they bring big tool manifests, long histories, or oversized retrieved chunks. On local inference, long-context agent workflows can become latency-bound before generation even starts, especially with 27B+ models on consumer hardware.

The biggest gains usually come from prompt-budget discipline: fewer tools loaded at once, shorter system instructions, aggressive state summarization, and on-demand retrieval only. If the goal is a usable local coding agent on 32GB RAM, the host workflow probably needs more pruning before jumping from 9B to 27B or 35B.
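The cache arithmetic is worth spelling out. Using only the two figures quoted from the session (the helper function below is illustrative, not part of Qwen Code):

```python
# Back-of-envelope: how much of the 32K-token session was genuinely new input?
# The totals come from the session described above; the helper is illustrative only.

def fresh_tokens(total_input: int, cached: int) -> tuple[int, float]:
    """Return (tokens processed fresh, cache hit ratio) for one request."""
    fresh = total_input - cached
    return fresh, cached / total_input

fresh, hit_ratio = fresh_tokens(32_162, 31_806)
print(fresh)               # 356 tokens actually processed fresh
print(f"{hit_ratio:.1%}")  # ~98.9% of the input served from cache
```

In other words, the scary 32K number mostly measures static scaffolding that the cache absorbs on every turn; the marginal cost of the `git status` request itself was a few hundred tokens.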
DISCOVERED
5h ago
2026-04-18
PUBLISHED
6h ago
2026-04-18
RELEVANCE
AUTHOR
eur0child