OpenCode compacts Qwen3-Coder-Next context at 36k tokens
A developer reports that the agentic coding tool OpenCode forcibly compacts context for Qwen3-Coder-Next at 36k tokens. This happens even though the local llama.cpp backend reports support for the model's full 200k context window.
Local agentic coding stacks still struggle to reliably pass massive context windows from inference engines to application layers.
- While models boast 200k contexts, middleware tooling often imposes hidden limits or struggles with memory management.
- The discrepancy between llama.cpp's backend reporting and OpenCode's frontend behavior highlights fragmentation in local AI toolchains.
- Running massive contexts on a 16GB VRAM and 128GB RAM setup requires aggressive offloading, potentially triggering unhandled compaction in the agent logic.
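One way to picture the mismatch described above is that the agent compacts against whichever limit is smallest, not against what the backend advertises. The sketch below is illustrative only: `ContextBudget`, its fields, and the numbers are hypothetical, and they do not reflect OpenCode's or llama.cpp's actual implementation. The values were picked to show how a 36k trigger could arise under a 200k backend, not to diagnose this report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContextBudget:
    """Hypothetical model of how an agent might decide when to compact."""

    backend_n_ctx: int            # context size the inference server reports
    client_limit: Optional[int]   # limit the agent assumes for the model, if any
    output_reserve: int = 0       # tokens held back for the model's reply

    def usable(self) -> int:
        # The effective ceiling is the smaller of the two limits.
        if self.client_limit is None:
            ceiling = self.backend_n_ctx
        else:
            ceiling = min(self.backend_n_ctx, self.client_limit)
        return max(ceiling - self.output_reserve, 0)

    def should_compact(self, used_tokens: int) -> bool:
        # Compaction fires once the conversation reaches the usable budget.
        return used_tokens >= self.usable()


# Assumed numbers: the backend offers 200k, but the client believes 40k
# and reserves 4k for output, so compaction would start at 36k.
budget = ContextBudget(backend_n_ctx=200_000, client_limit=40_000, output_reserve=4_000)
print(budget.usable())
print(budget.should_compact(36_000))
```

If something like this is happening, the first things to compare would be the context size the llama.cpp server reports and whatever per-model limit the agent has configured or defaulted to.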
Discovered: 2026-04-01
Published: 2026-04-01
Author: soyalemujica