OPEN_SOURCE
REDDIT // 25d ago · MODEL RELEASE
Nemotron 3 Super hits reasoning loop
Nemotron 3 Super appears to get trapped in a self-referential reasoning loop when served through llama-server with Aider, with the model seemingly reading its own chain-of-thought back as user input. The poster says the same model behaves normally in OpenRouter/SillyTavern, which points more toward a serving-stack or template mismatch than a universal model bug.
// ANALYSIS
This smells like prompt plumbing gone wrong, not the base weights suddenly losing it. The fact that the same model is fine in some frontends but loops in others is the biggest clue.
- The poster says context was not overflowing, so this does not look like a simple window-limit failure.
- Other commenters reproduce similar behavior in different stacks, while one notes it works in LM Studio, which strongly suggests backend-specific formatting differences.
- Nemotron 3 Super is tuned for agentic reasoning, so if a runtime leaks reasoning traces back into the next turn, the model can self-amplify into a loop.
- A bad quant or GGUF could worsen the problem, but cross-backend variation makes chat-template and stop-token handling the first thing to debug.
- For local deployments, this is a reminder that reasoning models can be unusually sensitive to how “thinking” text is captured and reinserted.
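The leak hypothesis above can be sketched in a few lines. This is a minimal illustration, not the actual llama-server or Aider code: it assumes the model wraps its chain-of-thought in `<think>...</think>` markers (a common convention for reasoning models, but an assumption here) and shows how a frontend would sanitize assistant turns before rebuilding the next prompt, so the model never reads its own reasoning back as input.

```python
import re

# Assumed reasoning-trace delimiter; the real tag depends on the
# model's chat template. DOTALL lets the span cover multiple lines.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_reasoning(text: str) -> str:
    """Remove chain-of-thought spans from an assistant reply."""
    return THINK_RE.sub("", text)

def rebuild_history(messages: list[dict]) -> list[dict]:
    """Sanitize assistant turns before sending the next request.

    If a serving stack skips this step, the model's own reasoning
    text re-enters the context and can self-amplify into a loop.
    """
    return [
        {**m, "content": strip_reasoning(m["content"])}
        if m["role"] == "assistant" else m
        for m in messages
    ]

history = [
    {"role": "user", "content": "Summarize the bug."},
    {"role": "assistant",
     "content": "<think>Re-reading my own thoughts...</think>It loops."},
]
print(rebuild_history(history)[1]["content"])  # → "It loops."
```

A frontend that does this (as LM Studio apparently does, per the thread) stays stable; one that replays raw output would match the looping behavior the poster describes.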
// TAGS
nemotron-3-super · llm · reasoning · inference · agent · aider
DISCOVERED
2026-03-18
PUBLISHED
2026-03-18
RELEVANCE
8/10
AUTHOR
Real_Ebb_7417