OPEN_SOURCE ↗
REDDIT // 7h ago · NEWS
Hermes Agent Users Hit Q2 Wall
A LocalLLaMA user on a 3090 says Hermes Agent paired with Qwen3.5-35B-A3B Q2_K feels brittle for chat, research, and agent work, and asks for a better local baseline. The thread is really a reminder that model choice, quant level, and serving stack matter as much as the agent wrapper.
// ANALYSIS
The hot take is that this reads less like a Hermes problem than a "the model was quantized too hard" problem.
- Qwen3.5-35B-A3B Q2_K sits in the "very low quality but surprisingly usable" tier, so weak output is expected rather than surprising.
- On a 24GB RTX 3090, a higher-quality setup like Qwen3.5-27B at Q4_K_M or a better 35B quant will usually feel much more coherent than squeezing for minimum VRAM.
- Hermes Agent adds orchestration, memory, and tools, but it cannot recover reasoning quality the base model no longer has.
- For mixed chat, research, and agentic work, consistency usually beats raw parameter count; a dense mid-size model can feel better than a heavily compressed MoE.
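The back-of-the-envelope math behind the quant tradeoff can be sketched as weights-size ≈ parameters × bits-per-weight / 8, plus overhead for KV cache and buffers. A minimal sketch, assuming approximate bits-per-weight figures in the spirit of common llama.cpp quant types (the exact values vary by tensor mix, and `est_vram_gb` and its 2 GB overhead are illustrative assumptions, not measurements):

```python
# Approximate bits-per-weight for a few llama.cpp-style quant levels.
# These are rough community figures, not exact specs.
BITS_PER_WEIGHT = {
    "Q2_K": 2.6,    # heavily compressed, noticeable quality loss
    "Q4_K_M": 4.8,  # common quality/size sweet spot
    "Q8_0": 8.5,    # near-lossless
}

def est_vram_gb(params_b: float, quant: str, overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate in GB for params_b billion parameters at quant.

    overhead_gb is a hypothetical flat allowance for KV cache and buffers.
    """
    weights_gb = params_b * BITS_PER_WEIGHT[quant] / 8
    return round(weights_gb + overhead_gb, 1)

for quant in ("Q2_K", "Q4_K_M", "Q8_0"):
    print(f"35B @ {quant}: ~{est_vram_gb(35, quant)} GB")
```

Under these assumptions, a 35B model lands around 13 GB at Q2_K but about 23 GB at Q4_K_M, which is why a 24GB 3090 owner tends to face a choice between a heavily compressed 35B and a better-quantized smaller model.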
// TAGS
hermes-agent · llm · agent · self-hosted · open-source · automation · gpu
DISCOVERED
7h ago
2026-04-17
PUBLISHED
8h ago
2026-04-17
RELEVANCE
8/10
AUTHOR
mburnside