Hermes Agent Users Hit Q2 Wall
OPEN_SOURCE
REDDIT // 7h ago · NEWS

A LocalLLaMA user running a single RTX 3090 reports that Hermes Agent paired with Qwen3.5-35B-A3B at Q2_K feels brittle across chat, research, and agent work, and asks for a better local baseline. The thread is really a reminder that model choice, quant level, and serving stack matter as much as the agent wrapper.

// ANALYSIS

The hot take is that this reads less like a Hermes problem than a "the model was quantized too hard" problem.

  • Qwen3.5-35B-A3B Q2_K sits in the "very low quality but surprisingly usable" tier, so weak output is expected rather than surprising.
  • On a 24GB RTX 3090, a higher-quality setup like Qwen3.5-27B at Q4_K_M or a better 35B quant will usually feel much more coherent than squeezing for minimum VRAM.
  • Hermes Agent adds orchestration, memory, and tools, but it cannot recover reasoning quality the base model no longer has.
  • For mixed chat, research, and agentic work, consistency usually beats raw parameter count; a dense mid-size model can feel better than a heavily compressed MoE.
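The VRAM trade-off behind these bullets can be sketched with back-of-envelope arithmetic. The bits-per-weight figures below are rough community approximations for GGUF quant types (not exact numbers for these specific models), and the 4 GB overhead allowance for KV cache and CUDA context is an assumption:

```python
# Rough bits-per-weight for common GGUF quant types (approximate, not exact).
BPW = {"Q2_K": 2.6, "Q4_K_M": 4.8, "Q8_0": 8.5}

def model_size_gb(params_billion: float, quant: str) -> float:
    """Approximate weight footprint in GB for a quantized model."""
    return params_billion * 1e9 * BPW[quant] / 8 / 1e9

def fits(params_billion: float, quant: str, vram_gb: float = 24.0,
         overhead_gb: float = 4.0) -> bool:
    """Leave headroom for KV cache, activations, and the CUDA context
    (the 4 GB figure is a hypothetical, context-length-dependent guess)."""
    return model_size_gb(params_billion, quant) + overhead_gb <= vram_gb

for params, quant in [(35, "Q2_K"), (35, "Q4_K_M"), (27, "Q4_K_M")]:
    gb = model_size_gb(params, quant)
    print(f"{params}B {quant}: ~{gb:.1f} GB weights, fits 24GB: {fits(params, quant)}")
```

Under these assumptions the 35B at Q4_K_M (~21 GB of weights) is exactly the setup that stops fitting once context overhead is counted, while a 27B at Q4_K_M (~16 GB) leaves comfortable headroom, which is why the mid-size dense model at a healthier quant tends to feel better than the hard-squeezed 35B.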
// TAGS
hermes-agent · llm · agent · self-hosted · open-source · automation · gpu

DISCOVERED

7h ago

2026-04-17

PUBLISHED

8h ago

2026-04-17

RELEVANCE

8 / 10

AUTHOR

mburnside