
Pragma Tests Tool-Calling Reliability Floor
Pragma is a local-first autonomous agent built on llama.cpp with separate code-generation and orchestration models. The post argues that small models running the orchestration loop fail first on tool-call discipline, and that putting exact tool signatures in the prompt and adding repetition watchdogs helped push that reliability floor lower.
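To make the watchdog idea concrete, here is a minimal sketch of one way a repetition watchdog could work. It is not taken from the Pragma repo: the class name `RepetitionWatchdog`, the `window` and `max_repeats` parameters, and the tool name in the usage note are all assumptions made for illustration.

```python
import json
from collections import deque


class RepetitionWatchdog:
    """Flags a loop when an identical tool call recurs too often in a recent window.

    Hypothetical helper for illustration; not Pragma's actual implementation.
    """

    def __init__(self, window: int = 6, max_repeats: int = 3):
        # Only the most recent `window` calls are remembered.
        self.recent = deque(maxlen=window)
        self.max_repeats = max_repeats

    def observe(self, tool_name: str, arguments: dict) -> bool:
        """Record one call; return True if the loop should be interrupted."""
        # Canonicalize arguments so key order does not hide a repeat.
        key = (tool_name, json.dumps(arguments, sort_keys=True))
        self.recent.append(key)
        return self.recent.count(key) >= self.max_repeats
```

In use, the harness would call `observe("read_file", {"path": "a.txt"})` after every tool call the model emits and, when it returns True, break the loop or inject a corrective message instead of executing the call again.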
Strong systems post. The useful insight here is that orchestration is a different problem from code generation, and the failure mode is not “can it think?” but “can it stay inside the tool contract?”
- The post is grounded in a practical local stack: llama.cpp, open-source models, and a visible reasoning loop.
- The core claim is credible and specific: smaller models often fail on argument discipline before they fail on reasoning.
- The proposed mitigations are directionally right, especially exact signatures in-prompt and tighter loop controls.
- The repo angle makes this more than a rant; it reads like an early design note for a local agent harness.
- Best follow-up for the ecosystem would be stricter schemas/grammar-constrained decoding and evaluation by failure class, not just overall task success.
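As a rough illustration of "evaluation by failure class", the sketch below checks raw tool calls against exact signatures and tallies how each one fails. Everything in it is assumed for the example: the `TOOLS` registry, the JSON call shape with `name` and `arguments` keys, and the failure-class labels are hypothetical and are not drawn from the post or the repo.

```python
import json
from collections import Counter

# Hypothetical tool registry: tool name -> {parameter name: expected type}.
TOOLS = {
    "read_file": {"path": str},
    "write_file": {"path": str, "content": str},
}


def classify_call(raw: str) -> str:
    """Return 'ok' or a failure-class label for one raw model tool call."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return "malformed_json"
    if not isinstance(call, dict):
        return "malformed_json"
    name = call.get("name")
    args = call.get("arguments")
    if not isinstance(name, str) or name not in TOOLS:
        return "unknown_tool"
    if not isinstance(args, dict):
        return "bad_arguments_shape"
    spec = TOOLS[name]
    if set(args) - set(spec):
        return "unexpected_argument"
    if set(spec) - set(args):
        return "missing_argument"
    for param, expected_type in spec.items():
        if not isinstance(args[param], expected_type):
            return "wrong_type"
    return "ok"


def tally(raw_calls: list[str]) -> Counter:
    """Count outcomes per failure class instead of a single pass/fail rate."""
    return Counter(classify_call(raw) for raw in raw_calls)
```

A per-class tally like this separates a model that cannot emit valid JSON from one that invents extra arguments, which an overall task-success number would hide.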
DISCOVERED
2026-05-23
PUBLISHED
2026-05-22
AUTHOR
HomoAgens1
