Ctxpact tackles 100k-token agent prompts locally
Ctxpact is a lightweight proxy that sits between agent frameworks and a local LLM backend, compacting oversized requests before they hit a 16k-context model. It combines structural pruning, rolling summaries, and retrieval/extraction strategies so tools like OpenClaw and Hermes can keep working on Mac Mini-class hardware without cloud APIs or API keys. The post centers on benchmark claims: 110k tokens compressed to 12k while preserving a perfect score on an 8-question Frankenstein comprehension set across three runs, plus stronger results on LoCoMo-MC10 when paired with Qwen3.5 than with LFM2. The project is positioned as open source, OpenAI-compatible, and practical rather than framework-heavy, with the main thesis that model quality and faithful retrieval matter more than ever-more-complex compaction loops.
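The core decision such a proxy makes is simple: estimate the request's token count and, if it exceeds the target model's window, compact the conversation before forwarding. A minimal sketch of that decision, assuming a crude characters-per-token heuristic and an evict-oldest-turns policy (illustrative assumptions, not Ctxpact's actual implementation):

```python
def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token (an assumption, not a tokenizer).
    return sum(len(m["content"]) for m in messages) // 4

def compact(messages, budget=12_000):
    """Keep the system prompt and the most recent turns; evict middle turns oldest-first."""
    if estimate_tokens(messages) <= budget:
        return messages  # already fits: forward unchanged
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Drop the oldest non-system turns until the estimate fits the budget.
    while rest and estimate_tokens(system + rest) > budget:
        rest.pop(0)
    return system + rest
```

A real proxy would run this between receiving an OpenAI-compatible request and forwarding it to the local backend; the summary and retrieval stages described in the post would replace the blunt `rest.pop(0)` eviction.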
Hot take: this looks less like a “summarization” project and more like an execution layer for context triage, and the benchmark story is strongest when it admits that backbone model quality dominates everything else.
- The 3-stage design is sensible: structural pruning first, summary eviction second, retrieval/extraction last. That ordering reduces wasted LLM calls and preserves the highest-value recent turns.
- The standout claim is not the compression ratio; it's the faithfulness result. If Qwen3.5 consistently outperforms LFM2 because it follows retrieved context instead of overriding it with parametric knowledge, that is the real engineering insight.
- The methodology still needs tighter framing. Frankenstein looks like a narrow, potentially overfit suite, so the 8/8 and 0% variance numbers are persuasive but not yet broad evidence of general robustness.
- The "2 LLM calls is the sweet spot" result is plausible, but I would want ablations controlling for prompt quality, retrieval candidates, and question difficulty before treating it as a universal rule.
- LoCoMo-MC10 is a better sign of cross-session usefulness than a single reading-comprehension benchmark, but mixing those scores into a combined percentage can obscure the very different failure modes.
- The most interesting next compaction ideas are probably hybrid, not deeper agent loops: query-aware hierarchical chunking, structured-field preservation for JSON/tool output, and per-task retrieval policies that choose between pruning, summarizing, and exact recall.
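The stage ordering praised above can be sketched as a pipeline that applies the cheapest stage first and stops as soon as the budget is met. The stage names follow the post; the bodies are stubbed placeholders (real summarization and retrieval would call the backend LLM), so treat this as an illustration of the ordering, not of Ctxpact's code:

```python
def structural_prune(turns):
    # Stage 1: drop cheap-to-remove structure first, e.g. stale tool output.
    return [t for t in turns if t.get("kind") != "tool_output"]

def summarize_evict(turns):
    # Stage 2: replace the oldest half of the history with a summary stub.
    half = len(turns) // 2
    summary = {"kind": "summary", "text": f"[summary of {half} turns]"}
    return [summary] + turns[half:]

def retrieve_extract(turns):
    # Stage 3: keep only turns relevant to the live query (stubbed as the last 3).
    return turns[-3:]

def compact(turns, budget):
    size = lambda ts: sum(len(t.get("text", "")) for t in ts)
    # Cheapest stage first; stop as soon as the history fits the budget,
    # so later (LLM-call-heavy) stages only run when actually needed.
    for stage in (structural_prune, summarize_evict, retrieve_extract):
        if size(turns) <= budget:
            break
        turns = stage(turns)
    return turns
```

The early-exit loop is what makes the ordering matter: if pruning alone fits the budget, no summarization calls are spent at all, which is consistent with the post's "fewer LLM calls" observation.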
DISCOVERED: 2026-03-31
PUBLISHED: 2026-03-31
AUTHOR: Honest-Debate-6863