REDDIT · 12d ago · BENCHMARK RESULT

Ctxpact tackles 100k-token agent prompts locally

Ctxpact is a lightweight proxy that sits between agent frameworks and a local LLM backend, compacting oversized requests before they reach a 16k-context model. It combines structural pruning, rolling summaries, and retrieval/extraction strategies so tools like OpenClaw and Hermes can keep working on Mac Mini-class hardware without cloud APIs or API keys. The post centers on benchmark claims: 110k tokens compressed to 12k while keeping a perfect score on an 8-question Frankenstein comprehension set across three runs, plus stronger LoCoMo-MC10 results when paired with Qwen3.5 than with LFM2. The project is positioned as open source, OpenAI-compatible, and practical rather than framework-heavy, with the main thesis that model quality and faithful retrieval matter more than ever-more-complex compaction loops.
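Because the proxy is OpenAI-compatible, repointing an existing agent stack is mostly a base-URL change. A minimal sketch of the request path, assuming a hypothetical localhost endpoint and a crude ~4-chars/token estimate (neither the URL/port nor the heuristic comes from the post):

```python
import json
import urllib.request

# Hypothetical local endpoint -- the post says Ctxpact is
# OpenAI-compatible, but this URL/port is an assumption.
CTXPACT_URL = "http://localhost:8080/v1/chat/completions"

def needs_compaction(messages, context_window=16_000, headroom=1_024):
    """Rough check a proxy would make before forwarding a request.
    Uses a ~4 chars/token estimate; a real proxy would tokenize."""
    est = sum(len(m["content"]) // 4 for m in messages)
    return est + headroom > context_window

def build_request(messages, model="qwen3.5"):
    # Standard OpenAI chat-completions payload; any framework that
    # speaks this schema can target the proxy instead of a cloud API.
    return urllib.request.Request(
        CTXPACT_URL,
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={"Content-Type": "application/json"},
    )
```

The point of the sketch is only that the compaction decision lives entirely in the proxy; the client-side payload is unchanged.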

// ANALYSIS

Hot take: this looks less like a “summarization” project and more like an execution layer for context triage, and the benchmark story is strongest when it admits that backbone model quality dominates everything else.

  • The 3-stage design is sensible: structural pruning first, summary eviction second, retrieval/extraction last. That ordering reduces wasted LLM calls and preserves the highest-value recent turns.
  • The standout claim is not the compression ratio; it’s the faithfulness result. If Qwen3.5 consistently outperforms LFM2 because it follows retrieved context instead of overriding it with parametric knowledge, that is the real engineering insight.
  • The methodology still needs tighter framing. Frankenstein looks like a narrow, potentially overfit suite, so the 8/8 and 0% variance numbers are persuasive but not yet broad evidence of general robustness.
  • The “2 LLM calls is the sweet spot” result is plausible, but I would want ablations controlling for prompt quality, retrieval candidates, and question difficulty before treating it as a universal rule.
  • LoCoMo-MC10 is a better sign of cross-session usefulness than a single reading-comprehension benchmark, but mixing those scores into a combined percentage can obscure the very different failure modes.
  • The most interesting next compaction ideas are probably hybrid, not deeper agent loops: query-aware hierarchical chunking, structured-field preservation for JSON/tool output, and per-task retrieval policies that choose between pruning, summarizing, and exact recall.
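The staged ordering discussed above can be sketched as a budget-driven fallthrough: each stage runs only if the previous one failed to fit the budget. Everything here (the thresholds, the whitespace tokenizer, the stubbed summary, the keyword-overlap retrieval) is an illustrative assumption, not Ctxpact's actual code:

```python
# Sketch of the three-stage order: structural pruning first,
# summary eviction second, retrieval/extraction last.

def est_tokens(text: str) -> int:
    # Crude ~4 chars/token heuristic; a real proxy would use the
    # backend model's tokenizer.
    return max(1, len(text) // 4)

def compact(messages, budget=12_000, keep_recent=4, query=""):
    msgs = list(messages)

    # Stage 1: structural pruning -- drop bulky, low-value content
    # (here: old tool outputs) without any LLM call.
    cutoff = len(msgs) - keep_recent
    msgs = [m for i, m in enumerate(msgs)
            if not (i < cutoff and m["role"] == "tool")]
    if sum(est_tokens(m["content"]) for m in msgs) <= budget:
        return msgs

    # Stage 2: summary eviction -- fold the oldest turns into one
    # rolling summary (stubbed here; the real proxy would call the
    # local LLM for this, which is where the "2 calls" cost appears).
    old, recent = msgs[:-keep_recent], msgs[-keep_recent:]
    summary = {"role": "system",
               "content": "Summary of earlier turns: " +
                          " | ".join(m["content"][:40] for m in old)}
    msgs = [summary] + recent
    if sum(est_tokens(m["content"]) for m in msgs) <= budget:
        return msgs

    # Stage 3: retrieval/extraction -- keep only the turns most
    # relevant to the current query, plus the rolling summary.
    qwords = set(query.lower().split())
    scored = sorted(
        recent,
        key=lambda m: len(qwords & set(m["content"].lower().split())),
        reverse=True)
    kept = [summary]
    for m in scored:
        if sum(est_tokens(x["content"]) for x in kept + [m]) <= budget:
            kept.append(m)
    return kept
```

The ordering matters because stage 1 is free, stage 2 costs one LLM call, and stage 3 risks dropping context, which is why faithfulness of the backbone model dominates the end result.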
// TAGS
local-llm · context-compaction · proxy · openai-compatible · retrieval · summarization · benchmarking · ollama · vllm · open-source · mac-mini

DISCOVERED

2026-03-31 (12d ago)

PUBLISHED

2026-03-31 (12d ago)

RELEVANCE

8/10

AUTHOR

Honest-Debate-6863