Ctxpact tackles 100k-token agent prompts locally
Ctxpact is a lightweight proxy that sits between agent frameworks and a local LLM backend, compacting oversized requests before they hit a 16k-context model. It combines structural pruning, rolling summaries, and retrieval/extraction strategies so tools like OpenClaw and Hermes can keep working on Mac Mini-class hardware without cloud APIs or API keys. The post centers on benchmark claims: 110k tokens compressed to 12k while preserving a perfect score on an 8-question Frankenstein comprehension set across three runs, plus stronger results on LoCoMo-MC10 when paired with Qwen3.5 than with LFM2. The project is positioned as open source, OpenAI-compatible, and practical rather than framework-heavy, with the main thesis that model quality and faithful retrieval matter more than ever-more-complex compaction loops.
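The core decision such a proxy makes is simple: estimate the request's token count and, if it exceeds the target model's window, compact the conversation before forwarding. A minimal sketch of that decision, assuming a crude characters-per-token heuristic and an evict-oldest-turns policy (illustrative assumptions, not Ctxpact's actual implementation):

```python
def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token (an assumption, not a tokenizer).
    return sum(len(m["content"]) for m in messages) // 4

def compact(messages, budget=12_000):
    """Keep the system prompt and the most recent turns; evict middle turns oldest-first."""
    if estimate_tokens(messages) <= budget:
        return messages  # already fits: forward unchanged
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Drop the oldest non-system turns until the estimate fits the budget.
    while rest and estimate_tokens(system + rest) > budget:
        rest.pop(0)
    return system + rest
```

A real proxy would run this between receiving an OpenAI-compatible request and forwarding it to the local backend; the summary and retrieval stages described in the post would replace the blunt `rest.pop(0)` eviction.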
Hot take: this looks less like a “summarization” project and more like an execution layer for context triage, and the benchmark story is strongest when it admits that backbone model quality dominates everything else.
- The 3-stage design is sensible: structural pruning first, summary eviction second, retrieval/extraction last. That ordering reduces wasted LLM calls and preserves the highest-value recent turns.
- The standout claim is not the compression ratio; it's the faithfulness result. If Qwen3.5 consistently outperforms LFM2 because it follows retrieved context instead of overriding it with parametric knowledge, that is the real engineering insight.
- The methodology still needs tighter framing. Frankenstein looks like a narrow, potentially overfit suite, so the 8/8 and 0% variance numbers are persuasive but not yet broad evidence of general robustness.
- The "2 LLM calls is the sweet spot" result is plausible, but I would want ablations controlling for prompt quality, retrieval candidates, and question difficulty before treating it as a universal rule.
- LoCoMo-MC10 is a better sign of cross-session usefulness than a single reading-comprehension benchmark, but mixing those scores into a combined percentage can obscure the very different failure modes.
- The most interesting next compaction ideas are probably hybrid, not deeper agent loops: query-aware hierarchical chunking, structured-field preservation for JSON/tool output, and per-task retrieval policies that choose between pruning, summarizing, and exact recall.
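The stage ordering praised above can be sketched as a pipeline that applies the cheapest stage first and stops as soon as the budget is met. The stage names follow the post; the bodies are stubbed placeholders (real summarization and retrieval would call the backend LLM), so treat this as an illustration of the ordering, not of Ctxpact's code:

```python
def structural_prune(turns):
    # Stage 1: drop cheap-to-remove structure first, e.g. stale tool output.
    return [t for t in turns if t.get("kind") != "tool_output"]

def summarize_evict(turns):
    # Stage 2: replace the oldest half of the history with a summary stub.
    half = len(turns) // 2
    summary = {"kind": "summary", "text": f"[summary of {half} turns]"}
    return [summary] + turns[half:]

def retrieve_extract(turns):
    # Stage 3: keep only turns relevant to the live query (stubbed as the last 3).
    return turns[-3:]

def compact(turns, budget):
    size = lambda ts: sum(len(t.get("text", "")) for t in ts)
    # Cheapest stage first; stop as soon as the history fits the budget,
    # so later (LLM-call-heavy) stages only run when actually needed.
    for stage in (structural_prune, summarize_evict, retrieve_extract):
        if size(turns) <= budget:
            break
        turns = stage(turns)
    return turns
```

The early-exit loop is what makes the ordering matter: if pruning alone fits the budget, no summarization calls are spent at all, which is consistent with the post's "fewer LLM calls" observation.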
DISCOVERED: 2026-03-31
PUBLISHED: 2026-03-31
AUTHOR: Honest-Debate-6863