Chonkify v1.0 beats LLMLingua2 by 175%
REDDIT · 21d ago · OPEN-SOURCE RELEASE

Chonkify is an extractive document-compression tool for RAG and agent memory that aims to preserve facts, structure, and reasoning while cutting token count. The v1.0 release ships compiled wheels and claims strong benchmark wins over Microsoft's LLMLingua family on multi-document tests.

// ANALYSIS

This looks like a genuinely interesting niche release if the numbers hold, but the claims rest on a small internal suite with proxy recovery metrics, so independent replication matters.

  • The selection core scores passages by information density and diversity, then keeps the highest-value subset under a token budget.
  • The repo’s benchmark summary spans five documents and two token budgets, reporting a mean +68.57% gain over LLMLingua and +174.90% over LLMLingua2 on composite recovery.
  • The benchmark caveat matters: the scorer is proxy-based, so these results are best treated as directional evidence, not ground truth.
  • Packaged wheels, a CLI, and a Python API make it more deployable than many research-heavy compressors, especially for RAG pipelines.
  • Support for Azure OpenAI, OpenAI-compatible endpoints, and local SentenceTransformers gives teams a practical cloud-or-offline path.
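The density-and-diversity selection described above can be sketched as an MMR-style greedy loop. This is not chonkify's actual code; the `select_passages` function, its field names (`tokens`, `density`, `vec`), and the `lam` trade-off weight are all illustrative assumptions about how such a selector could work.

```python
def select_passages(passages, budget, lam=0.7):
    """Hypothetical sketch (not chonkify's real API): greedily keep
    high-density passages that are dissimilar to those already kept,
    until the token budget is exhausted.

    passages: list of dicts with 'text', 'tokens' (int),
              'density' (float), and an embedding 'vec' (list[float]).
    """
    def cos(a, b):
        # Cosine similarity between two embedding vectors.
        num = sum(x * y for x, y in zip(a, b))
        den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return num / den if den else 0.0

    selected, used = [], 0
    candidates = list(passages)
    while candidates:
        best, best_score = None, float("-inf")
        for p in candidates:
            if used + p["tokens"] > budget:
                continue  # would blow the token budget
            # Redundancy penalty: max similarity to anything kept so far.
            redundancy = max(
                (cos(p["vec"], s["vec"]) for s in selected), default=0.0
            )
            score = lam * p["density"] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = p, score
        if best is None:
            break  # nothing else fits under the budget
        selected.append(best)
        used += best["tokens"]
        candidates.remove(best)
    return selected
```

With `lam=0.7` the loop favors informative passages but will skip a near-duplicate of something already selected in favor of a less dense passage covering new ground, which is the behavior an extractive compressor needs to avoid repeating itself inside a tight budget.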
// TAGS
chonkify · rag · agent · llm · benchmark · open-source · cli · embedding

DISCOVERED

21d ago (2026-03-21)

PUBLISHED

21d ago (2026-03-21)

RELEVANCE

8 / 10

AUTHOR

thomheinrich