OPEN_SOURCE
REDDIT // 10h ago // BENCHMARK RESULT
SigMap TF-IDF hits 80% top-5
SigMap reports that signature-only TF-IDF retrieval across function and class surfaces reached 80% hit@5 on 90 tasks from 18 repos, while cutting context by 98.1% on average. The result argues that for some code-search workflows, identifiers and shapes carry enough signal to delay or skip embeddings entirely.
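A minimal sketch of what signature-only TF-IDF retrieval could look like, assuming Python sources and the standard library only. The post does not describe SigMap's actual tokenizer, weighting, or API, so every name here (`split_identifier`, `extract_signatures`, `rank`) is hypothetical, and the smoothed IDF formula is one common variant rather than the project's confirmed choice.

```python
import ast
import math
import re
from collections import Counter


def split_identifier(name):
    """Split snake_case / camelCase identifiers into lowercase word tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def extract_signatures(source, path):
    """Return (label, tokens) per function/class definition; bodies are ignored."""
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    entries = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, func_types):
            # Function surface: its name plus positional parameter names.
            names = [node.name] + [a.arg for a in node.args.args]
        elif isinstance(node, ast.ClassDef):
            # Class "shape": its name plus the names of its direct methods.
            names = [node.name] + [n.name for n in node.body if isinstance(n, func_types)]
        else:
            continue
        tokens = [t for n in names for t in split_identifier(n)]
        entries.append((f"{path}:{node.name}", tokens))
    return entries


def rank(query, entries, k=5):
    """Rank signatures by a length-normalized TF-IDF overlap with the query."""
    n = len(entries)
    df = Counter(t for _, toks in entries for t in set(toks))
    # Smoothed IDF (an assumption, not SigMap's documented formula).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    words = re.findall(r"[A-Za-z0-9_]+", query)
    q_tokens = {t for w in words for t in split_identifier(w)}
    scored = []
    for label, toks in entries:
        tf = Counter(toks)
        norm = math.sqrt(sum((tf[t] * idf[t]) ** 2 for t in tf)) or 1.0
        score = sum(idf[t] * tf[t] * idf[t] for t in q_tokens if t in tf) / norm
        scored.append((score, label))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [label for score, label in scored[:k] if score > 0]


# Toy input, made up for illustration; not taken from the benchmark.
SAMPLE = '''
def parse_config_file(path, strict):
    pass

def render_template(name, context):
    pass

class UserSession:
    def refresh_token(self):
        pass
    def logout(self):
        pass
'''

entries = extract_signatures(SAMPLE, "app.py")
print(rank("where is the config file parsed", entries))
```

Only names and parameter lists are indexed, which is where the context savings would come from: function bodies never enter the index or the prompt. There is no stemming in this sketch, so a query word like "parsed" does not match `parse`; the match above comes from `config` and `file`.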
// ANALYSIS
This is a strong narrow benchmark result, not proof that embeddings are obsolete. It does show that for offline local-model context compression, a cheap first-pass ranker can get much farther than many teams expect.
- Function signatures and class shapes are unusually information-dense, so exact lexical matching has a real advantage over semantic paraphrase in codebases
- The 98.1% token reduction is the practical headline: it makes local-model workflows cheaper and more repeatable before any vector stack is introduced
- The likely ceiling is multi-hop and semantic queries, where naming alone stops being enough and call graphs or rerankers become necessary
- The benchmark probably rewards well-named repos; generic helpers, deeply abstracted code, and cross-file flows will be the stress cases
- For teams building lightweight code retrieval, this is a good case for “TF-IDF first, embeddings later” instead of starting with heavyweight RAG
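For readers who want to reproduce the two headline metrics on their own repos, here is a rough sketch of how hit@k and average context reduction are commonly computed. The function names and toy inputs are hypothetical, and the benchmark's exact scoring rules are not given in the post.

```python
def hit_at_k(ranked_per_task, gold_per_task, k=5):
    """Fraction of tasks where at least one gold item appears in the top-k results."""
    hits = sum(
        1
        for ranked, gold in zip(ranked_per_task, gold_per_task)
        if set(ranked[:k]) & set(gold)
    )
    return hits / len(gold_per_task)


def context_reduction(full_tokens, kept_tokens):
    """Percentage of tokens removed relative to the full context."""
    return 100.0 * (1 - kept_tokens / full_tokens)


def mean_context_reduction(pairs):
    """Average the per-task reduction over (full_tokens, kept_tokens) pairs."""
    return sum(context_reduction(full, kept) for full, kept in pairs) / len(pairs)


# Toy data, made up for illustration; not the benchmark's tasks.
ranked = [["a.py:load", "b.py:save"], ["c.py:run"], ["d.py:init"]]
gold = [["b.py:save"], ["x.py:missing"], ["d.py:init"]]
print(hit_at_k(ranked, gold, k=5))
print(mean_context_reduction([(100_000, 1_900), (50_000, 950)]))
```

If the reported 80% is exact, it corresponds to 72 of the 90 tasks having a relevant signature in the top five. A 98.1% reduction means roughly 1,900 tokens kept out of every 100,000.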
// TAGS
sigmap, search, embedding, ai-coding, open-source
DISCOVERED
10h ago
2026-04-17
PUBLISHED
11h ago
2026-04-17
RELEVANCE
8/10
AUTHOR
Independent-Flow3408