AST graphs, BM25 cut code RAG tokens

// 45d ago | INFRASTRUCTURE

The post describes a code retrieval pipeline that parses repositories with Tree-sitter, turns symbols and dependencies into a typed graph, and uses BM25 over node metadata plus edge traversal to keep LLM context around 5K tokens. It is a practical, lexical-first alternative to chunk-and-embed retrieval for codebases, but the benchmark claim is still anecdotal.
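
To make the retrieval step concrete, here is a minimal sketch of BM25 scoring over node metadata followed by a one-hop edge traversal under a token budget. It is not the post's implementation: the graph is written by hand instead of being parsed with Tree-sitter, and the names `NODES`, `EDGES`, `bm25_scores`, and `retrieve`, as well as counting metadata tokens as the context cost, are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical typed graph. In the post this would be extracted with
# Tree-sitter; here it is hand-written so the sketch stays self-contained.
NODES = {
    "auth.login": {"kind": "function", "text": "login user password session auth"},
    "auth.hash_password": {"kind": "function", "text": "hash password salt bcrypt"},
    "db.get_user": {"kind": "function", "text": "get user by name query database"},
    "ui.render_home": {"kind": "function", "text": "render home page template"},
}
# Directed edges as (source, edge type, target).
EDGES = [
    ("auth.login", "calls", "auth.hash_password"),
    ("auth.login", "calls", "db.get_user"),
]


def tokenize(text):
    """Lowercase and split into identifier-like tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc (id -> token list) against the query with Okapi BM25."""
    n = len(docs)
    avg_len = sum(len(toks) for toks in docs.values()) / n
    df = Counter(term for toks in docs.values() for term in set(toks))
    scores = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def retrieve(query, budget_tokens=5000, top_k=1):
    """Pick BM25 seed nodes, add their one-hop neighbors, stop at the budget."""
    docs = {nid: tokenize(nid + " " + meta["text"]) for nid, meta in NODES.items()}
    scores = bm25_scores(query, docs)
    ranked = sorted(scores, key=scores.get, reverse=True)
    seeds = [nid for nid in ranked[:top_k] if scores[nid] > 0]

    # Follow outgoing edges from the seeds, keeping first-seen order.
    ordered = list(seeds)
    for src, _edge_type, dst in EDGES:
        if src in seeds and dst not in ordered:
            ordered.append(dst)

    # Crude cost model (an assumption): one unit per metadata token.
    selected, used = [], 0
    for nid in ordered:
        cost = len(docs[nid])
        if used + cost > budget_tokens:
            break
        selected.append(nid)
        used += cost
    return selected


print(retrieve("login session"))
```

In a real pipeline the cost of a node would be the token count of its source span rather than of its metadata, and the traversal depth would be a tunable parameter.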

// ANALYSIS

The idea is solid because code search is usually about exact identifiers, signatures, and dependency edges more than fuzzy semantic similarity. The weak spot is that BM25-only retrieval can miss intent-heavy queries, so the real question is whether this beats a hybrid stack once you measure recall and answer quality.
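
Measuring recall, as suggested above, needs little machinery. The sketch below computes recall@k over a set of labelled queries; the function names and the toy data are hypothetical and do not come from the post or from any benchmark.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant items that appear in the top-k retrieved list."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


def mean_recall_at_k(runs, k):
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


# Made-up example: two queries, each with a ranked result list and the set
# of nodes a human judged necessary to answer the query.
toy_runs = [
    (["auth.login", "db.get_user", "ui.render_home"], {"auth.login", "auth.hash_password"}),
    (["db.get_user", "db.connect"], {"db.get_user"}),
]

print(mean_recall_at_k(toy_runs, k=2))
```

Running the same labelled queries through a BM25-plus-graph retriever and through a hybrid vector-plus-reranker baseline would yield directly comparable numbers.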

  • Tree-sitter plus typed nodes gives you structural recall that plain chunk embeddings routinely miss in codebases.
  • BM25 is a strong default for symbol-heavy queries like function names, file paths, imports, and type references.
  • The 5K-vs-100K token story is compelling, but it needs hard numbers: recall@k, task success rate, latency, and cost versus hybrid vector plus reranker baselines.
  • Unweighted edges are likely the first bottleneck; call, import, and inheritance links should not all expand context equally.
  • Static graph extraction will work best in typed or convention-heavy codebases; dynamic languages, reflection, and runtime dispatch will be the stress cases.
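
One way to act on the unweighted-edges point is a best-first expansion in which each edge type multiplies the score it passes on, so weak links fall below a cutoff sooner. This is a sketch under assumptions: the weights in `EDGE_WEIGHTS`, the `min_score` cutoff, and the function name `expand` are illustrative choices, not values from the post.

```python
import heapq

# Assumed per-type weights; real values would need tuning against a benchmark.
EDGE_WEIGHTS = {"calls": 0.8, "inherits": 0.6, "imports": 0.3}


def expand(seeds, edges, min_score=0.4, max_nodes=10):
    """Best-first expansion from seed nodes.

    seeds: dict of node -> initial relevance score (e.g. from BM25).
    edges: list of (source, edge type, target) tuples.
    Returns a dict of node -> propagated score, in the order nodes were added.
    """
    adjacency = {}
    for src, edge_type, dst in edges:
        adjacency.setdefault(src, []).append((edge_type, dst))

    best = {}
    # heapq is a min-heap, so scores are negated to pop the highest first.
    heap = [(-score, node) for node, score in seeds.items()]
    heapq.heapify(heap)
    while heap and len(best) < max_nodes:
        neg_score, node = heapq.heappop(heap)
        if node in best:
            continue
        score = -neg_score
        best[node] = score
        for edge_type, dst in adjacency.get(node, []):
            propagated = score * EDGE_WEIGHTS.get(edge_type, 0.0)
            if propagated >= min_score and dst not in best:
                heapq.heappush(heap, (-propagated, dst))
    return best


# Hypothetical edges for illustration.
example_edges = [
    ("auth.login", "calls", "db.get_user"),
    ("auth.login", "imports", "logging"),
    ("db.get_user", "calls", "db.connect"),
]
print(list(expand({"auth.login": 1.0}, example_edges)))
```

With these assumed weights, a two-step call chain stays in the context while a directly imported module is dropped, which is the kind of asymmetry the bullet on edge weighting asks for.
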
// TAGS
rag, search, embedding, data-tools, ast-derived-graph-rag

DISCOVERED

45d ago

2026-04-30

PUBLISHED

45d ago

2026-04-30

RELEVANCE

8/10

AUTHOR

Altruistic_Night_327