GenDB paper pitches LLM-built query engines
LOBSTERS // 31d ago // RESEARCH PAPER

Cornell researchers propose GenDB, an agentic query engine that generates per-query execution code instead of relying on a fixed, hand-engineered database kernel. In early OLAP tests on TPC-H and a new SEC-EDGAR benchmark, the prototype reportedly outperforms DuckDB, Umbra, ClickHouse, MonetDB, and PostgreSQL by tailoring storage, plans, and native code to the exact workload and hardware.

// ANALYSIS

GenDB is a provocative database paper because it treats the query engine itself as something to synthesize on demand, not a monolith to tune forever.

  • The core bet is that repeated analytical workloads justify high upfront generation cost if the resulting binaries are materially faster on every rerun
  • The paper’s strongest claim is systems-level: LLM agents can co-design storage layout, indexes, operator strategy, and low-level code around cache sizes, cardinality, and join patterns in ways generic engines cannot
  • The biggest catch is correctness and reliability: GenDB still validates generated code by comparing results against a traditional DBMS, or by manual review when ground truth is unavailable
  • If the approach matures, it points toward hybrid data stacks where conventional engines handle ad hoc traffic and synthesized executables take over recurring hot queries
  • The open-source repo and roadmap make this more than a thought experiment, with planned extensions into semantic queries, GPU-native generation, and self-improving agent memory
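
The amortization bet in the first bullet comes down to simple break-even arithmetic. A minimal Python sketch follows; the function name `break_even_runs` and all timings are hypothetical illustrations, not figures from the paper.

```python
import math


def break_even_runs(generation_cost_s: float,
                    baseline_run_s: float,
                    generated_run_s: float) -> float:
    """Return how many reruns it takes for a generated binary to repay
    its one-time generation cost, or math.inf if it never does.

    All inputs are hypothetical timings in seconds (not from the paper).
    """
    saving_per_run = baseline_run_s - generated_run_s
    if saving_per_run <= 0:
        # The generated code is no faster, so the upfront cost is never recovered.
        return math.inf
    return math.ceil(generation_cost_s / saving_per_run)


# Hypothetical example: 10 minutes of generation, 2.0 s baseline, 0.5 s generated.
print(break_even_runs(600.0, 2.0, 0.5))  # 400
```

Under these assumed numbers the synthesized executable only pays off after 400 reruns, which is why the approach targets recurring hot queries rather than ad hoc traffic.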

// TAGS

gendb · llm · agent · data-tools · research · open-source

DISCOVERED

31d ago

2026-03-11

PUBLISHED

39d ago

2026-03-03

RELEVANCE

8/10