OPEN_SOURCE
LOBSTERS // 31d ago · RESEARCH PAPER
GenDB paper pitches LLM-built query engines
Cornell researchers propose GenDB, an agentic query engine that generates per-query execution code instead of relying on a fixed hand-engineered database kernel. In early OLAP tests on TPC-H and a new SEC-EDGAR benchmark, the prototype beats DuckDB, Umbra, ClickHouse, MonetDB, and PostgreSQL by tailoring storage, plans, and native code to the exact workload and hardware.
// ANALYSIS
GenDB is a provocative database paper because it treats the query engine itself as something to synthesize on demand, not a monolith to tune forever.
- The core bet is that repeated analytical workloads justify high upfront generation cost if the resulting binaries are materially faster on every rerun
- The paper’s strongest claim is systems-level: LLM agents can co-design storage layout, indexes, operator strategy, and low-level code around cache sizes, cardinality, and join patterns in ways generic engines cannot
- The biggest catch is correctness and reliability, since GenDB still leans on result comparison against a traditional DBMS or manual review when ground truth is unavailable
- If the approach matures, it points toward hybrid data stacks where conventional engines handle ad hoc traffic and synthesized executables take over recurring hot queries
- The open-source repo and roadmap make this more than a thought experiment, with planned extensions into semantic queries, GPU-native generation, and self-improving agent memory
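The result-comparison safeguard noted above can be illustrated with a minimal sketch. It assumes a generated query is validated by comparing its output with a reference engine's output as order-insensitive multisets, with floats rounded before comparison. SQLite stands in for the reference DBMS here, and `generated_revenue_by_region`, `results_match`, and the `orders` table are hypothetical names for illustration, not part of GenDB.

```python
import sqlite3
from collections import Counter


def normalize(rows, ndigits=6):
    """Turn rows into a multiset, rounding floats to absorb tiny numeric drift."""
    return Counter(
        tuple(round(v, ndigits) if isinstance(v, float) else v for v in row)
        for row in rows
    )


def results_match(candidate_rows, reference_rows, ndigits=6):
    """Compare two result sets while ignoring row order."""
    return normalize(candidate_rows, ndigits) == normalize(reference_rows, ndigits)


def generated_revenue_by_region(orders):
    """Hypothetical stand-in for a synthesized, query-specific executable."""
    totals = {}
    for region, amount in orders:
        totals[region] = totals.get(region, 0.0) + amount
    return list(totals.items())


orders = [("east", 10.5), ("west", 4.0), ("east", 2.25)]

# Reference engine (assumption: SQLite plays the role of the traditional DBMS).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
reference = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region"
).fetchall()

candidate = generated_revenue_by_region(orders)
ok = results_match(candidate, reference)
print("generated query matches reference:", ok)
```

A check like this only shows agreement on the data it was run against. It does not prove the generated code is correct for other inputs, which is why the reliability concern remains open.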
// TAGS
gendb · llm · agent · data-tools · research · open-source
DISCOVERED
31d ago
2026-03-11
PUBLISHED
39d ago
2026-03-03
RELEVANCE
8/10