PRISM targets O(1) KV block selection with photonics
OPEN_SOURCE
REDDIT // 19d ago · RESEARCH PAPER

PRISM is a research repo and paper proposing O(1) photonic block selection for long-context LLM inference, replacing the usual O(N) KV-cache signature scan with an optical broadcast-and-weight core on thin-film lithium niobate (TFLN). The project also ships a GPU-only selector, a simulator, and benchmark code; the README claims 944x faster selection and 18,000x lower energy than an H100 scan at 1M context, plus a modeled 5.3x total-decode speedup at 100M context.
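For context, the O(N) baseline that PRISM targets looks roughly like this: every decode step, score one pooled signature per KV block against the current query and keep the top-k blocks. A minimal NumPy sketch (my illustration with made-up shapes, not PRISM's actual selector code):

```python
import numpy as np

def select_blocks(query, block_signatures, k):
    """Baseline O(N) block selection: score every block signature
    against the decode-step query, keep the top-k blocks.
    query: (d,) query vector for the current step
    block_signatures: (N, d) one pooled signature per KV block
    Returns the k highest-scoring block indices, best first."""
    scores = block_signatures @ query           # O(N*d) scan over all N blocks
    topk = np.argpartition(scores, -k)[-k:]     # unordered top-k indices
    return topk[np.argsort(scores[topk])[::-1]]

# toy example: 1024 blocks, 128-dim signatures
rng = np.random.default_rng(0)
sigs = rng.standard_normal((1024, 128)).astype(np.float32)
q = rng.standard_normal(128).astype(np.float32)
print(select_blocks(q, sigs, 8))
```

The `block_signatures @ query` line is the scan whose latency grows with context length; PRISM's pitch is doing that one matrix-vector product optically in constant time.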

// ANALYSIS

Hot take: this is the rare hardware idea that actually matches the workload shape, since broadcast-to-many is exactly what photonics does best. Still, the eye-catching gains rest on simulation assumptions rather than a fabricated chip.
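Why broadcast-and-weight fits: the query is broadcast to every block's weight bank at once, each bank applies its signature as analog weights, and photodetectors read out all N scores in a single pass, so selection latency is independent of N, at the cost of analog noise. A toy digital stand-in for that behavior (my sketch, not PRISM's simulator; the noise model is an assumption):

```python
import numpy as np

def photonic_select_sim(query, weight_bank, k, noise_std=0.01, seed=1):
    """Toy broadcast-and-weight selection: conceptually all N dot
    products happen in one optical pass (constant time in N), but the
    analog readout adds noise proportional to the signal range.
    noise_std is a made-up relative noise level, not a measured one."""
    rng = np.random.default_rng(seed)
    scores = weight_bank @ query                 # "one pass" over all banks
    scores = scores + rng.normal(0.0, noise_std * np.abs(scores).max(),
                                 scores.shape)   # analog readout noise
    return np.argpartition(scores, -k)[-k:]     # top-k from noisy scores

rng = np.random.default_rng(0)
bank = rng.standard_normal((4096, 128))
q = rng.standard_normal(128)
print(photonic_select_sim(q, bank, 8))
```

The practical question the paper has to answer is exactly this trade: how much selection accuracy survives the analog noise floor, since a wrong top-k means fetching the wrong KV blocks.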

  • The repo is real and usable today: MIT-licensed code, a paper PDF, a demo, a simulator, and a GPU-only `BlockSelector` for current LLM stacks.
  • It attacks a real bottleneck: block-sparse methods like Quest or RocketKV still scan all candidate block signatures from HBM every decode step, so latency rises with context even when fetches are sparse.
  • The scaling story is compelling: the README models 5.3x faster total decode at 100M context in batch serving, which is where HBM bandwidth pain gets brutal.
  • The GPU-only selector already claims 100% needle retrieval and 0% LongBench-v2 drop, so there is a practical software fallback even before any chip exists.

Sources: https://github.com/hyoseokp/PRISM ; https://www.reddit.com/r/LocalLLaMA/comments/1s1f8sq/designed-a-photonic-chip-for-o1-kv-cache-block/
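The scaling argument in the bullets above can be made concrete with a back-of-envelope HBM-traffic model (my illustrative numbers, not the paper's): the signature scan reads one signature per block every step, so its bytes grow linearly with context, while the sparse KV fetch reads a fixed k blocks and stays flat.

```python
def decode_step_bytes(context_tokens, block_size=16, d=128, k_blocks=64,
                      bytes_per_elem=2):
    """Hypothetical per-decode-step HBM traffic for block-sparse
    attention (illustrative parameters, not from the PRISM paper)."""
    n_blocks = context_tokens // block_size
    # O(N) signature scan: one d-dim fp16 signature per block
    scan_bytes = n_blocks * d * bytes_per_elem
    # sparse fetch: keys + values for only the k selected blocks
    fetch_bytes = k_blocks * block_size * 2 * d * bytes_per_elem
    return scan_bytes, fetch_bytes

for ctx in (100_000, 1_000_000, 100_000_000):
    scan, fetch = decode_step_bytes(ctx)
    print(f"{ctx:>11,} tokens: scan {scan/1e6:8.1f} MB, fetch {fetch/1e6:5.1f} MB")
```

Under these assumed parameters the scan is already ~16 MB per step at 1M tokens and ~1.6 GB at 100M, while the fetch stays around 0.5 MB, which is why moving selection off the HBM path is the whole game at extreme context lengths.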
// TAGS
prism · llm · inference · benchmark · research · open-source · gpu

DISCOVERED

19d ago

2026-03-23

PUBLISHED

19d ago

2026-03-23

RELEVANCE

8 / 10

AUTHOR

Exact-Schedule-3442