PRISM targets O(1) KV block selection with photonics
PRISM is a research repo and paper proposing O(1) photonic block selection for long-context LLM inference: it replaces the usual O(N) KV-cache signature scan with an optical broadcast-and-weight core on TFLN (thin-film lithium niobate). The project also ships a GPU-only selector, a simulator, and benchmark code. The README claims 944x faster selection and 18,000x lower energy than an H100 scan at 1M context, plus a modeled 5.3x total-decode win at 100M context.
Hot take: this is the rare hardware idea that actually matches the workload shape, because broadcast-to-many is what photonics does best, but the eye-catching gains still depend on simulation assumptions rather than a fabricated chip.
- The repo is real and usable today: MIT-licensed code, a paper PDF, a demo, a simulator, and a GPU-only `BlockSelector` for current LLM stacks.
- It attacks a real bottleneck: block-sparse methods like Quest or RocketKV still scan all candidate block signatures from HBM every decode step, so latency rises with context even when fetches are sparse.
- The scaling story is compelling: the README models 5.3x faster total decode at 100M context in batch serving, which is where HBM bandwidth pain gets brutal.
- The GPU-only selector already claims 100% needle retrieval and 0% LongBench-v2 drop, so there is a practical software fallback even before any chip exists.

Sources: https://github.com/hyoseokp/PRISM ; https://www.reddit.com/r/LocalLLaMA/comments/1s1f8sq/designed-a-photonic-chip-for-o1-kv-cache-block/
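To make the bottleneck concrete, here is a minimal sketch of the O(N) baseline PRISM targets. This is not code from the repo: the function name, shapes, and scoring rule are illustrative assumptions about how Quest-style block-sparse selection works (one signature vector per KV block, scanned against the query every decode step).

```python
import numpy as np

def select_blocks(query, signatures, top_k=8):
    """Hypothetical Quest-style block selection.

    query:      (d,) current decode-step query vector
    signatures: (num_blocks, d) one summary vector per KV block
    Returns indices of the top_k highest-scoring blocks.

    Cost is O(num_blocks * d) per decode step -- this is the scan
    that grows with context length and that PRISM proposes to
    replace with a constant-latency optical broadcast.
    """
    scores = signatures @ query              # O(N) dot products over HBM
    return np.argsort(scores)[-top_k:][::-1]  # keep the best top_k blocks

# Toy usage: 1024 blocks of 64-dim signatures.
rng = np.random.default_rng(0)
sigs = rng.standard_normal((1024, 64))
q = rng.standard_normal(64)
picked = select_blocks(q, sigs, top_k=8)
print(picked.shape)  # (8,)
```

Even when only `top_k` blocks are fetched afterward, the scan itself touches every signature, which is why selection latency rises with context size on a GPU.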
DISCOVERED: 2026-03-23
PUBLISHED: 2026-03-23
AUTHOR: Exact-Schedule-3442