KV-planner sizes LLM GPU deployments

// 45d ago · OPEN SOURCE RELEASE

kv-planner is an MIT-licensed capacity planner for LLM inference that estimates GPU memory, latency, throughput, cost, KV cache usage, fleet sizing, speculative decoding, and reasoning-token overhead from physics-based formulas. The project now includes early real-hardware validation against vLLM on H100, A100, RTX 4090, and a MoE workload.
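To make "physics-based formulas" concrete, here is a minimal Python sketch of the standard KV cache sizing arithmetic. It is not taken from kv-planner's source: the function names and the example model shape (32 layers, 8 KV heads, head dimension 128, FP16 cache) are illustrative assumptions.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache held for one token.

    Each layer stores one key and one value vector per KV head,
    hence the leading factor of 2.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, batch_size: int,
                   bytes_per_elem: int = 2) -> int:
    """Total KV cache for `batch_size` sequences of `context_len` tokens."""
    per_token = kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem)
    return per_token * context_len * batch_size


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    per_token = kv_bytes_per_token(32, 8, 128)
    total = kv_cache_bytes(32, 8, 128, context_len=8192, batch_size=16)
    print(f"{per_token / 1024:.0f} KiB per token")
    print(f"{total / 2**30:.1f} GiB for 16 sequences of 8192 tokens")
```

Under these assumptions the cache costs 128 KiB per token, so 16 concurrent 8192-token sequences need 16 GiB before weights and activations are counted. A real planner would also account for paging granularity and quantized cache formats.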

// ANALYSIS

This is a useful infrastructure tool because it attacks a real deployment pain point: the gap between datasheet math and runtime behavior.

– Self-calibration is the strongest feature, because default MBU (memory bandwidth utilization) and MFU (model FLOPs utilization) values vary sharply across vLLM, Ollama, and llama.cpp, and between consumer and datacenter GPUs.
– The MoE kernel-launch correction is a good reminder that active-parameter math alone misses scheduling overhead in modern inference stacks.
– The validation story is promising but still thin: a handful of GPUs and one MoE coefficient are not enough for production trust across MI300X, L40S, TensorRT-LLM, SGLang, or multi-node setups.
– The broad interface surface (CLI, GUI, REST, TUI, and MCP) makes it more than a spreadsheet replacement, but the project needs broader benchmark contributions to become a sizing reference.
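The calibration and MoE points can be illustrated with a rough bandwidth-bound decode model. This is a sketch under stated assumptions, not kv-planner's implementation: the function names are hypothetical, decode is assumed to be memory-bound at batch size 1 (so one step produces one token), and the fixed per-step overhead is only a stand-in for whatever form the project's kernel-launch correction actually takes.

```python
def calibrate_mbu(bytes_read_per_step: float, measured_steps_per_s: float,
                  peak_bandwidth_bytes_per_s: float) -> float:
    """Back out memory bandwidth utilization from one measured decode rate.

    MBU = achieved bandwidth / peak bandwidth, where achieved bandwidth is
    the bytes read per decode step times the measured steps per second.
    """
    achieved = bytes_read_per_step * measured_steps_per_s
    return achieved / peak_bandwidth_bytes_per_s


def decode_step_seconds(bytes_read_per_step: float,
                        peak_bandwidth_bytes_per_s: float,
                        mbu: float,
                        fixed_overhead_s: float = 0.0) -> float:
    """Predicted time for one decode step.

    The first term is the bandwidth-bound read time. `fixed_overhead_s` is a
    hypothetical additive term for scheduling and kernel-launch cost, which
    active-parameter math alone would miss.
    """
    return bytes_read_per_step / (peak_bandwidth_bytes_per_s * mbu) + fixed_overhead_s


if __name__ == "__main__":
    peak = 2e12  # illustrative peak bandwidth in bytes/s, not a real GPU spec

    # Dense model: 16 GB of weights read per step, 50 steps/s measured.
    mbu = calibrate_mbu(16e9, 50.0, peak)
    print(f"calibrated MBU: {mbu:.2f}")

    # MoE model: 4 GB of active weights per step, with and without overhead.
    naive = decode_step_seconds(4e9, peak, mbu)
    corrected = decode_step_seconds(4e9, peak, mbu, fixed_overhead_s=0.002)
    print(f"naive: {naive * 1e3:.1f} ms, with overhead: {corrected * 1e3:.1f} ms")
```

In this toy case a 2 ms fixed cost raises the predicted step time from 5 ms to 7 ms, which shows how a small per-step overhead can dominate once the active-parameter read time is short.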

// TAGS

kv-planner, inference, gpu, llm, open-source, devtool, benchmark, mcp

DISCOVERED

45d ago

2026-04-21

PUBLISHED

45d ago

2026-04-21

RELEVANCE

8/10

AUTHOR

1Hesham
