REDDIT // 3h ago · INFRASTRUCTURE

VRAM.cpp runs llama.cpp's fit logic in the browser

VRAM.cpp is a browser-based VRAM estimator that runs llama.cpp’s fit logic directly, so users can check whether a specific GGUF will run on their hardware instead of relying on rough calculators. It’s aimed at the exact local-LLM question people keep asking: which quant, on which GPU, with how much host RAM.
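For contrast, here is a minimal sketch of the back-of-the-envelope arithmetic that rough calculators typically rely on. It takes the parameter count times the average bits per weight for the weights, then adds a KV cache sized from layer count, context length, and KV-head width. This is not VRAM.cpp's or llama.cpp's method. The `ModelShape` fields, the example numbers, and the f16 KV cache are all hypothetical assumptions.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per GiB


@dataclass
class ModelShape:
    # Hypothetical inputs; a real tool would read these from GGUF metadata.
    n_params: float         # total parameter count
    bits_per_weight: float  # average bits per weight of the chosen quant
    n_layers: int
    n_kv_heads: int
    head_dim: int


def weights_bytes(m: ModelShape) -> float:
    # Parameters times average bits per weight, converted to bytes.
    return m.n_params * m.bits_per_weight / 8


def kv_cache_bytes(m: ModelShape, n_ctx: int, bytes_per_elem: int = 2) -> int:
    # One K and one V tensor per layer, each n_ctx * n_kv_heads * head_dim
    # elements; bytes_per_elem=2 assumes an f16 cache.
    return 2 * m.n_layers * n_ctx * m.n_kv_heads * m.head_dim * bytes_per_elem


def rough_estimate_gib(m: ModelShape, n_ctx: int) -> float:
    # Deliberately naive: ignores compute buffers, per-tensor quant types,
    # and backend overhead, which a runtime-faithful fit would account for.
    return (weights_bytes(m) + kv_cache_bytes(m, n_ctx)) / GIB


# Illustrative numbers for an 8B-class model with grouped-query attention.
example = ModelShape(n_params=8e9, bits_per_weight=4.5, n_layers=32,
                     n_kv_heads=8, head_dim=128)
print(f"{rough_estimate_gib(example, n_ctx=8192):.2f} GiB")  # prints 5.19 GiB
```

Every term here is a guess about how the runtime allocates memory. Running the runtime's own fit logic removes that guesswork.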

// ANALYSIS

This is a smart answer to a real pain point: instead of approximating memory from model size, it reuses the same fitting logic the runtime itself uses, which should make its estimates far more trustworthy.

–The core advantage is fidelity: as llama.cpp’s fit algorithm evolves, the estimator inherits those improvements without a separate rules engine to keep in sync.
–That makes it more useful than generic VRAM calculators for edge cases like Q3 variants, hybrid GPU+RAM fits, and newer model families.
–The project acknowledges weak spots in multi-GPU plus host-memory splits and in MoE fitting, so the hardest configurations are exactly where users should be most cautious.
–As an open-source browser app, it lowers the friction for quick pre-flight checks before downloading huge GGUFs or running trial fits locally.
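The hybrid GPU+RAM case comes down to two questions: how many layers go to VRAM, and whether the remainder fits in host memory. The sketch below shows that decision in its simplest greedy form. It assumes layers are placed in order and that a single fixed VRAM overhead covers everything that is not a layer. `split_layers` and its parameters are hypothetical. The llama.cpp fit logic that VRAM.cpp runs accounts for much more than this, which is the point of reusing it.

```python
GIB = 1024 ** 3  # bytes per GiB


def split_layers(layer_bytes: list[float], vram_bytes: float, ram_bytes: float,
                 gpu_overhead_bytes: float = 0.0) -> tuple[int, bool]:
    """Greedily place layers on the GPU and spill the rest to host RAM.

    Returns (number of layers on the GPU, whether the whole model fits).
    A hypothetical simplification, not llama.cpp's fit algorithm.
    """
    # Reserve VRAM for non-layer buffers (hypothetical fixed overhead).
    free_vram = vram_bytes - gpu_overhead_bytes
    if free_vram < 0:
        return 0, False  # treated as a failed GPU fit in this sketch

    n_gpu = 0
    for size in layer_bytes:
        if size > free_vram:
            break  # stop at the first layer that no longer fits
        free_vram -= size
        n_gpu += 1

    # Whatever did not go to the GPU must fit in host RAM.
    host_bytes = sum(layer_bytes[n_gpu:])
    return n_gpu, host_bytes <= ram_bytes


# Illustrative: 40 equal layers of 0.5 GiB on a 12 GiB GPU with 16 GiB host RAM.
layers = [0.5 * GIB] * 40
print(split_layers(layers, vram_bytes=12 * GIB, ram_bytes=16 * GIB,
                   gpu_overhead_bytes=1.5 * GIB))  # prints (21, True)
```

A greedy split like this is easy to reason about. It breaks down in the cases the project flags as weak spots, such as several GPUs sharing layers or MoE experts whose sizes and placement differ from dense layers.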

// TAGS

vram-cpp · llama-cpp · llm · gpu · open-source · devtool · inference

DISCOVERED

3h ago

2026-04-27

PUBLISHED

7h ago

2026-04-27

RELEVANCE

8/10

AUTHOR

TheAconn96