VRAM.cpp runs llama.cpp fit in browser
VRAM.cpp is a browser-based VRAM estimator that runs llama.cpp’s fit logic directly, so users can check whether a specific GGUF will run on their hardware instead of relying on rough calculators. It’s aimed at the exact local-LLM question people keep asking: which quant, on which GPU, with how much host RAM.
This is a smart answer to a real pain point: instead of approximating memory from model size, it reuses the same fitting logic the runtime depends on, so its estimates should track what llama.cpp will actually do when it loads the model.
- The core advantage is fidelity: as llama.cpp’s fit algorithm evolves, the estimator inherits those improvements without a separate rules engine to keep in sync.
- That makes it more useful than generic VRAM calculators for edge cases like Q3 variants, hybrid GPU+RAM fits, and newer model families.
- The project still admits weak spots in multi-GPU plus host-memory splits and MoE fitting, so the hardest configurations are exactly where users should be most cautious.
- As an open-source browser app, it lowers the friction for quick pre-flight checks before downloading huge GGUFs or running trial fits locally.
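For contrast, here is a minimal sketch of the kind of "rough calculator" the summary compares against. It is not llama.cpp's fit algorithm and not VRAM.cpp's code. Every name in it (`ModelSpec`, `kv_cache_bytes`, `rough_fit`) is hypothetical, and it makes several assumptions: the GGUF file size stands in for weight memory, the KV cache is stored in 16-bit precision, memory is spread evenly across layers, and a flat 1 GiB of VRAM is reserved for everything else.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelSpec:
    """Hypothetical model description; field names are illustrative only."""
    file_size_bytes: int  # GGUF size on disk, used as a proxy for weight memory
    n_layers: int
    n_kv_heads: int
    head_dim: int


def kv_cache_bytes(m: ModelSpec, n_ctx: int, bytes_per_elem: int = 2) -> int:
    # K and V each hold n_ctx * n_kv_heads * head_dim elements per layer.
    return 2 * m.n_layers * n_ctx * m.n_kv_heads * m.head_dim * bytes_per_elem


def rough_fit(m: ModelSpec, n_ctx: int, vram_bytes: int, host_bytes: int,
              overhead_bytes: int = GIB) -> dict:
    # Assumption: weights and KV cache are spread evenly across layers.
    per_layer = (m.file_size_bytes + kv_cache_bytes(m, n_ctx)) / m.n_layers
    # Assumption: a flat overhead is reserved on the GPU for compute buffers.
    budget = max(vram_bytes - overhead_bytes, 0)
    gpu_layers = min(m.n_layers, int(budget // per_layer))
    # Whatever does not fit on the GPU is assumed to live in host RAM.
    host_needed = (m.n_layers - gpu_layers) * per_layer
    return {
        "gpu_layers": gpu_layers,
        "host_bytes_needed": host_needed,
        "fits": host_needed <= host_bytes,
    }


if __name__ == "__main__":
    model = ModelSpec(file_size_bytes=4 * GIB, n_layers=32, n_kv_heads=8, head_dim=128)
    print(rough_fit(model, n_ctx=8192, vram_bytes=4 * GIB, host_bytes=4 * GIB))
```

Each of those assumptions is a place where a back-of-envelope estimate can drift from what the runtime does, which is the gap that reusing the runtime's own fit logic is meant to close.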
DISCOVERED
2026-04-27
PUBLISHED
2026-04-27
AUTHOR
TheAconn96