GreenBoost extends NVIDIA VRAM with DDR4
REDDIT // 28d ago · OPEN-SOURCE RELEASE


GreenBoost is a Linux kernel module that transparently extends NVIDIA GPU VRAM with system DDR4 RAM and NVMe storage via CUDA external memory, letting LLMs larger than the card's VRAM run without modifying the inference software. Released March 14, 2026 under GPLv2, it intercepts CUDA allocations and backs the overflow with kernel-managed memory, rather than relying on slower layer offloading.

// ANALYSIS

This is a genuinely clever hack. Working beneath the inference framework, at the CUDA allocation layer, to make DDR4 look like device memory is a fundamentally different approach from llama.cpp-style CPU offloading, and the benchmarks show why that matters.

  • The LD_PRELOAD shim intercepts cudaMalloc and redirects large allocations (KV cache, weight overflow) to kernel-managed pinned DDR4 pages imported as CUDA external memory, so to the inference software it all looks like device memory
  • PCIe 4.0 x16 bandwidth (~32 GB/s) is the ceiling; within that limit, ExLlamaV3 + GreenBoost reaches 8–20 tok/s for a 31.8 GB model on a 12 GB RTX 5070, versus 2–5 tok/s baseline
  • Requires Linux kernel 6.19+ (Ubuntu 26.04) and is primarily tested on Blackwell; Ada Lovelace and Ampere support is untested
  • The bundled toolchain (ExLlamaV3, kvpress, ModelOpt FP8/INT4) suggests this is aimed at power users who want to squeeze maximum performance out of consumer hardware
  • Community discussion was still very early at the time of writing (the project had been published the day before); the Reddit thread asking for experiences had only 4 comments, and the real test will be whether it holds up on older GPU generations
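The allocation routing in the first bullet can be illustrated with a toy model. This is a minimal Python sketch, not GreenBoost's code: `ToyOverflowAllocator`, the 256 MiB `large_threshold`, and the rule that small requests fail once VRAM is full are all hypothetical. It only simulates where each request would land; a real shim would typically be a shared library loaded via LD_PRELOAD that wraps `cudaMalloc` and would also have to route the matching frees.

```python
from dataclasses import dataclass, field

GiB = 1 << 30
MiB = 1 << 20


@dataclass
class Allocation:
    size: int
    placement: str  # "device" (real VRAM) or "host" (spilled to system RAM)


@dataclass
class ToyOverflowAllocator:
    """Hypothetical model of a cudaMalloc interposer that spills large
    allocations to host memory once VRAM is exhausted.

    The threshold and policy are illustrative assumptions, not GreenBoost's.
    """
    vram_bytes: int
    large_threshold: int = 256 * MiB  # hypothetical cutoff for "large" requests
    vram_used: int = 0
    host_used: int = 0
    allocations: list = field(default_factory=list)

    def malloc(self, size: int) -> Allocation:
        if self.vram_used + size <= self.vram_bytes:
            # Fits in VRAM: behave like an ordinary cudaMalloc.
            self.vram_used += size
            alloc = Allocation(size, "device")
        elif size >= self.large_threshold:
            # Stands in for pinned host pages imported as CUDA external memory.
            self.host_used += size
            alloc = Allocation(size, "host")
        else:
            # Assumed policy: small requests that no longer fit simply fail,
            # the way cudaMalloc reports out-of-memory.
            raise MemoryError(f"out of VRAM for a {size}-byte allocation")
        self.allocations.append(alloc)
        return alloc


if __name__ == "__main__":
    # Load 32 hypothetical 1 GiB weight shards onto a 12 GiB card.
    alloc = ToyOverflowAllocator(vram_bytes=12 * GiB)
    placements = [alloc.malloc(1 * GiB).placement for _ in range(32)]
    print(placements.count("device"), placements.count("host"))  # 12 20
```

The first 12 shards fill VRAM and the remaining 20 spill to host memory; every access to those 20 would then cross PCIe, which is why the bus bandwidth caps throughput.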
// TAGS
greenboost · inference · gpu · open-source · llm · edge-ai

DISCOVERED

2026-03-15

PUBLISHED

2026-03-15

RELEVANCE

8/10

AUTHOR

caetydid