GreenBoost extends NVIDIA VRAM with DDR4

// 120d ago · OPEN-SOURCE RELEASE

GreenBoost is a Linux kernel module that transparently extends NVIDIA GPU VRAM with system DDR4 RAM and NVMe storage through CUDA external memory, so oversized LLMs can run without any changes to the inference software. Released on March 14, 2026 under GPLv2, it intercepts CUDA allocations and serves the overflow from kernel-managed memory instead of relying on slower layer offloading.
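
To make the "transparent extension" idea concrete, here is a minimal Python sketch of tiered placement: each allocation lands in the fastest tier that still has room (VRAM, then DDR4, then NVMe), and the calling code never picks a tier itself. This is a hypothetical model, not GreenBoost's code. The `Tier` and `TieredAllocator` names, the first-fit policy, the 100 MB block size, and the 64 GB DDR4 / 1 TB NVMe capacities are all assumptions; only the 12 GB VRAM and 31.8 GB model sizes come from GreenBoost's reported RTX 5070 benchmark.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """One memory tier; sizes are in MB to keep the arithmetic exact."""
    name: str
    capacity_mb: int
    used_mb: int = 0

    def free_mb(self) -> int:
        return self.capacity_mb - self.used_mb


class TieredAllocator:
    """Hypothetical first-fit placement across tiers ordered fastest first.

    Illustrative only: this is not GreenBoost's actual placement policy.
    """

    def __init__(self, tiers: list[Tier]) -> None:
        self.tiers = tiers

    def alloc(self, size_mb: int) -> str:
        # Walk the tiers fastest-first and place the block in the first one with room.
        for tier in self.tiers:
            if tier.free_mb() >= size_mb:
                tier.used_mb += size_mb
                return tier.name
        raise MemoryError(f"no tier can hold {size_mb} MB")

    def usage(self) -> dict[str, int]:
        return {t.name: t.used_mb for t in self.tiers}


# Assumed capacities: 12 GB VRAM (RTX 5070), 64 GB DDR4, 1 TB NVMe.
allocator = TieredAllocator([
    Tier("vram", 12_000),
    Tier("ddr4", 64_000),
    Tier("nvme", 1_000_000),
])

# Load a 31.8 GB model as 318 blocks of 100 MB; the "inference code" only calls alloc().
for _ in range(318):
    allocator.alloc(100)

print(allocator.usage())  # {'vram': 12000, 'ddr4': 19800, 'nvme': 0}
```

The sketch only models where bytes land. What makes GreenBoost notable is that the spilled blocks still appear to the GPU as device memory, which a pure bookkeeping model like this does not capture.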

// ANALYSIS

This is a genuinely clever hack: operating below the CUDA runtime to make DDR4 look like device memory is a fundamentally different approach from llama.cpp-style CPU offloading, and the benchmarks show why that matters.

–An LD_PRELOAD shim intercepts cudaMalloc and redirects large allocations (KV cache, weight overflow) to kernel-managed pinned DDR4 pages imported as CUDA external memory, so from the GPU's perspective everything is device memory
–PCIe 4.0 bandwidth (~32 GB/s) is the ceiling; within that limit, ExLlamaV3 + GreenBoost reaches 8–20 tok/s for a 31.8 GB model on a 12 GB RTX 5070, versus 2–5 tok/s for the baseline
–Requires Linux kernel 6.19+ (Ubuntu 26.04) and is primarily tested on Blackwell; Ada Lovelace and Ampere support is untested
–The bundled toolchain (ExLlamaV3, kvpress, ModelOpt FP8/INT4) suggests this is aimed at power users who want to squeeze maximum performance out of consumer hardware
–Community discussion is very early (published yesterday); the Reddit thread asking for experiences has only 4 comments, and the real test will be whether it holds up on older GPU generations
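
The PCIe figure sets a ceiling that is easy to compute: if each generated token must move B bytes of host-resident data across the link, decode speed cannot exceed bandwidth / B. The minimal sketch that follows takes the ~32 GB/s figure from the article; the function names are hypothetical, and the 2 GB-per-token value is purely illustrative, because the article does not say how much data actually crosses the bus per token. Inverting the bound shows what the reported 8–20 tok/s implies: at most about 1.6–4 GB of host-resident data per token.

```python
PCIE4_X16_BYTES_PER_S = 32e9  # ~32 GB/s, the PCIe 4.0 figure from the article


def max_tokens_per_second(bytes_per_token: float,
                          link_bytes_per_s: float = PCIE4_X16_BYTES_PER_S) -> float:
    """Upper bound on decode speed if every token pulls `bytes_per_token` over the link.

    Ignores compute time, latency, and overlap with GPU work, so real
    throughput can only be lower.
    """
    if bytes_per_token <= 0:
        raise ValueError("bytes_per_token must be positive")
    return link_bytes_per_s / bytes_per_token


def max_bytes_per_token(tokens_per_s: float,
                        link_bytes_per_s: float = PCIE4_X16_BYTES_PER_S) -> float:
    """Inverse bound: the most data that can cross the link per token at a given speed."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return link_bytes_per_s / tokens_per_s


# Hypothetical: 2 GB of host-resident data read per token.
print(max_tokens_per_second(2e9))  # 16.0

# What the reported 8-20 tok/s range implies for per-token PCIe traffic.
for tok_s in (8, 20):
    print(f"{tok_s} tok/s -> at most {max_bytes_per_token(tok_s) / 1e9:.1f} GB/token")
```

Treat these numbers as bounds, not predictions: compute time and transfer latency can only push real throughput below them.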

// TAGS

greenboost · inference · gpu · open-source · llm · edge-ai

DISCOVERED

2026-03-15

PUBLISHED

2026-03-15

RELEVANCE

8/10

AUTHOR

caetydid
