Optane PMem build runs 1 trillion parameter LLM locally

// INFRASTRUCTURE · 90d ago

A specialized local build featuring 768GB of secondhand Intel Optane Persistent Memory and an RTX 3060 has run the 1.04-trillion-parameter Kimi K2.5 model at roughly 5 tokens per second. By combining the model's sparse Mixture-of-Experts architecture with llama.cpp's hybrid CPU/GPU offloading, the project achieves frontier-class inference on a hardware budget far below that of traditional GPU-heavy alternatives.
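One way to see why a sparse MoE model can decode at usable speeds from slow memory is a back-of-the-envelope bandwidth bound: each generated token only needs to stream the active expert weights, not all of them. The sketch below is illustrative only. The function name and the example inputs (active parameter count, bits per weight, effective bandwidth) are hypothetical placeholders, not figures reported for this build.

```python
def decode_tokens_per_second(active_params, bits_per_weight, bandwidth_gb_s):
    """Rough upper bound on decode speed for a memory-bound model.

    Assumes every generated token streams the active weights exactly once
    and that memory bandwidth is the only bottleneck (no compute, cache,
    or PCIe effects). All inputs are assumptions supplied by the caller.
    """
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # Hypothetical inputs: 32e9 active parameters per token, 4-bit weights,
    # 64 GB/s of effective memory bandwidth.
    estimate = decode_tokens_per_second(32e9, 4, 64)
    print(f"bandwidth-bound estimate: {estimate:.1f} tokens/s")
```

A dense model of the same total size would have to stream every weight per token, so the same bound would fall in proportion to the ratio of total to active parameters.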

// ANALYSIS

MoE architectures combined with tiered memory are making 1T+ parameter models viable for hobbyists, effectively bypassing the "VRAM tax" for large-scale reasoning.

– Intel's discontinued PMem modules sit between DRAM and SSDs in both latency and bandwidth, which suits offloading sparse expert weights.
– This build suggests that memory capacity, not just FLOPs, is the primary hurdle for local frontier LLM deployment.
– Software optimizations like Unsloth's dynamic quants are essential for fitting 1T-parameter models into sub-1TB memory footprints.
– At roughly 5 t/s, the build shows that expensive H100 clusters aren't the only way to reach inference speeds acceptable for research.
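The capacity argument in the points above comes down to simple arithmetic, sketched here. It uses the 1.04 trillion parameter count and 768GB capacity from the summary, treats 768GB as 768e9 bytes (an assumption), and ignores KV cache, activations, and per-block quantization overhead. The function names are hypothetical, and the result is not the actual size of any Unsloth quant.

```python
def footprint_gb(total_params, bits_per_weight):
    """Weight storage in decimal gigabytes at a uniform bits-per-weight."""
    return total_params * bits_per_weight / 8 / 1e9


def max_bits_per_weight(total_params, memory_bytes, reserve_fraction=0.0):
    """Largest average bits-per-weight that fits in memory.

    reserve_fraction is a hypothetical knob for holding back part of the
    memory for the KV cache, activations, and the operating system.
    """
    usable_bytes = memory_bytes * (1 - reserve_fraction)
    return usable_bytes * 8 / total_params


if __name__ == "__main__":
    params = 1.04e12      # parameter count from the summary
    capacity = 768e9      # 768GB, assumed to mean decimal bytes
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {footprint_gb(params, bits):,.0f} GB")
    print(f"max average bits/weight: {max_bits_per_weight(params, capacity):.2f}")
```

Under these assumptions, 16-bit weights need about 2,080 GB and 4-bit weights about 520 GB, so the average has to stay below roughly 6 bits per weight before any headroom is reserved.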

// TAGS

llm, inference, gpu, self-hosted, intel-optane, kimi-k2.5, unsloth, moe

DISCOVERED: 2026-04-15 (90d ago)

PUBLISHED: 2026-04-15 (90d ago)

RELEVANCE: 8/10

AUTHOR: APFrisco
