PersonaPlex 7B leaks memory on Apple Silicon

// 50d ago · MODEL RELEASE

A Reddit user reports that PersonaPlex 7B can run in real time on an M5 Max in MLX, but the full-duplex Apple Silicon path quickly exhausts unified memory and crashes inside MLX’s arena allocator when the KV cache grows via repeated concatenation. A preallocated cache avoids the leak but makes inference too slow for the real-time target, and the official NVIDIA server does not appear to offer a practical MPS route. The post frames the core question as whether this is a solvable MLX implementation issue or a sign that the model really needs CUDA-class hardware for stable full-duplex use.
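The difference between the two cache strategies can be sketched with a toy model. This is not MLX code and not the poster's implementation: plain Python lists stand in for tensors, and the names `ConcatCache`, `PreallocatedCache`, and `run` are hypothetical. It only counts how many elements get written into buffers, to show why concatenation-based growth produces far more allocation traffic. It does not model why the preallocated path was slow in the report.

```python
# Toy model of two KV-cache growth strategies. Plain lists stand in for
# tensors; all names here are illustrative assumptions, not MLX APIs.

class ConcatCache:
    """Grows by building a new buffer every step, like repeated concatenation."""

    def __init__(self):
        self.buf = []
        self.elements_copied = 0  # total elements written into fresh buffers

    def append(self, kv):
        new_buf = self.buf + [kv]  # new list; every old entry is copied
        self.elements_copied += len(new_buf)
        self.buf = new_buf

    def view(self):
        return self.buf


class PreallocatedCache:
    """Reserves max_len slots once and writes each step in place."""

    def __init__(self, max_len):
        self.buf = [None] * max_len
        self.length = 0
        self.elements_copied = 0

    def append(self, kv):
        if self.length == len(self.buf):
            raise OverflowError("cache is full")
        self.buf[self.length] = kv  # in-place write, no reallocation
        self.length += 1
        self.elements_copied += 1

    def view(self):
        return self.buf[: self.length]


def run(cache, steps):
    """Append `steps` dummy entries and return the total elements written."""
    for t in range(steps):
        cache.append(t)
    return cache.elements_copied
```

In this model, the concat cache writes 1 + 2 + ... + T elements over T steps, so its buffer traffic grows quadratically, while the preallocated cache writes exactly T. Whether an allocator can reclaim the intermediate buffers fast enough is a separate question that this sketch does not answer.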

// ANALYSIS

The reported failure looks more like an MLX runtime issue than a model-quality problem, with repeated concat in the KV cache likely driving the memory growth. The tradeoff is stark: the fast concat path appears to hit the memory cliff, while the safer preallocated cache is too slow for the reported real-time target. The post also suggests the Apple Silicon audio stack is still immature, since full-duplex output quality is described as poor even before the crash. If MLX’s execution model is the root cause, periodic flushes probably will not fix it; the cache layout or kernel path likely needs to change.
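One example of what a changed cache layout could look like is growth in fixed-size chunks, which sits between the two paths described above. The sketch below is an assumption for illustration only: the post does not propose it, `ChunkedCache` and the `step=256` default are hypothetical, plain lists stand in for tensors, and nothing here has been tested against MLX or the real-time target.

```python
# Toy chunked-growth cache: capacity increases by `step` slots at a time,
# so reallocation happens once per `step` appends instead of every append.
# Illustrative only; not an MLX API and not the poster's code.

class ChunkedCache:
    def __init__(self, step=256):
        self.step = step
        self.buf = []
        self.length = 0
        self.reallocations = 0

    def append(self, kv):
        if self.length == len(self.buf):
            # Out of room: build one larger buffer and copy old entries once.
            self.buf = self.buf + [None] * self.step
            self.reallocations += 1
        self.buf[self.length] = kv  # in-place write into reserved space
        self.length += 1

    def view(self):
        # Only the filled prefix is valid cache content.
        return self.buf[: self.length]
```

With `step=256`, 1000 appends trigger 4 reallocations rather than 1000, at the cost of up to `step - 1` unused slots at any moment.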

// TAGS
personaplex, nvidia, moshi, mlx, apple-silicon, speech-to-speech, full-duplex, unified-memory, real-time-voice

DISCOVERED

50d ago

2026-04-07

PUBLISHED

50d ago

2026-04-07

RELEVANCE

9/10

AUTHOR

Excellent_Koala769