PersonaPlex 7B leaks memory on Apple Silicon

// MODEL RELEASE

A Reddit user reports that PersonaPlex 7B can run in real time on an M5 Max under MLX. However, the full-duplex Apple Silicon path quickly exhausts unified memory and crashes inside MLX's arena allocator as the KV cache grows through repeated concatenation. A preallocated cache avoids the leak but slows inference below the real-time target, and NVIDIA's official server does not appear to offer a practical MPS route. The post frames the core question as whether this is a fixable MLX implementation issue or a sign that the model genuinely needs CUDA-class hardware for stable full-duplex use.
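To see why growing a cache by repeated concatenation is costly, here is a back-of-the-envelope model. It assumes a naive cache that allocates a fresh buffer of t + 1 tokens at every step t and copies the old contents into it, compared with a cache preallocated once at its maximum length. The layer count, head sizes, dtype, and frame rate below are hypothetical placeholders, not PersonaPlex's actual configuration. The model also ignores whatever MLX's allocator does with freed buffers.

```python
# Back-of-the-envelope model of KV-cache memory traffic.
# All model dimensions are hypothetical placeholders, not PersonaPlex's real config.


def bytes_per_token(layers: int, kv_heads: int, head_dim: int, dtype_bytes: int) -> int:
    """Bytes needed to store keys AND values for one token across all layers."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes


def concat_growth(steps: int, tok_bytes: int) -> tuple[int, int]:
    """Naive growth: each step allocates a new (t + 1)-token buffer and copies t tokens.

    Returns (total bytes allocated, total bytes copied) over `steps` steps.
    """
    allocated = 0
    copied = 0
    for t in range(steps):
        allocated += (t + 1) * tok_bytes  # fresh buffer holding old tokens + new token
        copied += t * tok_bytes           # old contents copied into the fresh buffer
    return allocated, copied


def preallocated(steps: int, tok_bytes: int) -> tuple[int, int]:
    """Preallocated cache: one buffer sized for `steps` tokens, written in place."""
    return steps * tok_bytes, 0


if __name__ == "__main__":
    tok = bytes_per_token(layers=32, kv_heads=32, head_dim=128, dtype_bytes=2)
    steps = 3000  # hypothetical: about 4 minutes at an assumed 12.5 frames per second
    for name, fn in [("concat", concat_growth), ("prealloc", preallocated)]:
        alloc, copy = fn(steps, tok)
        print(f"{name:>8}: allocated {alloc / 2**30:8.1f} GiB, copied {copy / 2**30:8.1f} GiB")
```

The concat total grows quadratically with conversation length, while the preallocated total grows linearly. This quadratic figure is cumulative traffic, not resident memory. It becomes resident growth only if freed buffers are not reused or released promptly, which is one plausible reading of the reported allocator crash.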

// ANALYSIS

The reported failure looks more like an MLX runtime issue than a model-quality problem, and repeated concatenation in the KV cache is the likely driver of the memory growth. The tradeoff is stark: the fast concat path appears to hit the memory cliff, while the safer preallocated cache is too slow for the reported real-time target. The post also suggests the Apple Silicon audio stack is still immature, since full-duplex output quality is described as poor even before the crash. If MLX's execution model is the root cause, periodic cache flushes probably will not fix it; the cache layout or kernel path likely needs to change.
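One layout change that sits between the two extremes is to grow the cache in fixed-size blocks. The cache allocates room for `step` more tokens only when the current buffer is full and otherwise writes new keys and values into a slice in place. The sketch below uses NumPy as a stand-in for MLX arrays. The class name `ChunkedKVCache`, the `step` size, and the (heads, tokens, head_dim) layout are hypothetical. Whether this pattern actually avoids the reported MLX crash is untested here.

```python
import numpy as np


class ChunkedKVCache:
    """Hypothetical KV cache that grows in blocks of `step` tokens.

    Shapes follow a (heads, tokens, head_dim) layout; NumPy stands in for MLX here.
    """

    def __init__(self, heads: int, head_dim: int, step: int = 256, dtype=np.float16):
        self.heads, self.head_dim, self.step, self.dtype = heads, head_dim, step, dtype
        self.keys = np.empty((heads, 0, head_dim), dtype=dtype)
        self.values = np.empty((heads, 0, head_dim), dtype=dtype)
        self.offset = 0       # number of valid tokens currently stored
        self.allocations = 0  # how many times the buffers have been regrown

    def _grow(self, needed: int) -> None:
        # Round capacity up to a multiple of `step` so reallocations stay rare.
        capacity = -(-needed // self.step) * self.step
        extra = capacity - self.keys.shape[1]
        pad = np.zeros((self.heads, extra, self.head_dim), dtype=self.dtype)
        self.keys = np.concatenate([self.keys, pad], axis=1)
        self.values = np.concatenate([self.values, pad], axis=1)
        self.allocations += 1

    def update(self, k: np.ndarray, v: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        """Append k, v of shape (heads, n, head_dim); return views of all valid tokens."""
        n = k.shape[1]
        if self.offset + n > self.keys.shape[1]:
            self._grow(self.offset + n)
        # Common path: in-place slice writes, no reallocation.
        self.keys[:, self.offset:self.offset + n] = k
        self.values[:, self.offset:self.offset + n] = v
        self.offset += n
        return self.keys[:, :self.offset], self.values[:, :self.offset]


if __name__ == "__main__":
    cache = ChunkedKVCache(heads=4, head_dim=8, step=256)
    token = np.ones((4, 1, 8), dtype=np.float16)
    for _ in range(1000):
        keys, values = cache.update(token, token)
    print(cache.offset, cache.keys.shape[1], cache.allocations)  # 1000 1024 4
```

With `step=256`, 1,000 single-token updates trigger 4 regrowths instead of 1,000. The cost is up to `step - 1` tokens of unused headroom. Each regrowth still copies the whole cache, so the sketch only changes how often that happens.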

// TAGS

personaplex, nvidia, moshi, mlx, apple-silicon, speech-to-speech, full-duplex, unified-memory, real-time-voice

DISCOVERED

2026-04-07

PUBLISHED

2026-04-07

RELEVANCE

9/10

AUTHOR

Excellent_Koala769
