LM Studio tops oMLX in M1 Ultra test
OPEN_SOURCE
REDDIT // 5h ago // BENCHMARK RESULT


A Reddit user benchmarked several local LLM quant setups on an M1 Ultra with 128GB unified memory and found LM Studio running GGUF models faster than oMLX. The result is plausible, but it is not a clean verdict on “best Mac runtime” because GGUF vs MLX, cache behavior, prompt length, and concurrency settings can all swing the outcome.

// ANALYSIS

The hot take: this is less a shocking upset than a reminder that Apple Silicon benchmarks are brutally workload-specific. oMLX is optimized around agent-style cache reuse and batching, while LM Studio can shine on straightforward single-stream GGUF runs. LM Studio supports both llama.cpp and MLX backends, and its MLX support was added specifically for Apple Silicon, but GGUF on llama.cpp can still be very competitive for raw single-user generation speed. oMLX positions itself around paged SSD KV caching, continuous batching, and long-session agent workloads, so its headline advantage is often lower repeated-prefill pain and better concurrent throughput, not always the best one-shot tok/s.

A 2025 Apple Silicon runtime study on an M2 Ultra found MLX had the highest sustained throughput under its test setup, while llama.cpp stayed highly efficient for lightweight single-stream use; both can be right depending on benchmark design. If the Reddit test mixed model formats or different quant builds, it is not apples-to-apples: MLX and GGUF conversions of the same model can differ in tokenizer handling, kernel maturity, memory pressure, and effective context settings.

For developers, the practical takeaway is simple: benchmark TTFT, prompt ingest, steady-state tok/s, and repeat-turn performance separately. The fastest stack for chat UI use is not automatically the fastest stack for coding agents or long-context sessions.
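A minimal sketch of how those metrics could be separated when streaming tokens from a local server. The measurement helper below is generic; the endpoint URL and the idea of timestamping each streamed chunk with `time.monotonic()` are assumptions about a typical OpenAI-compatible local setup, not details from the Reddit test.

```python
def summarize_stream(token_times, prompt_sent_at):
    """Split one streamed generation into TTFT and steady-state tok/s.

    token_times: monotonic timestamps, one per received token, in order.
    prompt_sent_at: monotonic timestamp taken just before the request.
    """
    # Time to first token covers prompt ingest plus scheduling overhead.
    ttft = token_times[0] - prompt_sent_at

    # Steady-state rate counts only tokens after the first, over the decode
    # window, so prompt ingest does not inflate the generation number.
    decode_tokens = len(token_times) - 1
    decode_time = token_times[-1] - token_times[0]
    tok_per_s = decode_tokens / decode_time if decode_time > 0 else float("nan")
    return ttft, tok_per_s

# Hypothetical usage: stream a chat completion from a local server (e.g.
# LM Studio's default http://localhost:1234/v1), append time.monotonic()
# on each chunk, then run the same prompt twice to compare first-turn vs
# repeat-turn TTFT, which is where cache reuse shows up.
```

Repeating the identical prompt and comparing the two TTFT values is the cheap way to see whether a runtime's prefill caching is actually helping the workload being tested.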

// TAGS
lm-studio · omlx · llm · inference · benchmark · self-hosted · gpu

DISCOVERED

5h ago

2026-04-23

PUBLISHED

7h ago

2026-04-23

RELEVANCE

7 / 10

AUTHOR

TheItalianDonkey