oMLX trails LM Studio on tokens/s
OPEN_SOURCE
REDDIT // 4h ago · BENCHMARK RESULT

A Reddit user reports the same mlx-community Qwen3.6-35B 4-bit model running at about 49 tok/s in LM Studio versus 38 tok/s in oMLX on an M3 Pro. The post asks whether the gap comes from runtime optimization, cache behavior, or hidden config differences.
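Worth pinning down how large the reported gap actually is; a quick arithmetic check (variable names are mine, not from the post):

```python
# Reported throughput from the Reddit post (tok/s).
lm_studio_tps = 49.0
omlx_tps = 38.0

# The size of the gap depends on which runtime you treat as the baseline.
omlx_deficit = (lm_studio_tps - omlx_tps) / lm_studio_tps   # oMLX is ~22% slower
lm_studio_lead = (lm_studio_tps - omlx_tps) / omlx_tps      # LM Studio is ~29% faster

print(f"oMLX deficit:   {omlx_deficit:.1%}")    # 22.4%
print(f"LM Studio lead: {lm_studio_lead:.1%}")  # 28.9%
```

Either way, the difference is well outside run-to-run noise for a 4-bit model on Apple Silicon, so a configuration or runtime explanation is plausible.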

// ANALYSIS

The likely answer is that this is a runtime-and-defaults problem, not a model problem. LM Studio’s MLX stack is mature and aggressively tuned for steady-state generation, while oMLX prioritizes serving, persistence, and agent workflows over pure single-request tok/s.

  • oMLX’s own positioning emphasizes continuous batching, tiered SSD KV caching, and multi-model serving, which are useful for long sessions but can add overhead in a simple throughput test
  • LM Studio has a newer MLX engine path and a lot of polish around Apple Silicon performance, so small differences in backend settings can easily show up as double-digit token/sec gaps
  • “Same model” does not guarantee a fair comparison if context length, cache state, batch size, prompt format, or sampling defaults differ
  • For max Mac speed, the real benchmark is a clean apples-to-apples run with identical runtime, context, and decoding settings, not just the model file name
  • If the workload is coding agents or long-lived local serving, oMLX’s cache persistence may still be the better tradeoff even if raw decode speed is lower
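A fair apples-to-apples run also has to separate prompt processing (prefill) from steady-state decoding, since a runtime that folds prefill time into its tok/s figure will look slower than one reporting decode speed alone. A minimal sketch of that bookkeeping, with purely illustrative timings (none of these names come from either tool's API):

```python
from dataclasses import dataclass

@dataclass
class RunStats:
    prompt_tokens: int
    generated_tokens: int
    t_start: float        # request sent
    t_first_token: float  # first generated token arrives
    t_done: float         # last token arrives

def prefill_tps(s: RunStats) -> float:
    """Prompt-processing speed, approximated by time to first token."""
    return s.prompt_tokens / (s.t_first_token - s.t_start)

def decode_tps(s: RunStats) -> float:
    """Steady-state generation speed: tokens after the first one,
    divided by the time spent purely decoding."""
    return (s.generated_tokens - 1) / (s.t_done - s.t_first_token)

# Hypothetical run: note how a naive end-to-end number undersells
# the runtime's actual decode speed.
run = RunStats(prompt_tokens=512, generated_tokens=256,
               t_start=0.0, t_first_token=1.6, t_done=8.0)
print(f"prefill: {prefill_tps(run):.1f} tok/s")   # 320.0
print(f"decode:  {decode_tps(run):.1f} tok/s")    # 39.8
naive = run.generated_tokens / (run.t_done - run.t_start)
print(f"naive end-to-end: {naive:.1f} tok/s")     # 32.0
```

If LM Studio and oMLX report different points on this spectrum, or warm their KV caches differently before measuring, the headline numbers are not directly comparable.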
// TAGS
llm · inference · benchmark · open-source · omlx · lm-studio

DISCOVERED

4h ago

2026-04-19

PUBLISHED

6h ago

2026-04-19

RELEVANCE

7 / 10

AUTHOR

mouseofcatofschrodi