oMLX trails LM Studio on tokens/s
OPEN_SOURCE
REDDIT // 4h ago · BENCHMARK RESULT

A Reddit user reports the same mlx-community Qwen3.6-35B 4-bit model running at about 49 tok/s in LM Studio versus 38 tok/s in oMLX on an M3 Pro. The post asks whether the gap comes from runtime optimization, cache behavior, or hidden config differences.
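Worth pinning down how large the reported gap actually is; a quick arithmetic check (variable names are mine, not from the post):

```python
# Reported throughput from the Reddit post (tok/s).
lm_studio_tps = 49.0
omlx_tps = 38.0

# The size of the gap depends on which runtime you treat as the baseline.
omlx_deficit = (lm_studio_tps - omlx_tps) / lm_studio_tps   # oMLX is ~22% slower
lm_studio_lead = (lm_studio_tps - omlx_tps) / omlx_tps      # LM Studio is ~29% faster

print(f"oMLX deficit:   {omlx_deficit:.1%}")    # 22.4%
print(f"LM Studio lead: {lm_studio_lead:.1%}")  # 28.9%
```

Either way, the difference is well outside run-to-run noise for a 4-bit model on Apple Silicon, so a configuration or runtime explanation is plausible.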

// ANALYSIS

The likely answer is that this is a runtime-and-defaults problem, not a model problem. LM Studio’s MLX stack is mature and aggressively tuned for steady-state generation, while oMLX prioritizes serving, persistence, and agent workflows over pure single-request tok/s.

  • oMLX’s own positioning emphasizes continuous batching, tiered SSD KV caching, and multi-model serving, which are useful for long sessions but can add overhead in a simple throughput test
  • LM Studio has a newer MLX engine path and a lot of polish around Apple Silicon performance, so small differences in backend settings can easily show up as double-digit token/sec gaps
  • “Same model” does not guarantee a fair comparison if context length, cache state, batch size, prompt format, or sampling defaults differ
  • For max Mac speed, the real benchmark is a clean apples-to-apples run with identical runtime, context, and decoding settings, not just the model file name
  • If the workload is coding agents or long-lived local serving, oMLX’s cache persistence may still be the better tradeoff even if raw decode speed is lower
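A fair apples-to-apples run also has to separate prompt processing (prefill) from steady-state decoding, since a runtime that folds prefill time into its tok/s figure will look slower than one reporting decode speed alone. A minimal sketch of that bookkeeping, with purely illustrative timings (none of these names come from either tool's API):

```python
from dataclasses import dataclass

@dataclass
class RunStats:
    prompt_tokens: int
    generated_tokens: int
    t_start: float        # request sent
    t_first_token: float  # first generated token arrives
    t_done: float         # last token arrives

def prefill_tps(s: RunStats) -> float:
    """Prompt-processing speed, approximated by time to first token."""
    return s.prompt_tokens / (s.t_first_token - s.t_start)

def decode_tps(s: RunStats) -> float:
    """Steady-state generation speed: tokens after the first one,
    divided by the time spent purely decoding."""
    return (s.generated_tokens - 1) / (s.t_done - s.t_first_token)

# Hypothetical run: note how a naive end-to-end number undersells
# the runtime's actual decode speed.
run = RunStats(prompt_tokens=512, generated_tokens=256,
               t_start=0.0, t_first_token=1.6, t_done=8.0)
print(f"prefill: {prefill_tps(run):.1f} tok/s")   # 320.0
print(f"decode:  {decode_tps(run):.1f} tok/s")    # 39.8
naive = run.generated_tokens / (run.t_done - run.t_start)
print(f"naive end-to-end: {naive:.1f} tok/s")     # 32.0
```

If LM Studio and oMLX report different points on this spectrum, or warm their KV caches differently before measuring, the headline numbers are not directly comparable.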
// TAGS
llm · inference · benchmark · open-source · omlx · lm-studio

DISCOVERED

4h ago

2026-04-19

PUBLISHED

6h ago

2026-04-19

RELEVANCE

7 / 10

AUTHOR

mouseofcatofschrodi