Gemma 4 MLX setup hits 19 tok/s
OPEN_SOURCE ↗
REDDIT // 8d ago // BENCHMARK RESULT


A Reddit post describes a minimal local Gemma 4 chat UI built with MLX and Flask for Apple Silicon Macs. The author reports about 19 tokens per second on an M4 MacBook with 16GB RAM and asks whether a 4-bit version can hold up in longer contexts.
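
The ~19 tok/s figure is straightforward to reproduce with a timer around the token stream. A minimal sketch, using a hypothetical stub generator in place of the real model output — in an actual MLX setup the stream would come from the model's streaming generation call instead:

```python
import time

def measure_tok_per_s(token_stream):
    """Consume a token generator, returning (text, tokens per second)."""
    start = time.perf_counter()
    tokens = list(token_stream)
    elapsed = time.perf_counter() - start
    rate = len(tokens) / elapsed if elapsed > 0 else float("inf")
    return "".join(tokens), rate

def fake_stream(n=50, delay=0.001):
    """Stub standing in for a real model's token stream."""
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i} "

text, rate = measure_tok_per_s(fake_stream())
print(f"{rate:.1f} tok/s")
```

Timing the whole stream, rather than per-token latency, matches how throughput numbers like this are usually reported.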

// ANALYSIS

Strong niche utility for people who want a no-frills local LLM setup on Apple Silicon, but it reads more like a practical benchmark note than a polished product launch.

  • The main signal is performance: ~19 tok/s on an M4 MacBook with 16GB is credible and useful for local-model shoppers.
  • The setup choice matters: Flask + plain HTML lowers complexity and makes the workflow easier to reproduce than a full desktop app stack.
  • Passing full conversation history each turn is good for narrative work, but it will pressure memory and context efficiency as chats grow.
  • The 4-bit question is the real open item; long-context behavior is where these lightweight local setups usually start to trade quality for speed.
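
One way to keep the send-full-history workflow while capping its growth is a token-budget trim that drops the oldest turns first. A minimal sketch, using a whitespace split as a stand-in for the real tokenizer — an actual setup would count tokens with the model's own tokenizer:

```python
def trim_history(messages, max_tokens, count=lambda s: len(s.split())):
    """Drop the oldest non-system turns until the total fits the budget.

    `messages` is a list of {"role": ..., "content": ...} dicts, oldest
    first. The whitespace `count` is a placeholder for a real token
    counter. The system prompt, if any, is always kept.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(ms):
        return sum(count(m["content"]) for m in ms)

    while rest and total(system + rest) > max_tokens:
        rest.pop(0)  # oldest turn goes first
    return system + rest
```

A sliding window like this trades long-range recall for bounded memory, which is exactly the tension the post's 4-bit long-context question is probing.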
// TAGS
gemma · mlx · local-llm · apple-silicon · macbook-m4 · flask · llm-inference · quantization

DISCOVERED

2026-04-04 (8d ago)

PUBLISHED

2026-04-04 (8d ago)

RELEVANCE

7/10

AUTHOR

Polstick1971