OPEN_SOURCE
REDDIT // 8d ago · BENCHMARK RESULT
Gemma 4 MLX setup hits 19 tok/s
A Reddit post describes a minimal local Gemma 4 chat UI built with MLX and Flask for Apple Silicon Macs. The author reports about 19 tokens per second on an M4 MacBook with 16GB RAM and asks whether a 4-bit version can hold up in longer contexts.
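The post's per-turn flow can be sketched roughly as follows. This is an illustrative reconstruction, not the author's code: the Gemma-style `<start_of_turn>` markers follow Gemma's published chat template, while `build_prompt`, `chat_turn`, and the stubbed `generate` (a stand-in for a real `mlx_lm.generate` call behind a Flask route) are assumed names.

```python
# Sketch of the flow described in the post: the entire conversation
# history is flattened into one prompt and re-sent on every turn.

def build_prompt(history):
    """Flatten the whole conversation into one Gemma-style prompt string."""
    parts = []
    for role, text in history:
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # cue the model's next reply
    return "".join(parts)

def generate(prompt):
    """Placeholder for the real mlx_lm.generate(model, tokenizer, prompt=...)."""
    return "stub reply"

def chat_turn(history, user_msg):
    """One request/response cycle, as a Flask handler would run it."""
    history.append(("user", user_msg))
    reply = generate(build_prompt(history))
    history.append(("model", reply))
    return reply

history = []
chat_turn(history, "Hello")
chat_turn(history, "How fast is this on an M4?")
# Note that the prompt re-encodes every prior turn, so its length
# grows with each exchange.
```

Because the prompt grows with every exchange, prefill cost and memory use rise over the life of a chat, which is the tension the analysis below points at.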
// ANALYSIS
Strong niche utility for people who want a no-frills local LLM setup on Apple Silicon, but it reads more like a practical benchmark note than a polished product launch.
- The main signal is performance: ~19 tok/s on an M4 MacBook with 16GB of RAM is credible and useful for local-model shoppers.
- The setup choice matters: Flask plus plain HTML lowers complexity and makes the workflow easier to reproduce than a full desktop app stack.
- Passing the full conversation history on each turn is good for narrative work, but it will pressure memory and context efficiency as chats grow.
- The 4-bit question is the real open item; long-context behavior is where these lightweight local setups usually start trading quality for speed.
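The history-growth pressure noted above has a standard mitigation not mentioned in the post: trim old turns to a token budget before each request. A minimal sketch, assuming a rough chars/4 token estimate (the helper names and heuristic are illustrative, not from the post; a real setup would count tokens with the model's tokenizer):

```python
def estimate_tokens(text):
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def trim_history(history, budget):
    """Keep only the newest turns whose combined estimated tokens fit budget.

    Walks the history newest-first, accumulating cost, and stops at the
    first turn that would overflow; returns the kept turns in order.
    """
    kept, used = [], 0
    for role, text in reversed(history):
        cost = estimate_tokens(text)
        if used + cost > budget:
            break
        kept.append((role, text))
        used += cost
    return list(reversed(kept))
```

Trimming keeps per-turn latency roughly flat at the cost of the model forgetting the oldest turns, which is exactly the trade-off a 4-bit quant under long contexts would make sharper.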
// TAGS
gemma · mlx · local-llm · apple-silicon · macbook-m4 · flask · llm-inference · quantization
DISCOVERED
2026-04-04 (8d ago)
PUBLISHED
2026-04-04 (8d ago)
RELEVANCE
7/10
AUTHOR
Polstick1971