Google's Gemma 4 12B model faces criticism for slow local performance and poor multimodal generation on Apple Silicon.
A developer tested Google's Gemma 4 12B model on a MacBook Pro M5 Max with 128GB of unified memory and found its performance disappointing for a model of its size. The model generated only 44 tokens per second, and the developer also rated its output quality as subpar, singling out a lava lamp image it produced. The developer recommended sticking with Qwen 3.6 27B or 35B instead.
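The report does not say whether the 44 tokens per second figure is the generation (decode) rate or an average that also includes prompt processing, and the two can differ noticeably. The sketch below shows how they are commonly separated, assuming a streaming interface that yields one token at a time. `measure_stream` is a hypothetical helper written for illustration, not part of any particular runtime's API.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Return (time_to_first_token_s, decode_tokens_per_s) for a token stream.

    Hypothetical helper. The decode rate counts the tokens after the first
    one and divides by the time elapsed since the first token arrived, so
    prompt processing (prefill) is excluded from the rate.
    """
    start = clock()
    first = None
    last = None
    count = 0
    for _ in tokens:
        now = clock()  # timestamp taken as each token arrives
        if first is None:
            first = now
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    ttft = first - start
    if count < 2 or last == first:
        return ttft, 0.0  # not enough tokens to compute a decode rate
    return ttft, (count - 1) / (last - first)
```

Dividing the total token count by total wall time instead would fold the time to first token into the rate and understate decode speed, especially for long prompts.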
The results suggest that Google's new encoder-free multimodal architecture may carry performance penalties when run locally on consumer hardware.
* At 44 tokens per second on an M5 Max with 128GB of unified memory, the 12B-parameter model fell short of what the developer expected from local inference.
* The poor quality of the lava lamp image suggests that the model's multimodal generation is not yet competitive with the alternatives.
* The recommendation to use Qwen 3.6 27B or 35B instead highlights Qwen's strong position among open-weights models, particularly for local deployment.
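A rough way to judge whether 44 tokens per second is slow for a 12B model: in single-stream generation, each new token requires reading roughly all of a dense model's weights from memory, so memory bandwidth sets a ceiling on decode speed. The sketch below assumes a dense model; the source does not state Gemma 4 12B's architecture or the quantization used in the test, and the 400 GB/s bandwidth is a placeholder value, not a published M5 Max figure.

```python
def decode_ceiling_tokens_per_s(
    params_billion: float,
    bits_per_weight: float,
    bandwidth_gb_s: float,
) -> float:
    """Bandwidth-bound upper estimate of single-stream decode speed.

    Assumes a dense model whose weights are all read once per generated
    token. Ignores KV-cache reads, activations, and compute time, so
    measured throughput will land below this value.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / weight_bytes


if __name__ == "__main__":
    # Placeholder bandwidth (assumption); substitute your machine's figure.
    bandwidth = 400.0
    for bits in (16, 8, 4):
        ceiling = decode_ceiling_tokens_per_s(12, bits, bandwidth)
        print(f"{bits}-bit weights: ~{ceiling:.0f} tok/s ceiling")
```

Because the estimate scales inversely with bytes per weight, the quantization level matters as much as the parameter count when comparing reported speeds. For mixture-of-experts models, only the active parameters are read per token, which is why such models can decode faster than their total size alone would suggest.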
Discovered: 2026-06-05
Published: 2026-06-05
Author: bridgemindai
