OPEN_SOURCE ↗
REDDIT // 32d ago // INFRASTRUCTURE
llama.cpp users flag Metal slowdown on Intel Macs
A Reddit post in r/LocalLLaMA highlights extremely poor `llama.cpp` inference performance on an Intel Mac Pro with an RX 580 under Metal, with the user reporting under 1 token per second and less than 2% GPU utilization. The complaint stands out because the same hardware reportedly reaches 20+ tokens per second under Vulkan on Linux and Windows, turning the thread into a sharp reminder that Apple’s older Intel-era GPU path remains a weak spot for local LLM inference.
// ANALYSIS
This is less a product update than a reality check on backend fragmentation: `llama.cpp` is broad and fast, but not every hardware/software combo gets first-class treatment.
- The `llama.cpp` project explicitly positions Apple Silicon as a first-class target, which helps explain why Intel Mac + AMD GPU setups can feel like second-class citizens.
- The gap between Metal on macOS and Vulkan on Linux/Windows suggests the bottleneck is backend maturity and driver behavior, not raw GPU capability.
- MoltenVK shows why developers keep chasing a cross-platform graphics stack, but shader failures also show how messy that portability story still is on Macs.
- Even with aggressive offload flags like `-ngl 999`, backend limitations can dominate performance, so tuning alone may not rescue older Intel Mac configurations.
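As a rough illustration of the kind of tuning the thread describes, switching `llama.cpp` from Metal to a Vulkan build and benchmarking it might look like the sketch below. The model path is a placeholder; the flags (`-DGGML_VULKAN`, `-ngl`) are standard `llama.cpp` options.

```shell
# Build llama.cpp with the Vulkan backend instead of Metal
# (needs the Vulkan SDK; on macOS, Vulkan is layered over Metal via MoltenVK).
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# Benchmark with all layers offloaded to the GPU.
# -ngl 999 requests offload of every layer; the model path is a placeholder.
./build/bin/llama-bench -m models/model.gguf -ngl 999

# The same offload flag applies to interactive runs:
./build/bin/llama-cli -m models/model.gguf -ngl 999 -p "Hello"
```

Comparing the `llama-bench` tokens-per-second figures between a Metal build and a Vulkan build on the same machine is the quickest way to confirm whether the backend, rather than the GPU itself, is the bottleneck.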
// TAGS
llama-cpp · llm · inference · gpu · open-source
DISCOVERED
32d ago
2026-03-10
PUBLISHED
36d ago
2026-03-07
RELEVANCE
6/10
AUTHOR
FreQRiDeR