Developer builds fast local Pi agent on macOS

// 45d agoTUTORIAL

Developer builds fast local Pi agent on macOS

This tutorial details building a fast, offline coding agent stack on Apple Silicon using llama.cpp, Gemma 4, and MTP speculative decoding. Connecting these components to the Pi open-source terminal agent achieves up to 72 tokens per second with multimodal support.

// ANALYSIS

Speculative decoding (MTP) and macOS-specific optimizations in llama.cpp prove to be highly effective for local LLM performance, even beating MLX on this machine.

–llama.cpp with Metal acceleration unexpectedly outperformed MLX-LM for this specific Gemma 4 26B setup on an M1 Max.
–MTP draft models offered a significant 24% boost in text generation speed without hurting prompt processing time.
–Adding a multimodal projector successfully enables image input for Pi without incurring any text-generation slowdowns.
–While Gemma 4 is used as the primary example, the author notes that Qwen3.6 35B is a much stronger coding model, albeit with a slight performance penalty (55 tokens/second vs 72 tokens/second).

// TAGS

coding-agentlocal-firstmacosllama.cppgemma-4qwen3.6piai-codingagent

DISCOVERED

45d ago

2026-06-12

PUBLISHED

45d ago

2026-06-12

RELEVANCE

8/ 10

AUTHOR

kkm

// KEEP READING

More AI developer news from the feed

EXPLORE FULL FEED

LAUNCH2h ago

Focusa launches mission control runtime for AI agents

Focusa (@focusa_dev) is an AI agent mission-control layer and Workpoint workflow runtime built by Verious Smith III to solve context loss and session failures in multi-step AI tasks. Unlike basic chat interfaces, Focusa maintains persistent session state, trajectory, evidence, and decisions across long-running agent workflows and model switches, providing AI operators with a durable, dependable environment for real-world automation.

UPDATE2h ago

Augment integrates Moonshot AI's Kimi K3 into Cosmos

Augment announced the integration of Moonshot AI's Kimi K3 open-source model into Cosmos, its agent orchestration platform. Highlighted by Augment as the most capable open-source model they have tested to date, Kimi K3 is now available within Cosmos to power developer agent workflows and multi-agent coordination.

UPDATE2h ago

Open Science v0.7.3 enhances long-running research workflows

AIPOCH has announced the release of Open Science version 0.7.3, an update focused on enabling complex and long-running AI research workflows. As AI agents move beyond short experiments toward extended research tasks, this release equips the workbench to handle larger scientific files, manage longer context demands, and provide a smoother workspace environment.