OPEN_SOURCE
REDDIT · 24d ago · OPEN SOURCE RELEASE
afm 0.9.7 adds Telegram, batch decoding
afm 0.9.7 upgrades the Swift-based macOS local inference stack with concurrent batch decoding, Telegram chat access, grammar-constrained tool calls, and radix-tree prefix caching. It runs Apple Foundation Models or MLX models through an OpenAI-compatible API with no Python runtime.
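Since afm exposes an OpenAI-compatible API, a client can talk to it the same way it would talk to any hosted endpoint. The sketch below builds a standard chat-completion payload; the base URL, port, and model name are assumptions for illustration, not documented afm values.

```python
import json

# Hypothetical local endpoint; afm's actual host/port may differ.
BASE_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "apple-foundation") -> dict:
    """Builds an OpenAI-style chat-completion payload for a local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

payload = build_chat_request("Summarize this release in one sentence.")
# Sending it is a plain HTTP POST of json.dumps(payload) to BASE_URL,
# e.g. with urllib.request or any OpenAI client pointed at the local base URL.
print(json.dumps(payload, indent=2))
```

Because the wire format matches OpenAI's, existing SDKs should work by overriding the base URL rather than requiring an afm-specific client.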
// ANALYSIS
This looks less like a wrapper refresh and more like a serious local inference runtime for Apple Silicon Macs. The most interesting part is that this release focuses on throughput and reliability, not just model support.
- Concurrent batch decoding and shared prefix caching should improve real-world throughput, especially for repeated prompts and multi-request workloads
- XGrammar constraints for tool calls are a practical fix for brittle XML/JSON formatting issues on smaller or less compliant models
- Telegram bridging makes the local model reachable from anywhere without exposing the machine directly to the public internet
- The Swift-only stack plus Homebrew/PyPI install path keeps the barrier low for Mac developers who want local, OpenAI-compatible inference
- This is still a niche, Mac-only layer, but it is a compelling one for Apple Silicon users who want private local AI with stronger API ergonomics
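The prefix-caching idea from the first bullet can be sketched with a small trie keyed by tokens: requests that share a prefix (say, a common system prompt) reuse the decoder state cached for the longest matching prefix instead of recomputing it. This is a conceptual toy, not afm's implementation; a true radix tree also compresses single-child chains, which a plain trie omits for simplicity.

```python
class PrefixCacheNode:
    def __init__(self):
        self.children = {}  # token -> child node
        self.state = None   # cached decoder state for the prefix ending here

class PrefixCache:
    """Toy prefix cache: maps token sequences to cached decode state."""

    def __init__(self):
        self.root = PrefixCacheNode()

    def insert(self, tokens, state):
        """Cache the state reached after decoding `tokens`."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, PrefixCacheNode())
        node.state = state

    def longest_prefix(self, tokens):
        """Return (length, state) of the longest cached prefix of `tokens`."""
        node, best_len, best_state = self.root, 0, None
        for i, t in enumerate(tokens):
            node = node.children.get(t)
            if node is None:
                break
            if node.state is not None:
                best_len, best_state = i + 1, node.state
        return best_len, best_state

cache = PrefixCache()
cache.insert([1, 2, 3], "kv-state-for-shared-system-prompt")
# A new request starting with the same three tokens skips recomputing them.
print(cache.longest_prefix([1, 2, 3, 4]))  # -> (3, 'kv-state-for-shared-system-prompt')
```

In a real runtime the cached state would be the attention KV cache, and eviction policy matters as much as lookup; the point here is only why repeated-prompt workloads benefit.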
// TAGS
afm · open-source · inference · api · cli · llm
DISCOVERED
24d ago
2026-03-18
PUBLISHED
24d ago
2026-03-18
RELEVANCE
8/10
AUTHOR
scousi