afm 0.9.7 adds Telegram, batch decoding
OPEN SOURCE RELEASE // REDDIT · 24d ago

afm 0.9.7 upgrades the Swift-based macOS local inference stack with concurrent batch decoding, Telegram chat access, grammar-constrained tool calls, and radix-tree prefix caching. It runs Apple Foundation Models or MLX models through an OpenAI-compatible API with no Python runtime.
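Because the server speaks the OpenAI chat-completions wire format, any standard HTTP client can talk to it. A minimal sketch of building such a request, assuming a hypothetical local endpoint (the host, port, and model name below are placeholders, not afm's actual defaults):

```python
import json
from urllib import request

# Hypothetical local endpoint; afm's actual host/port may differ.
BASE_URL = "http://localhost:9999/v1"

def build_chat_request(model, messages, base_url=BASE_URL):
    """Build an OpenAI-compatible chat completions request (constructed, not sent)."""
    url = f"{base_url}/chat/completions"
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request(
    "apple-foundation-model",  # placeholder model id
    [{"role": "user", "content": "Hello"}],
)
print(req.full_url)  # → http://localhost:9999/v1/chat/completions
```

Sending the request with `urllib.request.urlopen(req)` against a running afm instance should return a standard chat-completions JSON response.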

// ANALYSIS

This looks less like a wrapper refresh and more like a serious local inference runtime for Apple Silicon Macs. The most interesting part is that the release is focused on throughput and reliability, not just model support.

  • Concurrent batch decoding and shared prefix caching should improve real-world throughput, especially for repeated prompts and multi-request workloads
  • XGrammar constraints for tool calls are a practical fix for brittle XML/JSON formatting issues on smaller or less compliant models
  • Telegram bridging makes the local model reachable from anywhere without exposing the machine directly to the public internet
  • The Swift-only stack plus Homebrew/PyPI install path keeps the barrier low for Mac developers who want local, OpenAI-compatible inference
  • This is still a niche, Mac-only layer, but it is a compelling one for Apple Silicon users who want private local AI with stronger API ergonomics
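The shared-prefix idea behind radix-tree caching can be illustrated with a toy token trie. This is a sketch of the concept only: the class and method names are invented here, and afm's real cache stores reusable KV state rather than bare token IDs:

```python
class PrefixCache:
    """Toy radix-style prefix cache over token sequences (illustrative only)."""

    def __init__(self):
        self.root = {}

    def insert(self, tokens):
        """Record a decoded token sequence so later requests can reuse it."""
        node = self.root
        for t in tokens:
            node = node.setdefault(t, {})

    def longest_prefix(self, tokens):
        """Return how many leading tokens of a new request are already cached."""
        node, n = self.root, 0
        for t in tokens:
            if t not in node:
                break
            node = node[t]
            n += 1
        return n

cache = PrefixCache()
cache.insert([1, 2, 3, 4])                 # e.g. a shared system prompt
print(cache.longest_prefix([1, 2, 3, 9]))  # → 3 leading tokens reusable
```

The payoff is that requests sharing a system prompt or few-shot prefix skip recomputing that prefix, which is where the batch-throughput gains for repeated prompts come from.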
// TAGS
afm · open-source · inference · api · cli · llm

DISCOVERED

2026-03-18

PUBLISHED

2026-03-18

RELEVANCE

8 / 10

AUTHOR

scousi