OPEN_SOURCE
REDDIT · 24d ago · OPEN SOURCE RELEASE
afm 0.9.7 adds Telegram, batch decoding
afm 0.9.7 upgrades the Swift-based macOS local inference stack with concurrent batch decoding, Telegram chat access, grammar-constrained tool calls, and radix-tree prefix caching. It runs Apple Foundation Models or MLX models through an OpenAI-compatible API with no Python runtime.
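Since afm exposes an OpenAI-compatible API, a client can talk to it the same way it would talk to any hosted endpoint. The sketch below builds a standard chat-completion payload; the base URL, port, and model name are assumptions for illustration, not documented afm values.

```python
import json

# Hypothetical local endpoint; afm's actual host/port may differ.
BASE_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "apple-foundation") -> dict:
    """Builds an OpenAI-style chat-completion payload for a local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

payload = build_chat_request("Summarize this release in one sentence.")
# Sending it is a plain HTTP POST of json.dumps(payload) to BASE_URL,
# e.g. with urllib.request or any OpenAI client pointed at the local base URL.
print(json.dumps(payload, indent=2))
```

Because the wire format matches OpenAI's, existing SDKs should work by overriding the base URL rather than requiring an afm-specific client.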
// ANALYSIS
This looks less like a wrapper refresh and more like a serious local inference runtime for Apple Silicon Macs. The most interesting part is that this release focuses on throughput and reliability, not just model support.
- Concurrent batch decoding and shared prefix caching should improve real-world throughput, especially for repeated prompts and multi-request workloads
- XGrammar constraints for tool calls are a practical fix for brittle XML/JSON formatting issues on smaller or less compliant models
- Telegram bridging makes the local model reachable from anywhere without exposing the machine directly to the public internet
- The Swift-only stack plus Homebrew/PyPI install path keeps the barrier low for Mac developers who want local, OpenAI-compatible inference
- This is still a niche, Mac-only layer, but it is a compelling one for Apple Silicon users who want private local AI with stronger API ergonomics
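The prefix-caching idea from the first bullet can be sketched with a small trie keyed by tokens: requests that share a prefix (say, a common system prompt) reuse the decoder state cached for the longest matching prefix instead of recomputing it. This is a conceptual toy, not afm's implementation; a true radix tree also compresses single-child chains, which a plain trie omits for simplicity.

```python
class PrefixCacheNode:
    def __init__(self):
        self.children = {}  # token -> child node
        self.state = None   # cached decoder state for the prefix ending here

class PrefixCache:
    """Toy prefix cache: maps token sequences to cached decode state."""

    def __init__(self):
        self.root = PrefixCacheNode()

    def insert(self, tokens, state):
        """Cache the state reached after decoding `tokens`."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, PrefixCacheNode())
        node.state = state

    def longest_prefix(self, tokens):
        """Return (length, state) of the longest cached prefix of `tokens`."""
        node, best_len, best_state = self.root, 0, None
        for i, t in enumerate(tokens):
            node = node.children.get(t)
            if node is None:
                break
            if node.state is not None:
                best_len, best_state = i + 1, node.state
        return best_len, best_state

cache = PrefixCache()
cache.insert([1, 2, 3], "kv-state-for-shared-system-prompt")
# A new request starting with the same three tokens skips recomputing them.
print(cache.longest_prefix([1, 2, 3, 4]))  # -> (3, 'kv-state-for-shared-system-prompt')
```

In a real runtime the cached state would be the attention KV cache, and eviction policy matters as much as lookup; the point here is only why repeated-prompt workloads benefit.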
// TAGS
afm · open-source · inference · api · cli · llm
DISCOVERED
24d ago
2026-03-18
PUBLISHED
24d ago
2026-03-18
RELEVANCE
8/10
AUTHOR
scousi