DS4 lands DeepSeek V4 Flash on Apple Silicon

// 45d ago · OPENSOURCE RELEASE


DS4 is a deliberately narrow local inference engine for DeepSeek V4 Flash on Apple silicon, built around Metal and tuned for the model’s quirks rather than generic GGUF compatibility. Its standout idea is treating SSD storage as part of the KV cache, so long conversations can resume quickly without reprocessing the entire context from scratch.
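To make the SSD-as-KV-cache idea concrete, here is a minimal sketch of resuming from a cached prefix on disk. It is not DS4's implementation. `DiskPrefixCache`, `fake_prefill`, and `resume` are hypothetical names, the "state" is a plain Python list standing in for real attention keys and values, and a real engine would use a compact binary layout rather than `pickle`.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class DiskPrefixCache:
    """Toy on-disk store mapping a token prefix to its saved state (hypothetical)."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, tokens: list[int]) -> Path:
        # Key each snapshot by a hash of the exact token prefix it covers.
        digest = hashlib.sha256(repr(tuple(tokens)).encode()).hexdigest()
        return self.root / f"{digest}.kv"

    def save(self, tokens: list[int], state: list[int]) -> None:
        with open(self._path(tokens), "wb") as f:
            pickle.dump(state, f)

    def longest_prefix(self, tokens: list[int]):
        """Return (n, state) for the longest saved prefix, or (0, None)."""
        # Linear scan from longest to shortest; fine for a toy, slow at scale.
        for n in range(len(tokens), 0, -1):
            path = self._path(tokens[:n])
            if path.exists():
                with open(path, "rb") as f:
                    return n, pickle.load(f)
        return 0, None


def fake_prefill(tokens: list[int], start_state=None) -> list[int]:
    """Stand-in for model prefill: appends one state entry per new token."""
    state = list(start_state or [])
    for t in tokens:
        state.append(t * 2)
    return state


def resume(cache: DiskPrefixCache, tokens: list[int]):
    """Reuse the longest cached prefix, then process only the remaining tokens."""
    reused, state = cache.longest_prefix(tokens)
    new_state = fake_prefill(tokens[reused:], state)
    cache.save(tokens, new_state)
    return reused, new_state


if __name__ == "__main__":
    cache = DiskPrefixCache(Path(tempfile.mkdtemp()))
    turn_one = [1, 2, 3, 4]
    print(resume(cache, turn_one)[0])           # tokens reused on a cold start
    print(resume(cache, turn_one + [5, 6])[0])  # tokens reused on the follow-up
```

On the second call only the two new tokens go through `fake_prefill`; the first four come back from disk. That is the behavior the summary describes: a long conversation resumes without reprocessing the whole context.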

// ANALYSIS

Hot take: this is more of a systems bet than a model runner, and that makes it interesting.

– The project is optimized for one model, one hardware family, and one storage hierarchy, which is how it gets away with aggressive context handling.
– SSD-backed KV cache is the real differentiator here; it reframes disk as a first-class extension of memory for long-context local inference.
– The Metal-only choice keeps the implementation focused, but also makes the audience very specific: high-end Mac users who want local DeepSeek V4 Flash performance.
– This is the kind of repo that matters if the model really does behave well under compression and long-context reuse, because the bottleneck shifts from raw model size to memory management.
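The trade-off behind the points above can be sketched as a back-of-envelope comparison: reloading a saved cache costs bytes divided by SSD read bandwidth, while recomputing it costs tokens divided by prefill speed. The function names are hypothetical, and the numbers in the example are placeholders chosen for round arithmetic. They are not measurements of DS4, DeepSeek V4 Flash, or any Mac.

```python
def reload_seconds(cache_bytes: float, ssd_bytes_per_s: float) -> float:
    """Time to read a saved KV cache back from disk."""
    return cache_bytes / ssd_bytes_per_s


def prefill_seconds(context_tokens: int, prefill_tokens_per_s: float) -> float:
    """Time to rebuild the same cache by reprocessing the context."""
    return context_tokens / prefill_tokens_per_s


def cheaper_to_reload(
    context_tokens: int,
    bytes_per_token: float,
    ssd_bytes_per_s: float,
    prefill_tokens_per_s: float,
) -> bool:
    """True when reading the cache from SSD beats recomputing it."""
    cache_bytes = context_tokens * bytes_per_token
    return reload_seconds(cache_bytes, ssd_bytes_per_s) < prefill_seconds(
        context_tokens, prefill_tokens_per_s
    )


if __name__ == "__main__":
    # Placeholder inputs only (assumptions, not benchmarks).
    tokens = 100_000
    per_token = 50_000.0   # assumed cache bytes per token
    ssd = 5e9              # assumed SSD read bandwidth in bytes/s
    prefill = 500.0        # assumed prefill speed in tokens/s
    print(reload_seconds(tokens * per_token, ssd))
    print(prefill_seconds(tokens, prefill))
    print(cheaper_to_reload(tokens, per_token, ssd, prefill))
```

Both costs grow linearly with context length, so the comparison comes down to cache bytes per token against the ratio of SSD bandwidth to prefill speed. That is why cache compression matters as much as raw model size in this design.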

// TAGS

deepseek, apple-silicon, metal, local-inference, kv-cache, ssd, macos, open-source, llm

DISCOVERED

45d ago

2026-05-08

PUBLISHED

45d ago

2026-05-08

RELEVANCE

9/10

AUTHOR

GitHub Awesome
