DS4 lands DeepSeek V4 Flash on Apple Silicon
DS4 is a deliberately narrow local inference engine for DeepSeek V4 Flash on Apple silicon, built around Metal and tuned for the model’s quirks rather than generic GGUF compatibility. Its standout idea is treating SSD storage as part of the KV cache, so long conversations can resume quickly without reprocessing the entire context from scratch.
Hot take: this is more of a systems bet than a model runner, and that makes it interesting.
- The project is optimized for one model, one hardware family, and one storage hierarchy, which is how it gets away with aggressive context handling.
- SSD-backed KV cache is the real differentiator here; it reframes disk as a first-class extension of memory for long-context local inference.
- The Metal-only choice keeps the implementation focused, but also makes the audience very specific: high-end Mac users who want local DeepSeek V4 Flash performance.
- This is the kind of repo that matters if the model really does behave well under compression and long-context reuse, because the bottleneck shifts from raw model size to memory management.
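
To make the SSD-as-KV-cache idea concrete, here is a minimal Python sketch of prefix reuse from disk. It is not DS4's code, API, or on-disk format: `DiskPrefixCache`, `fake_prefill`, and `resume` are hypothetical names, the "KV state" is a plain list standing in for real attention tensors, and the linear scan for the longest cached prefix is chosen for clarity rather than speed.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class DiskPrefixCache:
    """Toy disk-backed store mapping a token prefix to saved model state."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, tokens):
        # Key each entry by a hash of the exact token prefix it covers.
        digest = hashlib.sha256(repr(list(tokens)).encode("utf-8")).hexdigest()
        return self.root / f"{digest}.kv"

    def save(self, tokens, state):
        with open(self._path(tokens), "wb") as f:
            pickle.dump(state, f)

    def longest_prefix(self, tokens):
        """Return (n, state) for the longest cached prefix, or (0, None)."""
        for n in range(len(tokens), 0, -1):
            path = self._path(tokens[:n])
            if path.exists():
                with open(path, "rb") as f:
                    return n, pickle.load(f)
        return 0, None


def fake_prefill(state, new_tokens):
    """Stand-in for model prefill: appends one fake KV entry per new token."""
    kv = list(state) if state else []
    kv.extend(("kv", t) for t in new_tokens)
    return kv


def resume(cache, tokens):
    """Process only the tokens not already covered by a cached prefix."""
    n, state = cache.longest_prefix(tokens)
    new_state = fake_prefill(state, tokens[n:])
    cache.save(tokens, new_state)
    return new_state, len(tokens) - n


# Two conversation turns: the second reuses the first turn's state from disk.
cache = DiskPrefixCache(tempfile.mkdtemp())
turn1 = [1, 2, 3, 4]
state1, processed1 = resume(cache, turn1)  # processes all 4 tokens
turn2 = turn1 + [5, 6]
state2, processed2 = resume(cache, turn2)  # processes only the 2 new tokens
```

The second call only does work for the two appended tokens, which is the resume-without-reprocessing behavior described above. A real engine would store far larger tensors and would need to care about eviction and read bandwidth, none of which this sketch models.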
DISCOVERED
2026-05-08
PUBLISHED
2026-05-08
AUTHOR
Github Awesome