MiniMax M2.7 hits 100k on Strix Halo
This post shares a hard-won local inference setup for pushing MiniMax M2.7 to 100k context on Strix Halo with `llama-server`, along with the exact flags that made it stable: no context shifting, no mmap, a unified KV cache, a VRAM-only cache, and larger batch sizes for prefill. It also covers deployment on headless Fedora, including swap and OOM tuning, and closes with a candid read on the model: strong coding intuition and intent-following, but weaker architecture and code-review judgment than Qwen3.6 27B.
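For concreteness, here is what those flags look like assembled into one command. This is a sketch, not the author’s exact invocation: the model path, the context size (`-c 102400` for ~100k), `-ngl 999` for full GPU offload, and the host/port bind are assumptions for illustration.

```bash
# Sketch of the described llama-server setup; paths and sizes are placeholders.
llama-server \
  -m ./minimax-m2.7.gguf \
  -c 102400 \
  -ngl 999 \
  --no-mmap \
  --no-context-shift \
  --kv-unified \
  --cache-ram 0 \
  -b 1024 -ub 1024 \
  --host 0.0.0.0 --port 8080
# --no-context-shift: reject overflow instead of silently evicting old context.
# --no-mmap:          load weights into memory up front instead of mmap-ing.
# --kv-unified:       one unified KV cache rather than per-slot caches.
# --cache-ram 0:      keep the prompt cache in VRAM only, no spill to RAM.
# -b/-ub 1024:        larger logical/physical batches to speed up prefill.
# Optional and workload-dependent per the author: --cache-reuse 256.
```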
Hot take: the real value here is not the benchmark screenshots; it’s the operating playbook for making a long-context open model behave on constrained local hardware.
- The configuration is the core contribution: `--no-context-shift`, `--kv-unified`, `--cache-ram 0`, and `-b/-ub 1024` are the knobs that matter most for stability and throughput.
- The post is useful because it separates what is necessary from what is optional, including the author’s warning that `--cache-reuse 256` can help or hurt depending on workload.
- The hardware angle is narrow but valuable: Strix Halo plus aggressive tuning makes 100k context feel like a reproducible local setup instead of a lab demo.
- The model comparison is nuanced rather than hype-driven: MiniMax is framed as better at “intent” and coding intuition, while Qwen3.6 27B still wins on broader reasoning and review quality.
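The swap and OOM notes are the other half of the playbook. Below is a minimal sketch of that tuning for a headless Fedora host, assuming a Btrfs root (Fedora’s default), a 32G swapfile, and a systemd unit named `llama-server.service`; all three are illustrative choices, not the author’s values.

```bash
# Hypothetical swap/OOM tuning; sizes and the unit name are placeholders.
# On Btrfs the swapfile must be created without copy-on-write:
sudo btrfs filesystem mkswapfile --size 32g /swapfile
sudo swapon /swapfile
echo '/swapfile none swap defaults 0 0' | sudo tee -a /etc/fstab

# Bias the kernel toward keeping model pages resident.
sudo sysctl vm.swappiness=10

# Make the server a low-priority target for the kernel OOM killer:
sudo systemctl edit llama-server.service
#   [Service]
#   OOMScoreAdjust=-500
```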
DISCOVERED: 2026-05-10 (4h ago)
PUBLISHED: 2026-05-09 (7h ago)
AUTHOR: Zc5Gwu