MiniMax M2.7 quant trims to 74 GB
JANGQ-AI’s MiniMax-M2.7-JANGTQ_K repackages MiniMax M2.7 into a mixed-bit MLX quant for local Apple Silicon use. The card puts the bundle at about 74 GB on disk, with more bits spent on sensitive down-projection weights and heavier compression elsewhere.
This is less a new model than a better packaging strategy for running a big MoE locally. The interesting part is the bit allocation: it tries to preserve quality where residual-stream noise hurts most while squeezing the rest hard enough to make the model more practical on high-RAM Macs.
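To make the bit allocation concrete, here is a minimal sketch of how such a mixed-bit recipe can be expressed with mlx-lm's `convert` and its `quant_predicate` callback. The 4-bit/2-bit split mirrors the card's description, but the source repo path, output path, and group sizes are assumptions, not the release's actual recipe.

```python
# Minimal sketch of a mixed-bit MLX quantization recipe, assuming mlx-lm's
# convert() with a quant_predicate callback. The 4-bit/2-bit split mirrors
# the card's description; paths and group sizes below are hypothetical.
from mlx_lm import convert

def mixed_bit_predicate(path, module, config):
    """Pick per-layer quantization settings by weight name."""
    if "down_proj" in path:
        # Spend more bits where residual-stream noise hurts most.
        return {"bits": 4, "group_size": 64}
    if "gate_proj" in path or "up_proj" in path:
        # Squeeze the expert gate/up projections hard.
        return {"bits": 2, "group_size": 64}
    # Everything else falls through to the default settings.
    return True

convert(
    hf_path="MiniMaxAI/MiniMax-M2.7",   # hypothetical source repo
    mlx_path="minimax-m2.7-mixed-bit",  # local output directory
    quantize=True,
    q_bits=4,            # default for layers the predicate passes through
    q_group_size=64,
    quant_predicate=mixed_bit_predicate,
)
```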
- The release targets a narrow but real audience: people who want frontier-ish MiniMax M2.7 behavior without hauling around the full FP8 source.
- The mixed-bit scheme is the point, not just the file size: it pairs 4-bit precision for `down_proj` with 2-bit compression for `gate_proj`/`up_proj`.
- The pre-stacked layout removes runtime restacking, which should make cold loads simpler and more predictable in MLX workflows (see the load sketch after this list).
- The non-commercial license and 74 GB footprint mean this is still a power-user artifact, not a broadly deployable everyday model.
- Compared with the slimmer 2-bit JANGTQ variant, this is the quality-first option in the same local-inference family.
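Once a bundle like this is on disk, running it is the standard mlx-lm load-and-generate path. A minimal usage sketch, assuming the Hugging Face repo id matches the card's name:

```python
# Minimal usage sketch, assuming the bundle loads through mlx-lm's standard
# load/generate path and that the repo id below matches the card's name.
from mlx_lm import load, generate

model, tokenizer = load("JANGQ-AI/MiniMax-M2.7-JANGTQ_K")

prompt = "Summarize the trade-offs of mixed-bit MoE quantization."
# Apply the model's chat template if one ships with the tokenizer.
if tokenizer.chat_template is not None:
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        tokenize=False,
    )

text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(text)
```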
Published 2026-05-07 · Discovered 2026-05-08 · Author: cafedude