JANGQ brings usable 2-bit MLX quantization to Apple Silicon
JANGQ (Jang Adaptive N-bit Grading) is a new open-source mixed-precision quantization framework for Apple Silicon that makes ultra-low-bit MLX inference viable by protecting sensitive attention layers at higher precision while aggressively compressing bulk MLP parameters. Where native MLX uniform 2-bit quantization produces near-unusable output, JANGQ achieves 7/10 correctness at comparable bit widths — enabling 122B+ models to run usably on Macs with 128GB unified memory.
JANGQ fills a gap that has quietly frustrated the Apple Silicon local-inference crowd: MLX's uniform quantization at 2-bit is so lossy it's been effectively unusable, leaving Mac users behind GGUF on llama.cpp in the ultra-low-bit regime. This is a direct fix.
- The key insight is layer-sensitivity tiering: attention and output heads get 6-8 bits, MLP/expert layers get 2-3 bits — since MoE expert parameters can be 98% of total weights, protecting the small attention budget costs almost nothing in memory
- Benchmarks on an M4 Max (128GB) show Qwen3.5-122B at 46 GB / 45 tok/s with JANG_1L, versus effectively broken output from MLX uniform 2-bit
- Claims 25% memory savings vs. uniform 4-bit at comparable quality, with 3.37-bit JANGQ outperforming uniform 4-bit on logit MSE
- MLX Studio and vMLX (the companion inference front-end and engine) ship natively with JANGQ support; vMLX claims 224x faster long-context inference than LM Studio via a five-layer KV cache stack
- Pre-quantized models are already available on HuggingFace for the Qwen3.5 family; conversion tooling is installable via pip with a one-line `jang convert` command
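The memory math behind the tiering claim is easy to check with a back-of-envelope sketch. The tier fractions and bit widths below are illustrative assumptions derived from the figures quoted above (2% attention budget, 2-3 bit experts, 122B parameters), not JANGQ's actual allocation policy:

```python
def effective_bits(tiers):
    """Average bits per weight for a list of (fraction_of_params, bits) tiers."""
    assert abs(sum(f for f, _ in tiers) - 1.0) < 1e-9, "fractions must sum to 1"
    return sum(f * b for f, b in tiers)

# Hypothetical MoE split: ~2% attention/output params held at 6-bit,
# ~98% expert MLP params squeezed to 3-bit.
tiers = [(0.02, 6), (0.98, 3)]
avg = effective_bits(tiers)
print(f"average {avg:.2f} bits/weight")  # 3.06 — close to the quoted 3.37-bit mix

# Weight memory for a 122B-parameter model at this average precision:
params = 122e9
gib = params * avg / 8 / 2**30
print(f"~{gib:.0f} GiB of weights")
```

Protecting the attention tier at 6-bit raises the average by only ~0.06 bits over a flat 3-bit scheme, which is why the quality/memory trade described above is so lopsided in tiering's favor.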
DISCOVERED: 2026-03-16
PUBLISHED: 2026-03-16
AUTHOR: HealthyCommunicat