Spiral drops INT3 compression, Metal kernels for Mac
REDDIT · 4h ago · OPEN-SOURCE RELEASE

Spiral is a high-performance model compression framework by ReinforceAI designed for local LLM inference on Apple Silicon. It combines novel INT3 quantization with custom fused Metal kernels and a 2-bit KV cache to run large models like Qwen 7B on consumer-grade Mac hardware with minimal accuracy loss.
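
For readers new to low-bit quantization, the following is a minimal sketch of the naive round-to-nearest 3-bit baseline that frameworks like this aim to beat. It is not Spiral's method: the function names, the single per-tensor scale, and the sample weights are all assumptions made for illustration.

```python
def quantize_int3(weights):
    """Naive symmetric round-to-nearest quantization to signed 3-bit codes."""
    # Signed 3-bit integers span -4..3; scale so the largest magnitude maps to 3.
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 3 if max_abs > 0 else 1.0
    # Round each weight to the nearest code and clamp into the 3-bit range.
    codes = [max(-4, min(3, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int3(codes, scale):
    """Map 3-bit codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Hypothetical sample weights, chosen only to exercise the functions.
weights = [0.9, -0.3, 0.05, -0.9, 0.5]
codes, scale = quantize_int3(weights)
restored = dequantize_int3(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

With only eight levels shared across a whole tensor, the rounding error can reach half a quantization step, which is why naive 3-bit schemes tend to lose accuracy.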

// ANALYSIS

Spiral pushes local inference on Apple Silicon down to 3-bit weights, which the project presents as significantly more memory-efficient than standard 4-bit quantization for long-horizon tasks.
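
As a rough back-of-the-envelope check on the memory argument, the sketch below computes the storage needed for the weights alone at different bit widths. It assumes a nominal 7-billion-parameter model and ignores scales, codebooks, and other per-group metadata, so real footprints will be somewhat larger.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Storage for the weights alone, in GiB, ignoring scales and metadata."""
    return num_params * bits_per_weight / 8 / 2**30


NUM_PARAMS = 7_000_000_000  # assumption: a nominal 7B-parameter model

for bits in (16, 4, 3):
    print(f"{bits:>2}-bit: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

Under these assumptions, 3-bit weights take three quarters of the space of 4-bit weights (roughly 2.4 GiB versus 3.3 GiB), leaving more of an 8GB machine's memory for the KV cache.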

  • Geometric compression approach reportedly achieves state-of-the-art INT3 quality at a cost of only +0.14 nats in log-perplexity, a claimed 101x improvement over naive 3-bit methods.
  • Custom fused Metal kernels integrate PQ decode, RoPE, and attention into a single pass, optimizing prefill and decode speeds on Mac GPUs.
  • The 2-bit KV cache implementation scales context capacity by 1.75x, allowing an 8GB Mac to handle context windows up to 113K tokens.
  • Calibration-free workflow eliminates the need for fine-tuning or gradient updates, simplifying the deployment of compressed models for researchers and developers.
  • Currently in preview with Qwen 7B support; future updates aim to support models up to 100B parameters and add Triton kernels for GPU support.
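
To make the 2-bit KV cache idea above more concrete, here is a generic sketch of how four 2-bit codes fit into a single byte. It does not reflect Spiral's actual cache layout: the function names and the low-bits-first ordering are assumptions, and a real implementation would also store per-group scales and run on the GPU.

```python
def pack_2bit(codes):
    """Pack integers in 0..3 into bytes, four codes per byte (low bits first)."""
    if len(codes) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    packed = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, code in enumerate(codes[i:i + 4]):
            if not 0 <= code <= 3:
                raise ValueError("code out of 2-bit range")
            # Place the j-th code in bits 2j and 2j+1 of the byte.
            byte |= code << (2 * j)
        packed.append(byte)
    return bytes(packed)


def unpack_2bit(packed):
    """Recover the original 2-bit codes from the packed bytes."""
    return [(byte >> (2 * j)) & 0b11 for byte in packed for j in range(4)]


# Hypothetical codes standing in for quantized key/value entries.
codes = [0, 1, 2, 3, 3, 2, 1, 0]
packed = pack_2bit(codes)
roundtrip = unpack_2bit(packed)
```

Eight codes occupy two bytes here, an eightfold reduction relative to 16-bit values before any metadata overhead is counted.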

// TAGS

spiral, llm, quantization, edge-ai, gpu, open-source, macos, apple-silicon

DISCOVERED: 4h ago (2026-04-22)
PUBLISHED: 6h ago (2026-04-22)
RELEVANCE: 8/10
AUTHOR: Financial_Buy_2287