Orion cracks Apple Neural Engine training
REDDIT · 37d ago · OPEN-SOURCE RELEASE


Orion is a new MIT-licensed open-source runtime and compiler stack for running and training small transformers directly on Apple’s Neural Engine (ANE), without going through CoreML or Metal. The project demonstrates stable multi-step training of a 110M-parameter model on-device, plus 170+ tokens/sec inference for GPT-2 124M on M-series hardware.

// ANALYSIS

This is a genuinely interesting systems release because it turns Apple’s heavily abstracted NPU from an inference-only black box into something researchers can actually program against. Orion is less about raw training speed today than proving that low-level, on-device LLM training on ANE is possible at all.

  • Orion bypasses CoreML with direct ANE execution, a custom graph IR, and compiler passes that emit ANE-native MIL
  • The standout claim is stability: the project reports 1,000 training steps on a 110M transformer with loss dropping from 12.29 to 6.19 and zero NaN divergence
  • Delta compilation and LoRA hot-swap make the repo more than a one-off hack; it is shaping into a reusable local AI runtime for Apple Silicon
  • The bottleneck is still severe: ANE weight updates are constrained by compile-and-reload behavior, so this is a platform breakthrough before it is a practical fine-tuning stack
  • For AI infra and edge researchers, Orion is a strong signal that Apple’s Neural Engine has far more untapped developer value than the official ML stack exposes
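The LoRA hot-swap point above is easiest to see in the math: a low-rank adapter adds a delta (alpha/r)·B·A on top of a frozen base weight W, so switching adapters means loading two small matrices rather than recompiling the full model graph. A minimal NumPy sketch of that idea (this is not Orion's actual API; the shapes, scaling, and adapter names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 4, 16.0        # hidden size, LoRA rank, scaling (illustrative values)

W = rng.standard_normal((d, d))  # frozen base weight: compiled to the NPU once

def forward(x, W, A, B):
    # Base matmul plus the low-rank LoRA delta: x W^T + (alpha/r) * x A^T B^T.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

# Two hypothetical adapters, each only ~2*d*r parameters instead of d*d.
adapters = {
    name: (rng.standard_normal((r, d)) * 0.01,   # A: down-projection
           rng.standard_normal((d, r)) * 0.01)   # B: up-projection
    for name in ("chat", "code")
}

x = rng.standard_normal((1, d))
y_chat = forward(x, W, *adapters["chat"])
y_code = forward(x, W, *adapters["code"])  # "hot swap": only A and B changed, W untouched
assert not np.allclose(y_chat, y_code)
```

The asymmetry is the whole trick: the expensive artifact (the compiled base weight) stays fixed, and only the small A/B pairs move, which is why hot-swapping adapters sidesteps ANE's compile-and-reload cost for full weight updates.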
// TAGS
orion · llm · open-source · edge-ai · inference

DISCOVERED

37d ago

2026-03-06

PUBLISHED

38d ago

2026-03-05

RELEVANCE

8/10

AUTHOR

No_Gap_4296