OPEN_SOURCE
REDDIT · 37d ago · OPEN-SOURCE RELEASE
Orion cracks Apple Neural Engine training
Orion is a new MIT-licensed open-source runtime and compiler stack for running and training small transformers directly on Apple’s Neural Engine without CoreML or Metal. The project demonstrates stable multi-step training of a 110M-parameter model on-device, plus 170+ tokens/sec inference for GPT-2 124M on M-series hardware.
// ANALYSIS
This is a genuinely interesting systems release because it turns Apple’s heavily abstracted NPU from an inference-only black box into something researchers can actually program against. Orion is less about raw training speed today than proving that low-level, on-device LLM training on ANE is possible at all.
- Orion bypasses CoreML with direct ANE execution, a custom graph IR, and compiler passes that emit ANE-native MIL
- The standout claim is stability: the project reports 1,000 training steps on a 110M-parameter transformer with loss dropping from 12.29 to 6.19 and zero NaN divergence
- Delta compilation and LoRA hot-swap make the repo more than a one-off hack; it is shaping into a reusable local AI runtime for Apple Silicon
- The bottleneck is still brutal: ANE weight updates are constrained by compile-and-reload behavior, so this is a platform breakthrough before it is a practical fine-tuning stack
- For AI infra and edge researchers, Orion is a strong signal that Apple's Neural Engine has far more untapped developer value than the official ML stack exposes
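The LoRA hot-swap mechanism mentioned above can be sketched generically. This is not Orion's API; `lora_merge` and `lora_swap` are hypothetical helpers illustrating why adapter swapping sidesteps the compile-and-reload bottleneck: only a low-rank delta changes, so the merged weight can be patched in place without rebuilding the base graph.

```python
import numpy as np

def lora_merge(W, A, B, alpha=16, rank=8):
    """Merge a LoRA delta into a frozen base weight matrix.

    W: (out, in) frozen base weights
    A: (rank, in) down-projection, B: (out, rank) up-projection.
    The adapter contributes a low-rank update (alpha/rank) * B @ A.
    """
    return W + (alpha / rank) * (B @ A)

def lora_swap(W_merged, A_old, B_old, A_new, B_new, alpha=16, rank=8):
    """Hot-swap adapters on an already-merged weight.

    Subtract the old adapter's delta and add the new one, so the
    expensive base graph never needs recompiling.
    """
    scale = alpha / rank
    return W_merged - scale * (B_old @ A_old) + scale * (B_new @ A_new)
```

Because both operations touch only the low-rank deltas, swapping adapters is linear in the adapter size rather than the full model size, which is the property that makes per-task fine-tunes cheap to rotate on a device-resident model.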
// TAGS
orion · llm · open-source · edge-ai · inference
DISCOVERED
2026-03-06 (37d ago)
PUBLISHED
2026-03-05 (38d ago)
RELEVANCE
8 / 10
AUTHOR
No_Gap_4296