OPEN_SOURCE
REDDIT // 32d ago // BENCHMARK RESULT
Autoresearch-ANE slashes val_loss on Mac
The autoresearch-ane fork of Karpathy's autoresearch reports its best Apple Neural Engine run yet, cutting validation loss from 6.109 to 3.55 on an M3 MacBook. The big unlock was a dynamic weight pipeline that avoids constant recompilation and reportedly delivers about 11x more training steps in the same 5-minute budget.
// ANALYSIS
This is a meaningful systems result, not just a prettier loss curve: the project gets much more real training done on consumer Apple hardware once compilation stops dominating the wall clock.
- The repo says the dynamic pipeline compiles 10 ANE kernels once at startup, then stages weights dynamically, boosting throughput from roughly 120 to about 1340 steps per 5-minute run
- The ANE backend is its own training stack in Objective-C, trained on TinyStories and measured by `val_loss`, so the numbers are impressive but not directly comparable to Karpathy's CUDA `val_bpb` baseline
- Keeping the agent's edit surface mostly to `ane/experiment_config.h` makes autonomous overnight experimentation much more plausible on a laptop
- If these gains hold up, Apple's Neural Engine starts looking less like an inference novelty and more like a viable playground for small-model research loops
// TAGS
autoresearch-ane · agent · llm · open-source · edge-ai · benchmark
DISCOVERED
2026-03-11 (32d ago)
PUBLISHED
2026-03-11 (32d ago)
RELEVANCE
8/10
AUTHOR
paraboloed