Autoresearch Finds RTX 5090 Sweet Spot
OPEN_SOURCE
REDDIT · 23d ago · BENCHMARK RESULT


This Reddit post documents the hard-won path to making Karpathy’s Autoresearch behave on an RTX 5090/Blackwell setup. The stable recipe was to avoid the broken full-model compile path, keep the fused optimizer gains, use SDPA/CuDNN attention, and settle on a smaller total batch with a longer training budget.
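Concretely, that recipe maps onto a few PyTorch-level choices. A minimal sketch, assuming PyTorch on CUDA; the module and hyperparameter names here are illustrative, not taken from the actual Autoresearch scripts:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBlock(nn.Module):
    """Toy attention block standing in for the real model."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (b, t, self.heads, d // self.heads)
        q, k, v = (u.view(shape).transpose(1, 2) for u in (q, k, v))
        # SDPA dispatches to CuDNN/flash kernels where available, which is
        # the stable attention path the post settled on.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(y.transpose(1, 2).reshape(b, t, d))

# Full-model torch.compile is deliberately skipped: that was the broken path.
model = TinyBlock()

# Fused optimizer kept (fused=True needs CUDA params, hence the guard here).
opt = torch.optim.AdamW(model.parameters(), lr=3e-4,
                        fused=torch.cuda.is_available())
```

The point of the sketch is the asymmetry: the optimizer-side fusion is kept while the model itself stays eager, rather than compiling everything end to end.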

// ANALYSIS

This reads less like a triumphant benchmark and more like a reminder that on bleeding-edge GPUs, “runs” and “runs well” are worlds apart.

  • The biggest early trap was a technically valid path that ran correctly but catastrophically slowly; it made MFU look better or worse depending on which denominator was used, obscuring the real issue.
  • Higher per-device batch settings backfired, while `TOTAL_BATCH_SIZE = 2**17` emerged as the best operating point, beating both larger and smaller alternatives.
  • The win came from stacking several practical fixes: fused optimizer compile where it helped, stable SDPA/CuDNN attention, and a longer `TIME_BUDGET = 1200` once the batch regime stabilized.
  • Automation mattered as much as model tuning; the benchmark/extract/strategize/rerun loop had its own failure modes around lock cleanup, completion hooks, and dispatch order.
  • The result is valuable because it turns a flaky setup into something reproducible enough to support real follow-on experiments, not just one-off hero runs.
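The automation failure modes called out above can be illustrated with a toy loop. Everything in this sketch (the lock path, metric names, and strategy rule) is hypothetical, not the post's actual harness:

```python
import os
import tempfile
from pathlib import Path

# Hypothetical benchmark -> extract -> strategize -> rerun loop.
LOCK = Path(tempfile.gettempdir()) / "autoresearch.lock"

def run_benchmark(config: dict) -> dict:
    if LOCK.exists():
        LOCK.unlink()                      # stale-lock cleanup before dispatch
    LOCK.write_text(str(os.getpid()))
    try:
        # a real harness would subprocess.run() the training job here and
        # extract metrics from its logs; this returns a placeholder result
        return {"mfu": 0.0, "config": config}
    finally:
        LOCK.unlink(missing_ok=True)       # completion hook: always release

def strategize(result: dict) -> dict:
    cfg = dict(result["config"])
    if result["mfu"] < 0.4:                # toy rule: shrink batch on low MFU
        cfg["total_batch_size"] //= 2
    return cfg

cfg = {"total_batch_size": 2**17, "time_budget": 1200}
for _ in range(3):                          # bounded rerun loop
    cfg = strategize(run_benchmark(cfg))
```

The `try/finally` release and the pre-dispatch stale-lock check are exactly the kind of plumbing the post found as failure-prone as the model tuning itself.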
// TAGS
autoresearch · benchmark · gpu · automation · open-source · llm · agent

DISCOVERED

2026-03-20 (23d ago)

PUBLISHED

2026-03-20 (23d ago)

RELEVANCE

8/10

AUTHOR

Delicious_Rule_438