AutoKernel tunes GPU kernels with agents
YT · YOUTUBE // 21d ago · OPEN-SOURCE RELEASE

AutoKernel is an open-source agent loop for GPU kernel optimization: it profiles a PyTorch model, isolates bottlenecks, rewrites Triton or CUDA kernels, and benchmarks each change before keeping or reverting it. The repo ships with starter kernels, self-contained model examples, and an Amdahl's-law scheduler for overnight optimization runs on NVIDIA GPUs.
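The repo's actual harness isn't reproduced here, but the core accept-or-revert decision it describes can be sketched in a few lines. This is a minimal illustration with hypothetical names (`bench`, `keep_or_revert`, the `margin` parameter are mine, not AutoKernel's):

```python
import time

def bench(fn, iters=20):
    """Median wall-clock time of fn over several runs (toy harness;
    a real one would use CUDA events and warmup iterations)."""
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    times.sort()
    return times[len(times) // 2]

def keep_or_revert(current_kernel, candidate_kernel, is_correct, margin=0.02):
    """Accept the candidate only if it passes correctness checks AND
    beats the incumbent by more than a noise margin; otherwise revert."""
    if not is_correct(candidate_kernel):
        return current_kernel  # revert: failed correctness checks
    if bench(candidate_kernel) < bench(current_kernel) * (1 - margin):
        return candidate_kernel  # keep: real, above-noise speedup
    return current_kernel  # revert: no measurable win
```

The point of the margin is that a change is only kept when it clears measurement noise, which is what keeps the loop from accumulating phantom speedups.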

// ANALYSIS

This is the most credible version of "agentic code optimization" because the feedback loop is brutally concrete: compile, benchmark, accept or revert. The real bet isn't that agents write perfect kernels on the first try; it's that enough cheap, disciplined iterations can outpace human tuning.

  • The `keep/revert` loop plus fixed benchmarks is exactly how you keep an optimization agent from hallucinating wins.
  • Supporting both Triton and CUDA C++ makes it useful for teams at different maturity levels, not just Triton-native shops.
  • The Amdahl's-law scheduler is smart product design: it stops wasting time on kernels that can no longer move end-to-end latency much.
  • The 5-stage correctness checks and roofline analysis are more convincing than raw TFLOPS screenshots.
  • AutoKernel fits a broader wave of autonomous kernel work, but its open-source repo makes it easier for developers to inspect and extend.
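The Amdahl's-law reasoning behind that scheduler is easy to write down: a kernel that accounts for fraction p of end-to-end runtime, sped up by factor s, yields an overall speedup of 1 / ((1 - p) + p/s). A sketch of how a scheduler could use this to deprioritize kernels (function names and the thresholds are my own illustration, not the repo's API):

```python
def amdahl_speedup(p: float, s: float) -> float:
    """End-to-end speedup when a fraction p of runtime is accelerated by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

def worth_tuning(p: float, best_case: float = 10.0, threshold: float = 1.05) -> bool:
    """Skip kernels whose best-case speedup can no longer move end-to-end latency:
    even a 10x win on a kernel that is 1% of runtime gains under 1%."""
    return amdahl_speedup(p, best_case) >= threshold
```

Spending an overnight run's budget only on kernels where `worth_tuning` holds is the "stop wasting time" behavior described above.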
// TAGS
autokernel · agent · gpu · automation · devtool · open-source

DISCOVERED

2026-03-21

PUBLISHED

2026-03-21

RELEVANCE

8 / 10

AUTHOR

Github Awesome