Show HN: Auto GPU Kernel – Autonomous GPU-kernel discovery and optimizer
Auto GPU Kernel is an autonomous GPU-kernel discovery and optimization tool. It ranked #1 in the MLSys 2026 FlashInfer AI Kernel Generation Contest on the DeepSeek Sparse Attention (DSA) track, with an average speedup of 34.93x. Its agents are meant to run in an isolated copy of the template directory, and the kernel agent can run without a local GPU by using Modal in the cloud.
- Auto GPU Kernel ranked #1 in the MLSys 2026 FlashInfer AI Kernel Generation Contest.
- It achieved an average speedup of 34.93x in the DeepSeek Sparse Attention track.
- The kernel agent is compatible with the FlashInfer format and can run on cloud services without a local GPU.
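To make the headline metric concrete, here is a minimal sketch of how an average speedup could be computed from per-workload runtimes. The post does not say how the contest averages its results, so the arithmetic mean used here is an assumption, the function names are hypothetical, and the numbers are placeholders rather than contest data.

```python
from statistics import mean


def speedup(baseline_ms: float, kernel_ms: float) -> float:
    """Speedup of a kernel over a baseline: baseline runtime / kernel runtime."""
    if kernel_ms <= 0:
        raise ValueError("kernel runtime must be positive")
    return baseline_ms / kernel_ms


def average_speedup(baseline_ms: list[float], kernel_ms: list[float]) -> float:
    """Arithmetic mean of per-workload speedups (the averaging rule is assumed)."""
    if len(baseline_ms) != len(kernel_ms):
        raise ValueError("need one baseline runtime per kernel runtime")
    return mean(speedup(b, k) for b, k in zip(baseline_ms, kernel_ms))


# Placeholder runtimes in milliseconds, not measurements from the contest.
example_baseline = [1.0, 2.0]
example_kernel = [0.5, 0.5]
example_average = average_speedup(example_baseline, example_kernel)
```

A benchmark could just as reasonably report a geometric mean, which is less sensitive to a single very large speedup, so the figure in the post should be read against the contest's own definition.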
Opening excerpt (first ~120 words)
Auto GPU Kernel 🏆

Autonomous GPU-kernel discovery & optimizer.

Technical Report

Ranked #1 on MLSys 2026 - FlashInfer AI Kernel Generation Contest for the DeepSeek Sparse Attention (DSA) track with an average speedup of 34.93x. Submissions can be found at:

| Kernel | Description | Runtime (ms) |
| --- | --- | --- |
| dsa_sparse_attention_h16_ckv512_kpe64_topk2048_ps64 | DSA Sparse Attention | 0.010 |
| dsa_topk_indexer_fp8_h64_d128_topk2048_ps64 | DSA TopK Indexer | 0.016 |

Setup

Copy the template directory into a separate folder / git repository to make sure your agents work in an isolated environment. The kernel agent is compatible with FlashInfer format and can run without a local GPU on cloud using Modal. Requires Claude Code CLI.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at GitHub.
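The setup step in the excerpt (copying the template directory into a separate folder or git repository) can be sketched roughly as follows. This is an illustration, not the project's own tooling: the function name, the example paths, and the choice to skip any existing `.git` folder are all assumptions.

```python
import shutil
import subprocess
from pathlib import Path


def make_isolated_workspace(template_dir: Path, dest_dir: Path, init_git: bool = True) -> Path:
    """Copy template_dir to a new dest_dir and optionally start a fresh git repo there."""
    if dest_dir.exists():
        raise FileExistsError(f"{dest_dir} already exists; choose a new folder")
    # Skip any .git folder so the copy does not share history with the source repo.
    shutil.copytree(template_dir, dest_dir, ignore=shutil.ignore_patterns(".git"))
    # Only initialise a repository if git is actually installed.
    if init_git and shutil.which("git"):
        subprocess.run(["git", "init", "--quiet", str(dest_dir)], check=True)
    return dest_dir


# Hypothetical usage; both paths are placeholders:
# make_isolated_workspace(Path("template"), Path.home() / "kernel-agent-workspace")
```

Keeping the agents in their own folder and repository means any files they create or rewrite stay out of the original checkout.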