RightNow-AI/AutoMegaKernel
An agent harness that compiles a model into a single provably correct, self-retargeting CUDA megakernel and self-tunes it to outperform cuBLAS at batch-1 LLM decode.
GitHub repository with 38 stars and 4 forks.
Language: Python
Topics: agent-harness, cuda, gpu, gpu-programming, kernel-fusion, llm-inference, machine-learning, megakernel, mlsys