1 min read

TritonForge

Table of Contents

TritonForge is an LLM-powered GPU kernel synthesis framework. It trains models to convert PyTorch operations into optimized Triton kernels through SFT and RL.

Key features:

  • Multi-turn compilation feedback loop for iterative kernel refinement
  • Cross-platform support for both NVIDIA and AMD GPUs
  • Kernelbook and KernelBench integration for systematic evaluation
  • SFT + RL training pipeline for teaching models to write high-performance kernels