Posts tagged with "kernel-optimization"
Source Reading 006 — FlyDSL, A Layout-Algebra Python DSL with an MLIR Spine
AMD's FlyDSL is the Python front-end for a Fly-dialect MLIR compiler that lowers layout algebra and copy/MMA atoms to ROCDL on CDNA3/CDNA4. Four examples (vectorAdd, tiledCopy, tiledMma, and preshuffle GEMM) form a strict pedagogical ladder. Reading them in order gives you all the machinery that production kernels such as paged attention, MoE GEMM, and flash attention recombine.
FlyDSL notes — BasisAttr, the layer beneath Layout
My FlyDSL source reading collapsed the layout algebra into five words. This is the patch: Fly_Basis and BasisAttr, what they are, why layouts need them, and where to start when your mentor hands you the "complete the BasisAttr surface" task.
Source Reading 005 — GCNasm, Sixty-Four Katas for the AMD ISA Manual You Never Finished
carlushuang's gcnasm repo is the rare middle ground between HIP tutorials and the 1,200-page CDNA3 ISA manual: 64 short, self-contained kernels that show what hand-tuned AMD code actually looks like. Six hours of careful reading buys you a working mental model of MFMA, vmcnt pipelining, DPP cross-lane primitives, and the trick the LLVM assembler refuses to let you write.