Posts tagged with "source-reading"

Source Reading 006 — FlyDSL, A Layout-Algebra Python DSL with an MLIR Spine

AMD's FlyDSL is the Python front-end for a Fly-dialect MLIR compiler that lowers layout algebra and copy/MMA atoms to ROCDL on CDNA3/CDNA4. Four examples (vectorAdd, tiledCopy, tiledMma, preshuffle GEMM) form a strict pedagogical ladder; reading them in order covers every piece of machinery that real production kernels (paged attention, MoE GEMM, flash attention) recombine.

Source Reading 005 — GCNasm, Sixty-Four Katas for the AMD ISA Manual You Never Finished

carlushuang's gcnasm repo is the rare middle ground between HIP tutorials and the 1,200-page CDNA3 ISA manual: 64 short, self-contained kernels that show what hand-tuned AMD code actually looks like. Six hours of careful reading buys you a working mental model of MFMA, vmcnt pipelining, DPP cross-lane primitives, and the trick the LLVM assembler refuses to let you write.

Source Reading 004 — mini-SGLang, and How a 140× Smaller Twin Teaches the Full System

A 5,000-line teaching implementation maintained alongside the 728,000-line production engine. Read in five hours. The reflections on what to take away from minimal implementations are at least as important as the code itself.

Source Reading 003 — vLLM, Where KV Cache Became Virtual Memory

633k lines, 20 attention backends, a 7,185-line GPU model runner, and an architecture rewrite that turned the engine itself into a process. The final installment of the inference-engine trilogy.

Source Reading 002 — SGLang, an Inference Engine That's Actually a Four-Process Distributed System

729,000 lines, 27 attention backends, a 4,006-line scheduler, and a radix tree that turns chat prefixes into KV cache hits. A deep reading of the inference engine I use daily at AMD.

Source Reading 001 — SkyPilot, 211,510 Lines of Multi-Cloud Orchestration

A six-and-a-half-hour reading of SkyPilot's source — the three-zone architecture, the DP+ILP optimizer, and what it takes to add a new cloud backend.