Jinn's Hub

Jin Pan

ML Systems / LLM Inference / RL Infrastructure

Second-year MS/PhD student in Computer Sciences at UW-Madison, working on ML Systems. SGLang community contributor. Currently interning at AMD GenAI, focusing on RL systems and GPU kernel optimization.

More about me →

Recent Posts

See all posts
  • Paper Reading 002 — Kernel Design Agents: An Agent Loop That Builds Fast GPU Kernels
    HAN Lab's 'Kernel Mafia' pointed coding agents at the MLSys-2026 Blackwell kernel contest and placed — by letting the agent run the optimization loop itself. A close read of Kernel Design Agents (KDA): the Humanize plan-execute-verify loop, KernelWiki, ncu-report-skill, shape-aware autotuning, the contest results, and the reward-hacking failure modes — mapped onto what it would take to rebuild on AMD. Hand-coded SVG diagrams, bilingual.
  • Paper Reading 001 — Polar: Training Agents Without Opening the Box
    Polar (arXiv 2605.24220, NVIDIA) trains language agents with RL by proxying their LLM API calls instead of rewriting the harness. The integration point moves from the agent to the model endpoint — the exact seam we already run with SGLang. A close read of the architecture, the four-step proxy, token-faithful prefix merging, and the SWE-Bench results, with hand-coded SVG diagrams.
  • Source Reading 006 — FlyDSL, A Layout-Algebra Python DSL with an MLIR Spine
    AMD's FlyDSL is the Python front-end for a Fly-dialect MLIR compiler that lowers layout algebra and copy/MMA atoms to ROCDL on CDNA3/CDNA4. Four examples (vectorAdd, tiledCopy, tiledMma, preshuffle GEMM) form a strict pedagogical ladder; reading them in order covers every piece of machinery that real production kernels (paged attention, MoE GEMM, flash attention) recombine.
  • FlyDSL notes — BasisAttr, the layer beneath Layout
    My FlyDSL source reading collapsed the layout algebra into five words. This is the patch — Fly_Basis, BasisAttr, what they are, why layouts need them, and where to start when your mentor hands you the 'complete the BasisAttr surface' task.
  • From Python to Silicon — A Compiler & Arch Primer for the Working ML Engineer
    You can write production ML systems for years without knowing what IR, MLIR, LLVM, ISA, or FFI actually mean. This is the patch — a bilingual primer for the undergrad-CS-but-skipped-compilers crowd, with a full HTML deep dive carrying six hand-drawn SVG plates.

Recent Projects

See all projects
  • Miles
    Enterprise RL framework for LLM/VLM post-training. Integrates SGLang rollout and Megatron training, with an FP8 pipeline and MoE support.
  • SpecForge
    Train speculative decoding draft models and port them to SGLang serving. Part of the SGLang ecosystem.
  • TritonForge
    LLM-powered GPU kernel synthesis: trains models to convert PyTorch ops into optimized Triton kernels via SFT+RL.
  • APRIL
    Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM training.
  • SGLang
    High-performance serving framework for large language models and multimodal models. Contributor.

Contact

Find me on social media or send an email.

  • GitHub
  • LinkedIn
  • jpan236@wisc.edu
© 2026 • Jinn's Hub 🔬