
Jin Pan

ML Systems / LLM Inference / RL Infrastructure

Second-year MS/PhD student in Computer Sciences at UW-Madison, working on ML Systems. SGLang community contributor. Currently interning at AMD GenAI, focusing on RL systems and GPU kernel optimization.


Recent Posts

Autotune, End to End — From Triton to FlyDSL

How kernel autotuning is designed in Triton, used in aiter / quack / CuteDSL, and consumed by inference engines like SGLang; then, drawing on all of that, how to design a real autotune path for FlyDSL. Includes a full HTML deep dive with six hand-drawn SVG plates.
Long-Sequence MoE RL Training: From First Principles to MI300X

A first-principles read of Yan Bai's long-sequence MoE RL optimizations — Path B recompute, linear cross-entropy, FSDP2, chunked expert-parallel overlap — and what each one means on AMD MI300X / MI355X.
FlyDSL notes — BasisAttr, the layer beneath Layout

My FlyDSL source reading collapsed the layout algebra into five words. This is the patch — Fly_Basis, BasisAttr, what they are, why layouts need them, and where to start when your mentor hands you the 'complete the BasisAttr surface' task.
From Python to Silicon — A Compiler & Arch Primer for the Working ML Engineer

You can write production ML systems for years without knowing what IR, MLIR, LLVM, ISA, or FFI actually mean. This is the patch — a bilingual primer for the undergrad-CS-but-skipped-compilers crowd, with a full HTML deep dive carrying six hand-drawn SVG plates.
Attention Mechanisms — Full, Sparse, Linear, NSA & GLA

Breaking down Full, Sparse, and Linear Attention, all the way to DeepSeek NSA and Gated Linear Attention.

Recent Projects


Miles

Enterprise RL framework for LLM/VLM post-training. Integrates SGLang rollout and Megatron training, with an FP8 pipeline and MoE support.
SpecForge

Train speculative decoding draft models and port them to SGLang serving. Part of the SGLang ecosystem.
TritonForge

LLM-powered GPU kernel synthesis: train models to convert PyTorch ops into optimized Triton kernels via SFT+RL.
APRIL

Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM training.
SGLang

High-performance serving framework for large language models and multimodal models. Contributor.

Contact

Find me on social media or send an email.

GitHub /
LinkedIn /
jpan236@wisc.edu

© 2026 • Jinn's Hub 🔬
