Posts tagged with "sglang"
Source Reading 004 — mini-SGLang, and How a 140× Smaller Twin Teaches the Full System
A 5,000-line teaching implementation maintained alongside the 728,000-line production engine. Read in five hours. The reflections on what to take away from minimal implementations are at least as important as the code itself.
Source Reading 002 — SGLang, an Inference Engine That's Actually a Four-Process Distributed System
729,000 lines, 27 attention backends, a 4,006-line scheduler, and a radix tree that turns chat prefixes into KV cache hits. A deep reading of the inference engine I use daily at AMD.