
SGLang

SGLang is a high-performance serving framework for large language models and multimodal models. I contribute to the SGLang community, focusing on LLM inference optimization and serving infrastructure.

SGLang speeds up model execution through RadixAttention, which caches shared prompt prefixes, together with continuous batching and efficient KV cache management. It supports a wide range of models, including LLaMA, Qwen, and DeepSeek.
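
To make the prefix-caching idea concrete, here is a minimal sketch in Python. It is not SGLang's implementation: `PrefixCache`, `TrieNode`, and the token IDs are hypothetical names and values chosen for illustration, and the sketch uses a plain per-token trie that only counts reusable tokens, whereas a real system would store KV tensors, compress edges into a radix tree, and evict entries under memory pressure.

```python
from typing import Dict, List


class TrieNode:
    """One node per cached token (hypothetical helper, for illustration only)."""

    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}


class PrefixCache:
    """Toy prefix cache: reports how many leading tokens were seen before."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def match_prefix(self, tokens: List[int]) -> int:
        # Walk the trie until the first token that has no cached child.
        node = self.root
        matched = 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node = child
            matched += 1
        return matched

    def insert(self, tokens: List[int]) -> None:
        # Record the whole sequence so later requests can reuse its prefix.
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())


cache = PrefixCache()
system_prompt = [1, 2, 3, 4]           # made-up token IDs shared by both requests
first = system_prompt + [10, 11]
second = system_prompt + [20, 21, 22]

hits_first = cache.match_prefix(first)    # nothing cached yet
cache.insert(first)
hits_second = cache.match_prefix(second)  # shared system prompt is reusable
cache.insert(second)

print(hits_first, hits_second)
```

In this toy run the second request can skip recomputation for the four tokens it shares with the first, which is the saving that prefix caching targets when many requests begin with the same system prompt or few-shot examples.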