Posts tagged with "vllm"

Source Reading 003 — vLLM, Where KV Cache Became Virtual Memory

633k lines of code, 20 attention backends, a 7,185-line GPU model runner, and an architecture rewrite that turned the engine itself into a process. The closing entry in the inference-engine trilogy.