Jinn's Hub
Posts tagged with "vllm"
Source Reading 003 — vLLM, Where KV Cache Became Virtual Memory
633k lines, 20 attention backends, a 7,185-line GPU model runner, and an architecture rewrite that turned the engine itself into a process. This post closes the inference-engine trilogy.