Jinn's Hub

Posts tagged with "vllm"

    Source Reading 003 — vLLM, Where KV Cache Became Virtual Memory
    633k lines, 20 attention backends, a 7,185-line GPU model runner, and an architecture rewrite that turned the engine itself into a process. The closing of the inference-engine trilogy.
© 2026 • Jinn's Hub 🔬