SpecForge | Jinn's Hub

SpecForge trains draft models for speculative decoding and integrates them into SGLang for faster LLM inference. A draft model proposes several tokens ahead, and the main (target) model verifies those proposals in parallel, which reduces latency without sacrificing output quality.
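To make the draft-then-verify idea concrete, here is a minimal sketch of greedy speculative decoding in plain Python. It is not SpecForge's or SGLang's API: `toy_target`, `toy_draft`, `greedy_decode`, and `speculative_decode` are hypothetical names invented for illustration, and the "models" are simple functions over integer tokens. It also assumes greedy decoding; sampled decoding typically uses a probabilistic accept/reject rule instead of exact matching.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def toy_target(prefix: List[int]) -> int:
    """Toy 'main' model: counts upward modulo 10."""
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[int]) -> int:
    """Toy draft model: agrees with the target except after token 4."""
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10


def greedy_decode(target: Model, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], max_new_tokens: int, k: int = 4
) -> List[int]:
    """Greedy speculative decoding: draft k tokens, keep the prefix the target agrees with."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens autoregressively (cheap).
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks every proposed position. A real system would
        #    score all k positions in one batched forward pass; this loop
        #    only imitates that step sequentially.
        accepted: List[int] = []
        for i in range(k):
            expected = target(tokens + proposal[:i])
            if proposal[i] == expected:
                accepted.append(proposal[i])
            else:
                # First mismatch: keep the target's token and discard the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token matched, so the target adds one extra token.
            accepted.append(target(tokens + proposal))

        tokens.extend(accepted)
    return tokens[:goal]


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [0], max_new_tokens=12, k=4))
```

Because every kept token is one the target itself would have produced for that prefix, the output matches plain greedy decoding with the target alone. Any speedup comes from how many draft tokens are accepted per verification step, which is why the quality of the trained draft model matters.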