Posts tagged with "SWE-Bench"

Paper Reading 001 — Polar: Training Agents Without Opening the Box

Polar (arXiv 2605.24220, NVIDIA) trains language agents with RL by proxying their LLM API calls instead of rewriting the harness. The integration point moves from the agent to the model endpoint — the exact seam we already run with SGLang. A close read of the architecture, the four-step proxy, token-faithful prefix merging, and the SWE-Bench results, with hand-coded SVG diagrams.