slime is an LLM post-training framework for RL scaling, developed by THUDM. It provides infrastructure for training large language models with reinforcement learning at scale and supports algorithms such as PPO and GRPO.