Reinforcement Learning (RL) techniques, particularly the Group Relative Policy Optimization (GRPO) algorithm, have been used to substantially improve the mathematical reasoning capabilities of language models. The study highlights how sound infrastructure, diverse training data, and effective training practices improve performance, while also addressing challenges such as model collapse and bias in advantage estimation.
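Conceptually, GRPO avoids a learned value function by sampling a group of completions per prompt and normalizing each completion's reward against the rest of its group. The sketch below illustrates that advantage computation only; the shapes, the epsilon term, and dividing by the group standard deviation are illustrative choices, not details taken from the study.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Group-relative advantages in the GRPO style.

    rewards: tensor of shape (num_prompts, group_size), one scalar reward per
    sampled completion. Each advantage is the reward normalized against the
    other completions sampled for the same prompt.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.5, 0.5, 0.5, 0.5]])
print(group_relative_advantages(rewards))
# The second prompt's identical rewards yield near-zero advantages, one reason
# reward shaping and data diversity matter in practice.
```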
Liger enhances TRL's GRPO trainer by cutting memory consumption during training by 40% without sacrificing model quality. The integration also adds support for Fully Sharded Data Parallel (FSDP) and Parameter-Efficient Fine-Tuning (PEFT), enabling scalable training across multiple GPUs. Additionally, the Liger loss can be paired with vLLM for accelerated text generation during training.
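As a rough illustration of how these pieces fit together, the sketch below enables the Liger loss and vLLM generation through TRL's `GRPOConfig` and passes a PEFT config to `GRPOTrainer`. It assumes a TRL release exposing the `use_liger_loss` and `use_vllm` flags; the model name, dataset, and reward function are placeholders, and FSDP sharding would be configured separately via `accelerate launch` rather than in code.

```python
# Minimal sketch: Liger GRPO loss + vLLM generation + LoRA (PEFT) in TRL.
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 100 characters.
    return [-abs(100 - len(c)) for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")

args = GRPOConfig(
    output_dir="qwen2-grpo-liger",
    use_liger_loss=True,   # chunked Liger GRPO loss to reduce memory use
    use_vllm=True,         # generate completions with vLLM during training
    num_generations=8,     # group size for group-relative advantages
    per_device_train_batch_size=8,
    bf16=True,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_len,
    args=args,
    train_dataset=dataset,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # LoRA keeps trainable params small
)
trainer.train()
```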