Liger enhances TRL’s Group Relative Policy Optimization (GRPO) trainer by reducing peak memory usage by 40% during training without sacrificing model quality. The integration also adds support for Fully Sharded Data Parallel (FSDP) and Parameter-Efficient Fine-Tuning (PEFT), enabling scalable training across multiple GPUs. Additionally, Liger Loss can be paired with vLLM for accelerated text generation during training.
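Below is a minimal sketch of enabling the Liger loss in a TRL GRPO run. It assumes a recent `trl` with `liger-kernel` installed and that `GRPOConfig` exposes a `use_liger_loss` flag; the model name, output path, and reward function are illustrative placeholders, not part of the integration itself.

```python
# Sketch: GRPO fine-tuning with the Liger loss enabled in TRL.
# Assumes trl with liger-kernel installed; `use_liger_loss` is the
# GRPOConfig flag that switches on the memory-efficient Liger GRPO loss.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

train_dataset = load_dataset("trl-lib/tldr", split="train")

# Illustrative reward: prefer completions close to 20 characters long.
def reward_len(completions, **kwargs):
    return [-abs(20 - len(completion)) for completion in completions]

training_args = GRPOConfig(
    output_dir="qwen2-grpo-liger",  # placeholder output path
    use_liger_loss=True,            # enable the Liger GRPO loss
    use_vllm=False,                 # set True to pair with vLLM for generation
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # any causal LM supported by TRL
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```

For multi-GPU runs, the same script can be launched under FSDP via `accelerate launch` with an FSDP config, and a `peft_config` can be passed to `GRPOTrainer` for LoRA-style parameter-efficient fine-tuning.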