Reinforcement Pre-Training (RPT) is introduced as a new approach that enhances large language models with reinforcement learning by recasting next-token prediction as a reasoning task. Because the reward is verifiable directly against the corpus, whether the model predicts the correct next token, RPT can leverage vast amounts of unannotated text for general-purpose RL. It improves language modeling accuracy, provides a strong foundation for subsequent reinforcement fine-tuning, and shows consistent gains in next-token prediction accuracy as training compute increases.
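A minimal sketch of the core reward idea, assuming a binary exact-match signal against the ground-truth next token; the names `Rollout` and `next_token_reward` are illustrative, not from the paper, and the paper's actual reward formulation may differ in detail:

```python
from dataclasses import dataclass

@dataclass
class Rollout:
    """One sampled response: a reasoning trace plus a final token prediction."""
    reasoning: str   # free-form chain of thought produced before the answer
    prediction: str  # the token the model commits to as the next token

def next_token_reward(rollout: Rollout, ground_truth: str) -> float:
    """Verifiable reward: 1.0 iff the prediction matches the corpus token.

    No human annotation is needed; the 'label' is simply the next token
    already present in the pre-training text.
    """
    return 1.0 if rollout.prediction == ground_truth else 0.0

# Toy example: the context is "The capital of France is" and the corpus
# continues with " Paris", so only predictions of " Paris" are rewarded.
samples = [
    Rollout(reasoning="France's capital city is Paris.", prediction=" Paris"),
    Rollout(reasoning="Maybe it's Lyon?", prediction=" Lyon"),
]
rewards = [next_token_reward(r, ground_truth=" Paris") for r in samples]
print(rewards)  # [1.0, 0.0] -- rewards like these would drive a policy-gradient update
```

Because the reward is computed from the corpus itself, any pre-training text becomes RL training data without domain-specific annotated answers.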
Tags: reinforcement-learning, language-models, next-token-prediction, pre-training, scaling-paradigms