Reinforcement Pre-Training (RPT) is a novel approach for enhancing large language models with reinforcement learning by recasting next-token prediction as a reasoning task: the model reasons about the continuation and receives a verifiable reward for correctly predicting the next token of the corpus. Because this reward signal comes directly from unannotated text, RPT can leverage vast amounts of data to improve language-modeling accuracy, and it provides a strong foundation for subsequent reinforcement fine-tuning, with prediction accuracy improving consistently as training compute increases.
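The core mechanism can be illustrated with a minimal sketch, assuming a simple exact-match reward over corpus tokens; this is not the authors' implementation, and all function names here are hypothetical:

```python
# Sketch of RPT's verifiable-reward idea: every position in an ordinary
# text corpus becomes a prediction task, and the reward is checkable
# against the ground-truth next token -- no human labels required.

def rpt_reward(predicted_token: str, ground_truth_token: str) -> float:
    """Verifiable reward: 1.0 if the prediction matches the corpus token."""
    return 1.0 if predicted_token == ground_truth_token else 0.0

def make_prediction_tasks(tokens: list[str], min_prefix: int = 1):
    """Turn one token sequence into (prefix, next_token) reasoning tasks."""
    return [(tokens[:i], tokens[i]) for i in range(min_prefix, len(tokens))]

tokens = ["The", "cat", "sat", "on", "the", "mat"]
tasks = make_prediction_tasks(tokens)
# Each prefix is a task the policy reasons over before committing to a token;
# the reward for its answer is verified directly against the corpus.
print(rpt_reward("mat", "mat"))  # correct prediction -> 1.0
print(rpt_reward("rug", "mat"))  # incorrect prediction -> 0.0
```

The key design point is that the reward is objective and scalable: any plain-text dataset supplies both the tasks and their ground-truth answers.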