Quit Emailing Yourself

# language-models → reasoning → pre-training

1 link tagged with all of: language-models + reasoning + pre-training

Links

Reinforcement Learning on Pre-Training Data

Reinforcement Learning on Pre-Training Data (RLPT) introduces a new paradigm for scaling large language models (LLMs): the policy autonomously explores meaningful trajectories in pre-training data, so rewards do not depend on human annotations. RLPT adopts a next-segment reasoning objective, which improves LLM capabilities, as demonstrated by significant performance gains on various reasoning benchmarks. The objective also encourages exploration of broader context, which is intended to enhance generalization.

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

Tags: reinforcement-learning, pre-training, language-models, scaling, reasoning