Reinforcement Learning on Pre-Training Data (RLPT) introduces a new paradigm for scaling large language models (LLMs) by allowing the policy to autonomously explore meaningful trajectories from pre-training data without relying on human annotations for rewards. By adopting a next-segment reasoning objective, RLPT improves LLM capabilities, as demonstrated by significant performance gains on various reasoning benchmarks and encouraging broader context exploration for enhanced generalization.