3 links tagged with all of: reasoning + language-models + reinforcement-learning
Links
This article explores a new sampling algorithm for large language models (LLMs) that enhances reasoning capabilities without additional training. The authors demonstrate that their method can achieve single-shot reasoning performance comparable to reinforcement learning techniques while maintaining better diversity in outputs.
Reinforcement-Learned Teachers (RLTs) are teacher models trained to generate clear explanations from question-answer pairs, improving student models' understanding. This approach lets compact teacher models outperform much larger ones at distilling reasoning, substantially cutting training cost and time while maintaining effectiveness. The framework shifts the teacher's objective from solving problems to teaching them, a promising direction for AI reasoning models.
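The core idea — rewarding the teacher for how much its explanation helps the student — can be sketched as a reward function. This is an illustrative simplification, not the paper's exact formulation; the function name and the use of a plain average of token log-probabilities are assumptions:

```python
def teaching_reward(logp_with_expl, logp_without_expl):
    """Toy RLT-style reward (illustrative, not the paper's exact objective):
    the teacher sees the question AND the ground-truth solution, emits an
    explanation, and is rewarded by how much the explanation raises the
    student's average per-token log-likelihood of that solution.

    logp_with_expl / logp_without_expl: the student's log-probabilities for
    each solution token, with and without the explanation in context.
    """
    with_expl = sum(logp_with_expl) / len(logp_with_expl)
    without = sum(logp_without_expl) / len(logp_without_expl)
    return with_expl - without

# Toy numbers: the explanation makes the solution tokens much more likely,
# so the teacher earns a positive reward.
reward = teaching_reward([-0.2, -0.1, -0.3], [-1.0, -0.8, -1.2])
```

Because the teacher is given the solution up front, the reward is dense and cheap to compute — no correctness verifier is needed, which is what makes small teachers viable.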
Reinforcement Learning on Pre-Training Data (RLPT) introduces a new paradigm for scaling large language models (LLMs): the policy autonomously explores meaningful trajectories drawn from pre-training data, with rewards derived from the data itself rather than human annotation. Using a next-segment reasoning objective, RLPT improves LLM capabilities, delivering significant gains on reasoning benchmarks and encouraging broader context exploration that aids generalization.
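The next-segment objective can be sketched as a self-supervised reward: the policy reasons and then predicts the next segment of a pre-training document, and is scored against the true continuation. The token-overlap F1 below is a simple stand-in for the scoring mechanism — the paper's actual reward (e.g. a learned judge of semantic consistency) may differ:

```python
from collections import Counter

def next_segment_reward(predicted: str, actual: str) -> float:
    """Toy RLPT-style self-supervised reward (a stand-in, not the paper's
    exact scorer): token-level F1 overlap between the policy's predicted
    next segment and the true continuation from the pre-training corpus.
    No human labels are needed -- the corpus itself supplies the target.
    """
    p, a = predicted.split(), actual.split()
    if not p or not a:
        return 0.0
    overlap = sum((Counter(p) & Counter(a)).values())  # shared tokens
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(a)
    return 2 * precision * recall / (precision + recall)

# A close prediction earns a high reward; an unrelated one earns zero.
r = next_segment_reward("the cat sat", "the cat sat down")
```

Since every document in the pre-training corpus yields such prediction targets, this turns raw text into an effectively unlimited source of RL signal.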