7 links
tagged with all of: reasoning + language-models
Links
Parallel Scaling (ParScale) is a new scaling paradigm for language models that scales parallel computation during training and inference rather than parameters. The approach demonstrates significant benefits, including improved reasoning performance, greater inference efficiency, and lower memory and latency costs compared to traditional parameter scaling. The authors provide various models and tools to facilitate implementation and experimentation with this new scaling law.
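A minimal sketch of the idea, assuming a backbone that maps hidden states to hidden states; the per-stream transform (`stream_bias`), the gating layer, and the class name are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

class ParallelScaledModel(nn.Module):
    """Sketch of parallel scaling: P input transforms share one backbone,
    and a learned gate aggregates the P output streams."""
    def __init__(self, backbone: nn.Module, hidden: int, n_streams: int = 4):
        super().__init__()
        self.backbone = backbone                       # shared, reused for every stream
        self.n_streams = n_streams
        self.stream_bias = nn.Parameter(torch.zeros(n_streams, hidden))  # diversifies inputs
        self.gate = nn.Linear(hidden, 1)               # scores each stream's output

    def forward(self, x):                              # x: (batch, seq, hidden)
        outs = []
        for i in range(self.n_streams):                # P forward passes, parallelizable
            outs.append(self.backbone(x + self.stream_bias[i]))
        outs = torch.stack(outs, dim=0)                # (P, batch, seq, hidden)
        weights = torch.softmax(self.gate(outs), dim=0)
        return (weights * outs).sum(dim=0)             # dynamic aggregation of the P streams
```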
Reinforcement Learned Teachers (RLT) train teacher models to generate clear explanations from question-answer pairs, enhancing student models' understanding. This innovative approach allows compact teacher models to outperform larger ones in reasoning tasks, significantly reducing training costs and times while maintaining effectiveness. The framework shifts the focus from problem-solving to teaching, promising advancements in AI reasoning models.
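A hedged sketch of the core reward signal, assuming a HuggingFace-style causal LM as the frozen student; the function name and prompt format are assumptions, and the paper's actual reward may combine additional terms:

```python
import torch

def explanation_reward(student, tokenizer, question, solution, explanation):
    """Sketch: score a teacher's explanation by how likely the frozen student
    finds the ground-truth solution when conditioned on that explanation."""
    prompt = f"Question: {question}\nExplanation: {explanation}\nSolution:"
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    target_ids = tokenizer(" " + solution, return_tensors="pt",
                           add_special_tokens=False).input_ids
    ids = torch.cat([prompt_ids, target_ids], dim=1)
    with torch.no_grad():
        logits = student(ids).logits
    # logits at position i predict token i+1, so align them with the solution tokens
    sol_logits = logits[0, prompt_ids.size(1) - 1 : -1]
    logp = torch.log_softmax(sol_logits, dim=-1)
    token_logps = logp.gather(1, target_ids[0].unsqueeze(1)).squeeze(1)
    return token_logps.mean().item()  # higher = the explanation made the solution easier to predict
```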
Recent advancements in large language models (LLMs) have prompted discussions about their reasoning capabilities. This study introduces a representation engineering approach that leverages model activations to create control vectors, enhancing reasoning performance on various tasks without additional training. The results indicate that modulating model activations can effectively improve LLMs' reasoning abilities.
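A minimal sketch of the general recipe, assuming a HuggingFace-style causal LM; the contrastive prompt sets, layer choice, and scaling factor `alpha` are illustrative assumptions, not the paper's exact procedure:

```python
import torch

def extract_control_vector(model, tokenizer, pos_prompts, neg_prompts, layer_idx):
    """Contrastive control vector: difference of mean last-token hidden states
    between prompts that exhibit the target behaviour and prompts that do not."""
    def mean_hidden(prompts):
        states = []
        with torch.no_grad():
            for p in prompts:
                ids = tokenizer(p, return_tensors="pt").input_ids
                out = model(ids, output_hidden_states=True)
                states.append(out.hidden_states[layer_idx][0, -1])
        return torch.stack(states).mean(dim=0)
    return mean_hidden(pos_prompts) - mean_hidden(neg_prompts)

def add_control_hook(layer, vector, alpha=1.0):
    """Steer generation by adding the scaled vector to the layer's output on
    every forward pass -- no additional training required."""
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * vector.to(hidden.dtype)
        return (steered,) + output[1:] if isinstance(output, tuple) else steered
    return layer.register_forward_hook(hook)
```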
ThinkMesh is a Python library designed for executing various reasoning strategies in parallel using language models, particularly leveraging the Qwen2.5-7B-Instruct model. It supports multiple reasoning approaches such as DeepConf, Self-Consistency, and Debate, catering to a range of problem types from mathematical proofs to planning tasks. The library also includes performance monitoring and benchmarking features to ensure effective usage and integration with different backends.
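The snippet below is not ThinkMesh's own API; it is a generic sketch of one of the strategies it supports, self-consistency, written with plain `transformers`: sample several reasoning paths from Qwen2.5-7B-Instruct and majority-vote the extracted answers:

```python
from collections import Counter
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-7B-Instruct")

def self_consistency(question: str, n_samples: int = 8) -> str:
    """Sample several reasoning paths and return the most common final answer."""
    prompt = f"{question}\nThink step by step, then give the final answer after 'Answer:'."
    outputs = generator(prompt, num_return_sequences=n_samples, do_sample=True,
                        temperature=0.8, max_new_tokens=512, return_full_text=False)
    answers = []
    for out in outputs:
        text = out["generated_text"]
        if "Answer:" in text:
            answers.append(text.rsplit("Answer:", 1)[-1].strip().split("\n")[0])
    return Counter(answers).most_common(1)[0][0] if answers else ""
```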
REverse-Engineered Reasoning (REER) introduces a novel approach to instilling deep reasoning in language models by working backwards from known solutions to discover the underlying reasoning process. This method addresses the limitations of traditional reinforcement learning and instruction distillation, resulting in the creation of a large dataset, DeepWriting-20K, and a model, DeepWriter-8B, that outperforms existing models in open-ended tasks. The research emphasizes the importance of structured reasoning and iterative refinement in generating high-quality outputs.
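A hedged sketch of the "work backwards from a known solution" idea as simple local search; `propose_edit` and `score` are hypothetical callables (e.g. the solution's perplexity under a frozen LM given the candidate trace), not the paper's interface:

```python
def reer_style_search(question, solution, init_trace, propose_edit, score, steps=50):
    """Iteratively refine a reasoning trace, keeping only edits that make the
    known solution easier to explain (lower score = better, e.g. perplexity)."""
    best_trace = init_trace
    best_score = score(question, best_trace, solution)
    for _ in range(steps):
        candidate = propose_edit(best_trace)           # e.g. rewrite one step of the trace
        candidate_score = score(question, candidate, solution)
        if candidate_score < best_score:
            best_trace, best_score = candidate, candidate_score
    return best_trace
```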
Reinforcement Learning on Pre-Training Data (RLPT) introduces a new paradigm for scaling large language models (LLMs) by allowing the policy to autonomously explore meaningful trajectories from pre-training data without relying on human annotations for rewards. By adopting a next-segment reasoning objective, RLPT improves LLM capabilities, as demonstrated by significant performance gains on various reasoning benchmarks and encouraging broader context exploration for enhanced generalization.
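A hedged sketch of the data side of a next-segment objective: slice a pre-training document into (context, next segment) pairs and reward the policy's predicted continuation by overlap with the true segment. The segment length, the F1-style reward, and the function names are assumptions, not the paper's exact recipe:

```python
def make_next_segment_examples(document: str, seg_len: int = 200):
    """Turn one pre-training document into (context, target next segment) pairs."""
    segments = [document[i:i + seg_len] for i in range(0, len(document), seg_len)]
    return [{"context": "".join(segments[:k]), "target": segments[k]}
            for k in range(1, len(segments))]

def next_segment_reward(predicted: str, target: str) -> float:
    """Self-supervised reward: token-overlap F1 between the policy's prediction
    and the document's actual next segment (no human annotation needed)."""
    p, t = set(predicted.lower().split()), set(target.lower().split())
    if not p or not t:
        return 0.0
    precision, recall = len(p & t) / len(p), len(p & t) / len(t)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
```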
The paper introduces the Chain of Draft (CoD) paradigm, which enables Large Language Models (LLMs) to generate concise intermediate reasoning outputs, mimicking human draft strategies. By focusing on essential information and reducing verbosity, CoD achieves comparable or superior accuracy to Chain-of-Thought prompting while utilizing significantly fewer tokens, thus lowering costs and latency in reasoning tasks.
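An illustrative prompting sketch of the difference (the exact wording of the paper's CoD instruction may differ): the draft-style prompt caps each reasoning step at a few words instead of full sentences, so far fewer tokens are generated:

```python
# Chain-of-Thought: free-form, verbose step-by-step reasoning.
COT_INSTRUCTION = ("Think step by step to answer the question, "
                   "then give the final answer after '####'.")

# Chain of Draft: same structure, but each step is a terse draft, cutting output tokens.
COD_INSTRUCTION = ("Think step by step, but keep only a minimum draft for each step, "
                   "a few words at most, then give the final answer after '####'.")

question = ("Jason had 20 lollipops. He gave Denny some lollipops. "
            "Now Jason has 12 lollipops. How many did he give to Denny?")
# Typical CoT completion: several full sentences of arithmetic before '#### 8'.
# Typical CoD completion: '20 - x = 12; x = 8' followed by '#### 8'.
```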