5 links tagged with all of: reasoning + llm
Links
Deep Think with Confidence (DeepConf) is a parallel thinking method that improves the reasoning performance and efficiency of large language models (LLMs) by using the model's internal confidence signals to filter out low-quality reasoning traces. It integrates into existing serving frameworks without additional training or tuning, reaching up to 99.9% accuracy on the AIME 2025 dataset while significantly reducing the number of generated tokens. A real-time demo runs the Qwen3-8B model with parallel thinking on the HMMT'25 dataset.
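To make the filtering step concrete, here is a minimal sketch of offline confidence-filtered majority voting, assuming per-token log-probabilities are available from the serving engine; the sliding-window confidence measure and the function names are illustrative simplifications of the paper's group-confidence statistics, not its actual implementation:

```python
from collections import Counter

def trace_confidence(token_logprobs, window=64):
    """Confidence of one reasoning trace: the lowest mean token
    log-probability over any sliding window (a simplified stand-in
    for the paper's windowed group-confidence signal)."""
    if len(token_logprobs) <= window:
        return sum(token_logprobs) / max(len(token_logprobs), 1)
    return min(
        sum(token_logprobs[i:i + window]) / window
        for i in range(len(token_logprobs) - window + 1)
    )

def deepconf_vote(traces, keep_ratio=0.5):
    """traces: list of (final_answer, token_logprobs) pairs.
    Keep only the most confident traces, then majority-vote on answers."""
    ranked = sorted(traces, key=lambda t: trace_confidence(t[1]), reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_ratio))]
    return Counter(answer for answer, _ in kept).most_common(1)[0][0]
```

Discarding low-confidence traces before voting is also what saves tokens: in the online variant, generation of a trace can stop early once its confidence falls below the filter threshold.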
R-Zero is a self-evolving framework for Large Language Models (LLMs) that generates its own training data autonomously, removing the dependence on human-curated tasks. It pairs two models, a Challenger that poses increasingly difficult tasks and a Solver that attempts them, so the two co-evolve and drive significant improvements in reasoning capability across various benchmarks. Empirical results show notable gains, particularly with the Qwen3-4B-Base model.
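At a high level, each round of the co-evolution loop looks like the sketch below; the model interfaces (generate_task, solve, train) are hypothetical stand-ins, and the real framework's reward shaping for the Challenger is more involved than the simple difficulty band shown here:

```python
from collections import Counter

def most_common_answer(answers):
    """Majority answer and its agreement rate among sampled attempts."""
    answer, hits = Counter(answers).most_common(1)[0]
    return answer, hits / len(answers)

def r_zero_round(challenger, solver, n_tasks=128, n_samples=8):
    """One Challenger/Solver round (hypothetical model interfaces)."""
    # 1. Challenger proposes a fresh batch of tasks.
    tasks = [challenger.generate_task() for _ in range(n_tasks)]

    # 2. Solver attempts each task several times; self-consistency of its
    #    answers doubles as a pseudo-label and a difficulty signal.
    curriculum = []
    for task in tasks:
        answers = [solver.solve(task) for _ in range(n_samples)]
        majority, agreement = most_common_answer(answers)
        # Keep tasks that are neither trivial nor hopeless for the Solver.
        if 0.25 <= agreement <= 0.75:
            curriculum.append((task, majority))

    # 3. Co-evolve: Solver trains on the pseudo-labeled tasks, Challenger
    #    is rewarded for producing tasks in the informative band.
    solver.train(curriculum)
    challenger.train(tasks)
```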
MedReason is a comprehensive medical reasoning dataset that enhances large language models (LLMs) by using a structured medical knowledge graph to derive detailed reasoning paths from clinical question-answer pairs. The dataset contains 32,682 QA pairs with step-by-step explanations, and the MedReason-8B model, fine-tuned on this data, achieves state-of-the-art performance on medical reasoning tasks. The project is open-sourced, with models, data, and deployment code available for further research and applications.
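The path-to-explanation idea behind the dataset can be illustrated with a toy graph; the entities, edges, and helper below are illustrative assumptions, not the project's actual medical knowledge graph or pipeline code:

```python
import networkx as nx

# Toy stand-in for a medical knowledge graph (illustrative only).
kg = nx.Graph()
kg.add_edges_from([
    ("chest pain", "myocardial infarction"),
    ("myocardial infarction", "troponin elevation"),
    ("troponin elevation", "order troponin test"),
])

def reasoning_path(question_entity, answer_entity):
    """Verbalize the shortest KG path between linked entities as a
    step-by-step chain, in the spirit of MedReason's reasoning paths."""
    path = nx.shortest_path(kg, question_entity, answer_entity)
    return [
        f"Step {i}: {a} is linked to {b}."
        for i, (a, b) in enumerate(zip(path, path[1:]), start=1)
    ]

print(reasoning_path("chest pain", "order troponin test"))
```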
The article fine-tunes an instruction-tuned LLM (Qwen2.5B) for reasoning tasks with a cost-effective pipeline inspired by DeepSeek R1, combining Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO) on AWS SageMaker. It details the training stages, reward function design, and experimental outcomes, and offers guidance for replicating the results with the associated codebase.
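As a rough sketch of the reward design such DeepSeek-R1-style pipelines describe, the rule-based rewards below check output format and answer correctness, and the group-normalized advantage captures GRPO's central idea; the tags, weights, and extraction rules are assumptions, not the article's exact code:

```python
import re

def format_reward(completion: str) -> float:
    """Reward completions that wrap reasoning and answer in the expected tags."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.search(pattern, completion, re.DOTALL) else 0.0

def accuracy_reward(completion: str, gold: str) -> float:
    """Reward exact match between the extracted answer and the reference."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if match and match.group(1).strip() == gold.strip() else 0.0

def total_reward(completion: str, gold: str) -> float:
    # Illustrative weighting: correctness dominates, format is a bonus.
    return accuracy_reward(completion, gold) + 0.5 * format_reward(completion)

def group_advantages(rewards):
    """GRPO's core trick: advantages are rewards standardized within the
    group of samples drawn for the same prompt (no value network needed)."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]
```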
The article introduces PageIndex, a reasoning-based retrieval framework designed to improve long-document processing by overcoming the limitations of traditional vector-based Retrieval-Augmented Generation (RAG). Rather than relying on static semantic similarity, PageIndex uses a dynamic, iterative reasoning process to navigate a document's structure and extract relevant information. The aim is to improve the accuracy and relevance of responses generated by large language models in complex, long-document contexts.
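Here is a minimal sketch of that navigation loop, assuming a table-of-contents tree and a choose callable standing in for an LLM relevance decision; both the node structure and the interface are hypothetical, not PageIndex's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    summary: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)

def navigate(node: Node, query: str, choose) -> str:
    """Descend the document tree, letting the model pick the most relevant
    child at each level instead of ranking chunks by embedding distance."""
    while node.children:
        options = [
            f"{i}: {child.title} - {child.summary}"
            for i, child in enumerate(node.children)
        ]
        idx = choose(query, options)  # e.g., an LLM call returning an index
        node = node.children[idx]
    return node.text
```

The contrast with vector RAG is that relevance is decided by the model reading section titles and summaries at each level, which is what makes the retrieval iterative and reasoning-driven rather than a single similarity lookup.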