5 links tagged with all of: reasoning + llm
Links
Deep Think with Confidence (DeepConf) is a parallel thinking method that improves the reasoning performance and efficiency of large language models (LLMs) by using the model's internal confidence signals to filter out low-quality reasoning traces. It integrates into existing serving frameworks without additional training or tuning, reaching up to 99.9% accuracy on the AIME 2025 dataset while significantly reducing the number of generated tokens. A real-time demo runs the Qwen3-8B model with parallel thinking on the HMMT'25 dataset.
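To make the filtering step concrete, here is a minimal sketch of offline confidence-filtered majority voting, assuming per-token log-probabilities are available from the serving engine; the sliding-window confidence measure and the function names are illustrative simplifications of the paper's group-confidence statistics, not its actual implementation:

```python
from collections import Counter

def trace_confidence(token_logprobs, window=64):
    """Confidence of one reasoning trace: the lowest mean token
    log-probability over any sliding window (a simplified stand-in
    for the paper's windowed group-confidence signal)."""
    if len(token_logprobs) <= window:
        return sum(token_logprobs) / max(len(token_logprobs), 1)
    return min(
        sum(token_logprobs[i:i + window]) / window
        for i in range(len(token_logprobs) - window + 1)
    )

def deepconf_vote(traces, keep_ratio=0.5):
    """traces: list of (final_answer, token_logprobs) pairs.
    Keep only the most confident traces, then majority-vote on answers."""
    ranked = sorted(traces, key=lambda t: trace_confidence(t[1]), reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_ratio))]
    return Counter(answer for answer, _ in kept).most_common(1)[0][0]
```

Discarding low-confidence traces before voting is also what saves tokens: in the online variant, generation of a trace can stop early once its confidence falls below the filter threshold.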
R-Zero is a self-evolving framework for Large Language Models (LLMs) that generates its own training data autonomously, removing the dependence on human-curated tasks. It pairs two models, a Challenger that poses increasingly difficult tasks and a Solver that attempts them, so the two co-evolve and drive significant improvements in reasoning capability across various benchmarks. Empirical results show notable gains, particularly with the Qwen3-4B-Base model.
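At a high level, each round of the co-evolution loop looks like the sketch below; the model interfaces (generate_task, solve, train) are hypothetical stand-ins, and the real framework's reward shaping for the Challenger is more involved than the simple difficulty band shown here:

```python
from collections import Counter

def most_common_answer(answers):
    """Majority answer and its agreement rate among sampled attempts."""
    answer, hits = Counter(answers).most_common(1)[0]
    return answer, hits / len(answers)

def r_zero_round(challenger, solver, n_tasks=128, n_samples=8):
    """One Challenger/Solver round (hypothetical model interfaces)."""
    # 1. Challenger proposes a fresh batch of tasks.
    tasks = [challenger.generate_task() for _ in range(n_tasks)]

    # 2. Solver attempts each task several times; self-consistency of its
    #    answers doubles as a pseudo-label and a difficulty signal.
    curriculum = []
    for task in tasks:
        answers = [solver.solve(task) for _ in range(n_samples)]
        majority, agreement = most_common_answer(answers)
        # Keep tasks that are neither trivial nor hopeless for the Solver.
        if 0.25 <= agreement <= 0.75:
            curriculum.append((task, majority))

    # 3. Co-evolve: Solver trains on the pseudo-labeled tasks, Challenger
    #    is rewarded for producing tasks in the informative band.
    solver.train(curriculum)
    challenger.train(tasks)
```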
MedReason is a comprehensive medical reasoning dataset that enhances large language models (LLMs) by using a structured medical knowledge graph to derive detailed reasoning paths from clinical question-answer pairs. The dataset contains 32,682 QA pairs with step-by-step explanations, and the MedReason-8B model, fine-tuned on this data, achieves state-of-the-art performance on medical reasoning tasks. The project is open-sourced, with models, data, and deployment code available for further research and applications.
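The path-to-explanation idea behind the dataset can be illustrated with a toy graph; the entities, edges, and helper below are illustrative assumptions, not the project's actual medical knowledge graph or pipeline code:

```python
import networkx as nx

# Toy stand-in for a medical knowledge graph (illustrative only).
kg = nx.Graph()
kg.add_edges_from([
    ("chest pain", "myocardial infarction"),
    ("myocardial infarction", "troponin elevation"),
    ("troponin elevation", "order troponin test"),
])

def reasoning_path(question_entity, answer_entity):
    """Verbalize the shortest KG path between linked entities as a
    step-by-step chain, in the spirit of MedReason's reasoning paths."""
    path = nx.shortest_path(kg, question_entity, answer_entity)
    return [
        f"Step {i}: {a} is linked to {b}."
        for i, (a, b) in enumerate(zip(path, path[1:]), start=1)
    ]

print(reasoning_path("chest pain", "order troponin test"))
```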
The article fine-tunes an instruction-tuned LLM (Qwen2.5B) for reasoning tasks with a cost-effective pipeline inspired by DeepSeek R1, combining Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO) on AWS SageMaker. It details the training stages, reward function design, and experimental outcomes, and offers guidance for replicating the results with the associated codebase.
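As a rough sketch of the reward design such DeepSeek-R1-style pipelines describe, the rule-based rewards below check output format and answer correctness, and the group-normalized advantage captures GRPO's central idea; the tags, weights, and extraction rules are assumptions, not the article's exact code:

```python
import re

def format_reward(completion: str) -> float:
    """Reward completions that wrap reasoning and answer in the expected tags."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.search(pattern, completion, re.DOTALL) else 0.0

def accuracy_reward(completion: str, gold: str) -> float:
    """Reward exact match between the extracted answer and the reference."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if match and match.group(1).strip() == gold.strip() else 0.0

def total_reward(completion: str, gold: str) -> float:
    # Illustrative weighting: correctness dominates, format is a bonus.
    return accuracy_reward(completion, gold) + 0.5 * format_reward(completion)

def group_advantages(rewards):
    """GRPO's core trick: advantages are rewards standardized within the
    group of samples drawn for the same prompt (no value network needed)."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]
```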
The article introduces PageIndex, a reasoning-based retrieval framework designed to improve long-document processing by overcoming the limitations of traditional vector-based Retrieval-Augmented Generation (RAG). Rather than relying on static semantic similarity, PageIndex uses a dynamic, iterative reasoning process to navigate a document's structure and extract relevant information. The aim is to improve the accuracy and relevance of responses generated by large language models in complex, long-document contexts.
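Here is a minimal sketch of that navigation loop, assuming a table-of-contents tree and a choose callable standing in for an LLM relevance decision; both the node structure and the interface are hypothetical, not PageIndex's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    summary: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)

def navigate(node: Node, query: str, choose) -> str:
    """Descend the document tree, letting the model pick the most relevant
    child at each level instead of ranking chunks by embedding distance."""
    while node.children:
        options = [
            f"{i}: {child.title} - {child.summary}"
            for i, child in enumerate(node.children)
        ]
        idx = choose(query, options)  # e.g., an LLM call returning an index
        node = node.children[idx]
    return node.text
```

The contrast with vector RAG is that relevance is decided by the model reading section titles and summaries at each level, which is what makes the retrieval iterative and reasoning-driven rather than a single similarity lookup.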