5 links
tagged with all of: reasoning + deep-learning
Links
Google has launched Gemini Deep Think, a reasoning model designed to enhance its capabilities by exploring multiple ideas in parallel before settling on an answer. The approach aims to improve decision-making and could have a significant impact across a range of AI applications.
Deep Think with Confidence (DeepConf) is a novel parallel thinking method that improves the reasoning performance and efficiency of large language models (LLMs) by using internal confidence signals to filter out low-quality reasoning traces. It can be integrated into existing frameworks without additional training or tuning, achieving up to 99.9% accuracy on the AIME 2025 benchmark while significantly reducing token generation. A real-time demo is available using the Qwen3-8B model with parallel thinking on the HMMT'25 dataset.
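The core mechanism — scoring each parallel trace by the model's own token-level confidence, dropping low-confidence traces, and weighting the final vote by confidence — can be sketched in a few lines. This is a simplified illustration assuming each trace exposes its extracted answer and token log-probabilities; it is not the paper's exact implementation.

```python
import math
from collections import defaultdict

def trace_confidence(token_logprobs):
    """Mean token log-probability of a trace (higher = the model was more confident)."""
    return sum(token_logprobs) / len(token_logprobs)

def deepconf_vote(traces, keep_fraction=0.5):
    """Confidence-filtered, confidence-weighted majority vote over parallel traces.

    Each trace is assumed to be a dict with an extracted "answer" and the
    "token_logprobs" the model assigned while generating it (both field
    names are assumptions for this sketch).
    """
    scored = sorted(traces, key=lambda t: trace_confidence(t["token_logprobs"]), reverse=True)
    kept = scored[: max(1, int(len(scored) * keep_fraction))]  # keep only the most confident traces

    votes = defaultdict(float)
    for t in kept:
        votes[t["answer"]] += math.exp(trace_confidence(t["token_logprobs"]))  # weight vote by confidence
    return max(votes, key=votes.get)
```

The token savings reported come largely from the online variant, where traces whose confidence drops below a threshold can be stopped mid-generation rather than only being filtered out afterwards.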
REverse-Engineered Reasoning (REER) introduces a novel approach to instilling deep reasoning in language models by working backwards from known solutions to discover the underlying reasoning process. This method addresses the limitations of traditional reinforcement learning and instruction distillation, resulting in the creation of a large dataset, DeepWriting-20K, and a model, DeepWriter-8B, that outperforms existing models in open-ended tasks. The research emphasizes the importance of structured reasoning and iterative refinement in generating high-quality outputs.
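A minimal way to picture this backward search is to score candidate reasoning traces by how likely they make the known solution, keeping the trace under which the solution has the lowest negative log-likelihood. The helper `model.logprob_of` below is hypothetical and stands in for whatever API returns a model's log-probability of a continuation; the sketch only captures the selection criterion, not the iterative refinement loop the work describes.

```python
def solution_nll(model, question, reasoning, solution):
    """Negative log-likelihood of the known solution given a candidate reasoning trace.

    `model.logprob_of(context, continuation)` is a hypothetical helper returning
    the summed log-probability the model assigns to `continuation` after `context`.
    """
    context = f"{question}\n\nReasoning:\n{reasoning}\n\nAnswer:\n"
    return -model.logprob_of(context, solution)

def pick_best_trace(model, question, solution, candidate_traces):
    """Keep the candidate reasoning trace that best 'explains' the known solution."""
    return min(candidate_traces, key=lambda r: solution_nll(model, question, r, solution))
```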
The article explores the concept of test-time compute in deep learning, particularly how models can improve their performance by engaging in a more extended reasoning process akin to human thinking. It discusses various strategies for enhancing model output through methods like chain-of-thought reasoning, parallel sampling, and sequential revision, emphasizing the balance between computational resources and accuracy in problem-solving.
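The two budget-spending strategies the article contrasts — many independent chains followed by a vote (parallel sampling, or self-consistency) versus one chain that is repeatedly critiqued and improved (sequential revision) — can be compared with a toy harness. `generate` and `revise` are hypothetical callables standing in for model calls; only the shape of the compute spend is illustrated.

```python
from collections import Counter

def parallel_sampling(generate, prompt, budget):
    """Spend the budget on independent chains of thought, then majority-vote (self-consistency)."""
    answers = [generate(prompt) for _ in range(budget)]
    return Counter(answers).most_common(1)[0][0]

def sequential_revision(generate, revise, prompt, budget):
    """Spend the budget on one chain that is repeatedly critiqued and revised."""
    answer = generate(prompt)
    for _ in range(budget - 1):
        answer = revise(prompt, answer)  # ask the model to improve its previous attempt
    return answer
```

Which split works better depends on the task and on how well the model can judge its own mistakes, which is the compute/accuracy trade-off the article examines.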
The article fine-tunes an instruction-tuned LLM (Qwen2.5) for reasoning tasks using a cost-effective pipeline inspired by DeepSeek R1, combining Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO) on AWS SageMaker. It details the training stages, reward function design, and experimental outcomes, and provides guidance for replicating the results with the associated codebase.
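As a rough sketch of the DeepSeek R1-style reward design such pipelines follow, a rule-based reward can combine a format check (is the reasoning and answer wrapped in the expected tags?) with a correctness check against a reference answer. The tag names and weights below are illustrative assumptions, not the article's exact values.

```python
import re

def reward(completion: str, gold_answer: str) -> float:
    """Rule-based reward in the DeepSeek R1 style: format reward + correctness reward.

    Assumes completions should look like `<think>...</think><answer>...</answer>`;
    the tag names and the 0.5 / 1.0 weights are illustrative, not the article's values.
    """
    total = 0.0
    match = re.search(r"<think>.*?</think>\s*<answer>(.*?)</answer>", completion, re.DOTALL)
    if match:
        total += 0.5  # format reward: the expected structure is present
        if match.group(1).strip() == gold_answer.strip():
            total += 1.0  # correctness reward: extracted answer matches the reference
    return total
```

Under GRPO, several completions are sampled per prompt and each completion's reward is normalized against the group mean, which removes the need for a separate value model.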