3 links tagged with all of: reasoning + large-language-models
Links
Continued scaling of large language models (LLMs) may not yield the diminishing returns previously assumed: even small gains in single-step accuracy compound into large gains in long-horizon task execution. The study finds that LLMs fail on longer tasks not because of reasoning limitations but because of execution errors that accumulate over many steps, and that both larger models and explicit thinking markedly improve performance.
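A quick way to see why small accuracy gains matter so much over long horizons: a minimal sketch, assuming each step succeeds independently with per-step accuracy p (an illustrative simplification, not the paper's model), so an n-step task completes without error with probability roughly p**n.

```python
def horizon_success(p: float, n: int) -> float:
    """Probability of completing n independent steps, each succeeding with prob p."""
    return p ** n

for p in (0.99, 0.999):
    for n in (10, 100, 1000):
        print(f"p={p}, n={n}: success ~ {horizon_success(p, n):.3g}")
```

At n=1000, p=0.99 gives a success rate near 4e-5 while p=0.999 gives about 0.37: a sub-percentage-point gain in per-step accuracy turns long tasks from hopeless to feasible, which is the intuition behind the "no diminishing returns" claim.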
TextQuests introduces a benchmark that evaluates large language models (LLMs) on classic text-based video games, testing their ability to sustain long-context reasoning and to learn through exploration. Agents are scored on game progress and ethical behavior across a range of interactive fiction titles, and the evaluation surfaces failure modes such as hallucinated game state and inefficient dynamic thinking. The aim is to help researchers better understand LLM capabilities in complex, exploratory environments.
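For intuition, here is a minimal sketch of the kind of agent loop such a benchmark implies, where the full transcript is fed back each turn (so context grows with play). The `llm` callable and the `game.reset`/`game.step` interface are hypothetical stand-ins, not the TextQuests API.

```python
from typing import Callable

def play(llm: Callable[[str], str], game, max_turns: int = 100) -> int:
    """Drive a text game by replaying the whole transcript to the model each turn."""
    history = game.reset()           # opening room description
    score = 0
    for _ in range(max_turns):
        command = llm(history)       # long-context: the model sees the full transcript
        observation, score, done = game.step(command)
        history += f"\n> {command}\n{observation}"
        if done:
            break
    return score
```

The growing `history` string is what makes these games a long-context stress test: late-game decisions depend on clues observed hundreds of turns earlier.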
JudgeLRM introduces a novel approach to using large language models (LLMs) as evaluators for complex reasoning tasks. Trained with reinforcement learning using judge-wise rewards, JudgeLRM models substantially outperform supervised fine-tuning (SFT) baselines and current leading models, particularly on tasks that demand deep reasoning.
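To make "judge-wise rewards" concrete, here is a hypothetical sketch of what such a reward might look like, assuming it combines a structure term (the judge separates reasoning from its verdict) with an outcome term (the verdict matches a ground-truth preference). The tag format and reward weights are assumptions, not the paper's definitions.

```python
def judge_reward(response: str, gold_winner: str) -> float:
    """Score a judge response against the ground-truth preferred answer."""
    reward = 0.0
    # Structure term: reward well-formed output with an explicit verdict.
    if "<verdict>" in response and "</verdict>" in response:
        reward += 0.2
        verdict = response.split("<verdict>")[1].split("</verdict>")[0].strip()
        # Outcome term: reward picking the ground-truth winner.
        if verdict == gold_winner:
            reward += 1.0
    return reward

print(judge_reward("reasoning...<verdict>A</verdict>", "A"))  # 1.2
print(judge_reward("reasoning...<verdict>B</verdict>", "A"))  # 0.2
```

Rewarding the judgment outcome directly, rather than imitating reference judgments token by token as in SFT, is what lets the RL-trained judge learn its own reasoning path to the correct verdict.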