Deep Think with Confidence (DeepConf) is a parallel thinking method that improves the reasoning performance and efficiency of large language models (LLMs) by using model-internal confidence signals to filter out low-quality reasoning traces. It can be integrated into existing serving frameworks without additional training or hyperparameter tuning, reaching up to 99.9% accuracy on AIME 2025 while substantially reducing the number of generated tokens. A real-time demo is available, running the Qwen3-8B model with parallel thinking on the HMMT'25 dataset.
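To make the core idea concrete, here is a minimal sketch of confidence-filtered voting under stated assumptions: each sampled trace gets a confidence score from its token log-probabilities, only the most confident traces are kept, and the final answer comes from a majority vote over the survivors. The windowed log-prob proxy, function names, and keep ratio below are illustrative assumptions, not DeepConf's exact formulation.

```python
from collections import Counter
import numpy as np

def trace_confidence(token_logprobs: list[float], window: int = 128) -> float:
    """Confidence proxy for one reasoning trace: the lowest mean token
    log-probability over any sliding window, so a single weak segment
    drags the whole trace down. (Illustrative stand-in, not the paper's
    exact group-confidence formula.)"""
    lp = np.asarray(token_logprobs, dtype=float)
    if len(lp) <= window:
        return float(lp.mean())
    csum = np.concatenate(([0.0], np.cumsum(lp)))
    window_means = (csum[window:] - csum[:-window]) / window
    return float(window_means.min())

def filter_and_vote(traces: list[dict], keep_ratio: float = 0.1) -> str:
    """Keep the top `keep_ratio` most confident traces, then majority-vote
    over their final answers. Each trace is a dict with 'logprobs' and 'answer'."""
    ranked = sorted(traces, key=lambda t: trace_confidence(t["logprobs"]), reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_ratio))]
    votes = Counter(t["answer"] for t in kept)
    return votes.most_common(1)[0][0]
```

Because the filter relies only on quantities the model already produces at decode time, it adds no training cost; the efficiency gain in DeepConf comes from discarding (or stopping) weak traces rather than spending tokens on them.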
Fine-tuning an instruction-tuned Qwen2.5 model for reasoning tasks is demonstrated with a cost-effective pipeline inspired by DeepSeek-R1, combining Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) on AWS SageMaker. The article walks through the training stages, reward function design, and experimental outcomes, and provides guidance for replicating the results with the accompanying codebase.
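As a rough illustration of what "reward function design" means in this setting, the sketch below shows a DeepSeek-R1-style rule-based reward that combines a format bonus with a correctness bonus. The tag names, weights, and matching logic are assumptions for illustration; the article's actual reward functions may differ.

```python
import re

def reasoning_reward(completion: str, reference_answer: str) -> float:
    """Rule-based reward in the spirit of R1-style GRPO training:
    a small bonus for emitting reasoning and answer in the expected tags,
    plus a larger bonus when the extracted answer matches the reference.
    Tag names and weights here are illustrative assumptions."""
    reward = 0.0
    # Format reward: reasoning wrapped in <think>...</think> followed by <answer>...</answer>.
    if re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", completion, re.DOTALL):
        reward += 0.5
    # Accuracy reward: exact match between the extracted answer and the reference.
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0
    return reward
```

In a GRPO setup, a reward like this is evaluated on a group of sampled completions per prompt, and each completion's advantage is computed relative to the group's mean reward, which is what lets the method skip training a separate value model.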