3 links tagged with all of: language-models + optimization
Links
DuPO introduces a dual learning-based preference optimization framework that generates annotation-free feedback, addressing limitations of existing methods such as RLVR and traditional dual learning. It decomposes a task's input into known and unknown components, then treats reconstructing the unknown part from the primal task's output as a dual task whose reconstruction quality serves as the reward signal, yielding significant improvements in translation quality and mathematical reasoning accuracy. This positions DuPO as a scalable, general approach for optimizing large language models (LLMs) without costly labels.
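To make the mechanism concrete, here is a minimal sketch of the dual-reconstruction reward idea under simplifying assumptions; `primal_model`, `dual_model`, and `similarity` are hypothetical stand-ins, not the authors' implementation:

```python
# Sketch of DuPO-style annotation-free feedback: split the input into
# known/unknown parts, run the primal task, then score how well a dual task
# recovers the unknown part. That reconstruction score is the reward -- no
# gold label for the primal output is needed.

def dupo_reward(x_known, x_unknown, primal_model, dual_model, similarity):
    """Annotation-free reward: quality of reconstructing x_unknown."""
    y = primal_model(x_known, x_unknown)   # primal task output
    x_hat = dual_model(y, x_known)         # dual task: recover the unknown part
    return similarity(x_hat, x_unknown)    # reconstruction quality = reward

# Toy illustration with strings: the "primal task" reverses the concatenated
# input, and the dual task inverts it to recover the unknown suffix.
if __name__ == "__main__":
    primal = lambda known, unknown: (known + unknown)[::-1]
    dual = lambda y, known: y[::-1][len(known):]
    sim = lambda a, b: sum(c1 == c2 for c1, c2 in zip(a, b)) / max(len(a), len(b), 1)
    print(dupo_reward("Hello, ", "world", primal, dual, sim))  # 1.0: perfect reconstruction
```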
TreeRL is a reinforcement learning framework that integrates on-policy tree search into language model training. By deriving intermediate supervision from the search tree and improving search efficiency, it addresses problems common in traditional reinforcement learning setups, such as distribution mismatch and reward hacking. Experiments show TreeRL outperforming existing methods on math and code reasoning tasks, demonstrating the value of on-policy tree search in this domain.
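The sketch below illustrates the general shape of on-policy tree search with backed-up intermediate values, under simplified assumptions; `sample_step` and `final_reward` are hypothetical stand-ins for the policy and a verifiable reward, and this is not the paper's actual algorithm:

```python
# On-policy tree search sketch: branch a rollout at intermediate steps, score
# complete leaves with a verifiable reward, and back up mean leaf rewards so
# every intermediate node gets a value -- the intermediate supervision signal.
from dataclasses import dataclass, field

@dataclass
class Node:
    state: str                      # partial reasoning trace so far
    children: list = field(default_factory=list)
    value: float = 0.0              # backed-up mean of descendant leaf rewards

def expand(node, sample_step, final_reward, depth, branch=2):
    if depth == 0:                  # leaf: score the complete trace
        node.value = final_reward(node.state)
        return node.value
    vals = []
    for _ in range(branch):         # branch the on-policy rollout at this step
        child = Node(node.state + sample_step(node.state))
        node.children.append(child)
        vals.append(expand(child, sample_step, final_reward, depth - 1, branch))
    node.value = sum(vals) / len(vals)   # intermediate value = subtree mean
    return node.value

if __name__ == "__main__":
    import random
    random.seed(0)
    root = Node(state="Q: 2+3=? ")
    # hypothetical policy and verifier: reward traces that end in "5"
    step = lambda s: random.choice(["think ", "5", "4"])
    reward = lambda s: 1.0 if s.endswith("5") else 0.0
    expand(root, step, reward, depth=3)
    print(f"root value (expected success rate): {root.value:.2f}")
```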
The article explores the economics of language model inference, highlighting the costs of deploying these models in real-world applications. It discusses the factors that drive pricing and efficiency, and their impact on businesses that rely on language models across various sectors. The analysis aims to help readers optimize their use of language models while balancing performance against cost.
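As a back-of-the-envelope illustration of the kind of cost accounting involved, here is a minimal sketch with made-up per-token prices (the article's actual figures are not reproduced here); note that input and output tokens are typically priced differently:

```python
# Hypothetical cost model for LM inference; all prices are illustrative.

def monthly_inference_cost(requests_per_day, in_tokens, out_tokens,
                           price_in_per_m=1.0, price_out_per_m=3.0):
    """USD per month, given per-million-token prices (hypothetical defaults)."""
    daily = requests_per_day * (in_tokens * price_in_per_m +
                                out_tokens * price_out_per_m) / 1e6
    return 30 * daily

# e.g. 10k requests/day, 500 input + 200 output tokens each:
print(f"${monthly_inference_cost(10_000, 500, 200):,.2f}/month")  # $330.00/month
```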