10 links
tagged with all of: machine-learning + reinforcement-learning
Links
The repository serves as a comprehensive resource for the survey paper "The Landscape of Agentic Reinforcement Learning for LLMs: A Survey," detailing various reinforcement learning methods and their applications to large language models (LLMs). It includes tables summarizing methodologies, objectives, and key mechanisms, alongside links to relevant papers and resources in the field of AI.
CrystalFormer is a transformer-based autoregressive model tailored for generating crystalline materials while adhering to space group symmetry, enhancing data and computational efficiency. It allows for conditional generation through a structured framework, which includes reinforcement learning and Markov chain Monte Carlo methods. The model supports various functionalities such as generating specific crystal structures and evaluating their validity and novelty.
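The symmetry-constrained generation idea can be sketched in miniature. The sketch below is a toy (a four-token vocabulary and an invented "no repeated token" rule stand in for CrystalFormer's real tokenizer and space-group constraints): at each autoregressive step, a constraint masks invalid next tokens before sampling, analogous to restricting atoms to positions the chosen space group allows.

```python
import random

VOCAB = ["a", "b", "c", "d"]

def allowed(prefix):
    """Hypothetical symmetry rule: a token may not repeat its predecessor."""
    return [t for t in VOCAB if not prefix or t != prefix[-1]]

def sample_sequence(length, rng):
    """Autoregressive sampling restricted to the masked vocabulary."""
    seq = []
    for _ in range(length):
        seq.append(rng.choice(allowed(seq)))  # uniform over valid tokens
    return seq

rng = random.Random(42)
seq = sample_sequence(8, rng)
```

Masking before sampling guarantees every generated sequence satisfies the constraint by construction, rather than filtering invalid samples after the fact.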
Large language models (LLMs) typically cannot adapt their weights dynamically to new tasks or knowledge. The Self-Adapting LLMs (SEAL) framework addresses this limitation by allowing models to generate their own finetuning data and directives for self-adaptation through a reinforcement learning approach, resulting in persistent weight updates and improved performance in knowledge incorporation and few-shot generalization tasks.
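The outer loop can be sketched with a toy scalar "model" (SEAL itself finetunes a real LLM on its own generated data; the reward and edit format here are assumptions): the model proposes candidate self-edits, each edit is applied as a persistent update, and the edit whose updated model scores best on the task is kept — a crude stand-in for the RL objective that rewards useful self-edits.

```python
import random

def generate_self_edit(rng):
    # Hypothetical self-edit: a proposed change to the model's state.
    return rng.uniform(-1.0, 1.0)

def apply_edit(state, edit):
    return state + edit  # persistent weight update, toy version

def evaluate(state, target):
    return -abs(state - target)  # toy downstream performance

def seal_step(state, target, rng, n_candidates=8):
    """Sample candidate self-edits and keep the best-scoring one."""
    candidates = [generate_self_edit(rng) for _ in range(n_candidates)]
    best = max(candidates, key=lambda e: evaluate(apply_edit(state, e), target))
    return apply_edit(state, best)

rng = random.Random(0)
state = 0.0
for _ in range(20):
    state = seal_step(state, target=3.0, rng=rng)
# state ends up near the target purely through self-proposed edits
```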
TreeRL is a novel reinforcement learning framework that integrates on-policy tree search to enhance the training of language models. By incorporating intermediate supervision and optimizing search efficiency, TreeRL addresses issues common in traditional reinforcement learning methods, such as distribution mismatch and reward hacking. Experimental results show that TreeRL outperforms existing methods in math and code reasoning tasks, showcasing the effectiveness of tree search in this domain.
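The intermediate-supervision idea can be illustrated with a toy binary action tree and a verifier reward at the leaves (not TreeRL's actual implementation): a node's value backs up the mean of its leaf rewards, so every intermediate reasoning step gets a credit signal instead of only the final answer.

```python
def leaf_reward(path):
    # Hypothetical verifier: a trajectory is correct iff its steps sum to 2.
    return 1.0 if sum(path) == 2 else 0.0

def expand(path=(), depth=0, max_depth=3):
    """Build the search tree and back up mean leaf reward as node value."""
    if depth == max_depth:
        return {"path": path, "value": leaf_reward(path), "children": []}
    children = [expand(path + (a,), depth + 1, max_depth) for a in (0, 1)]
    value = sum(c["value"] for c in children) / len(children)
    return {"path": path, "value": value, "children": children}

def step_advantages(node, out=None):
    """Child value minus parent value: a per-step supervision signal."""
    if out is None:
        out = {}
    for c in node["children"]:
        out[c["path"]] = c["value"] - node["value"]
        step_advantages(c, out)
    return out

root = expand()
adv = step_advantages(root)
```

With three of eight leaves correct, the root value is 0.375, and the first step already carries signal: choosing action 1 has positive advantage, action 0 negative.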
Fulcrum Research is developing tools to enhance human oversight in a future where AI agents perform tasks such as software development and research. Their goal is to create infrastructure for safely deploying these agents, focusing on improving machine learning evaluations and environments. They invite collaboration from those working on reinforcement learning and agent deployment.
The article walks through reinforcement learning fine-tuning, explaining how targeted training techniques can improve model performance, and argues that tailored approaches make models more adaptable and efficient across applications. It is aimed at practitioners looking to apply reinforcement learning to real-world tasks.
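The core loop of RL fine-tuning can be sketched with a minimal REINFORCE update on a toy two-response policy (no specific framework is implied; the reward function is a hypothetical preference signal): sample a response, score it, and nudge the policy's logits toward higher-reward outputs.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reward(action):
    return 1.0 if action == 1 else 0.0  # hypothetical preference signal

logits = [0.0, 0.0]
rng = random.Random(0)
lr = 0.5
for _ in range(200):
    probs = softmax(logits)
    a = rng.choices([0, 1], weights=probs)[0]
    r = reward(a)
    # REINFORCE: grad of log pi(a) w.r.t. logits is onehot(a) - probs
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - probs[i]
        logits[i] += lr * r * grad

probs = softmax(logits)  # the preferred response now dominates
```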
WavReward is a novel reward feedback model designed to evaluate spoken dialogue systems by assessing both their intelligence quotient (IQ) and emotional quotient (EQ) through audio language models. It introduces a specialized evaluator using multi-sample feedback and reinforcement learning, along with the ChatReward-30K dataset, significantly outperforming existing evaluation models in accuracy and subjective testing across various spoken dialogue scenarios.
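The multi-sample feedback idea can be sketched in isolation (the aggregation rule is an assumption, and WavReward's actual evaluator operates on audio, which this toy replaces with a seeded stochastic scorer): querying a noisy judge several times and averaging reduces the variance of its score.

```python
import random
import statistics

def noisy_judge(rng, true_quality, noise=0.3):
    """Toy evaluator: the true quality plus bounded sampling noise."""
    return true_quality + rng.uniform(-noise, noise)

def multi_sample_score(rng, true_quality, n_samples=16):
    """Multi-sample feedback: average several independent judgments."""
    samples = [noisy_judge(rng, true_quality) for _ in range(n_samples)]
    return statistics.mean(samples)

rng = random.Random(7)
single = noisy_judge(rng, true_quality=0.8)     # one noisy judgment
averaged = multi_sample_score(rng, true_quality=0.8)  # lower-variance score
```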
The article discusses the concept of spurious rewards in reinforcement learning systems, emphasizing the need to rethink training signals for more effective learning outcomes. It highlights the potential pitfalls of relying on misleading rewards that can skew the training process and suggests strategies for improving reward design.
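The pitfall can be shown with a toy two-arm example (the numbers are hypothetical): a policy that greedily optimizes a proxy signal — say, response length — picks a different arm than one optimizing the true objective, which is the essence of reward hacking.

```python
# Each "arm" is a response style with a proxy reward (what training sees)
# and a true reward (what we actually want). The values are illustrative.
arms = {
    "short_correct": {"proxy": 0.2, "true": 1.0},
    "long_rambling": {"proxy": 0.9, "true": 0.1},
}

best_by_proxy = max(arms, key=lambda a: arms[a]["proxy"])
best_by_true = max(arms, key=lambda a: arms[a]["true"])
hacked = best_by_proxy != best_by_true  # the proxy rewards the wrong arm
```

When `hacked` is true, more optimization pressure on the proxy makes the true objective worse, which is why reward design deserves the scrutiny the article calls for.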
JudgeLRM introduces a novel approach to using Large Language Models (LLMs) as evaluators, particularly in complex reasoning tasks. By employing reinforcement learning with judge-wise rewards, JudgeLRM models significantly outperform traditional Supervised Fine-Tuning methods and current leading models, demonstrating superior performance in tasks that require deep reasoning.
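A judge-wise reward can be sketched as follows (the tag names, weights, and decomposition are assumptions, not JudgeLRM's exact formulation): the judge's output earns a structure reward for producing reasoning plus a parseable verdict, and an accuracy reward when its verdict matches the human preference label.

```python
import re

def judge_reward(judge_output, human_label, w_structure=0.3, w_accuracy=0.7):
    """Toy judge-wise reward: structure term plus verdict-accuracy term."""
    m = re.search(r"<verdict>([AB])</verdict>", judge_output)
    structure = 1.0 if m and "<think>" in judge_output else 0.0
    accuracy = 1.0 if m and m.group(1) == human_label else 0.0
    return w_structure * structure + w_accuracy * accuracy

good = judge_reward("<think>B cites sources.</think><verdict>B</verdict>", "B")
bad = judge_reward("The answer is B", "B")  # unparseable: earns nothing
```

Rewarding the judgment itself, rather than imitating reference verdicts token by token, is what distinguishes this setup from supervised fine-tuning.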
Thyme introduces an approach to complex visual reasoning in which the model autonomously generates and executes image-processing code. Using a two-stage training strategy that combines supervised fine-tuning and reinforcement learning, together with the GRPO-ATS algorithm, it improves performance on high-resolution perception tasks.
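The generate-then-execute pattern can be sketched with a whitelisted operation set (the operations and program format are hypothetical, not Thyme's sandbox): the model emits a small program over allowed image operations, and the runtime executes it to produce an intermediate result that feeds back into reasoning.

```python
def crop(image, x0, y0, x1, y1):
    """Keep rows y0..y1 and columns x0..x1 of a 2-D pixel grid."""
    return [row[x0:x1] for row in image[y0:y1]]

def zoom(image, factor):
    """Nearest-neighbor upscaling: repeat each row and pixel `factor` times."""
    return [[px for px in row for _ in range(factor)]
            for row in image for _ in range(factor)]

OPS = {"crop": crop, "zoom": zoom}  # whitelist of executable operations

def execute(program, image):
    """Run a model-proposed pipeline of (op_name, kwargs) steps."""
    for name, kwargs in program:
        image = OPS[name](image, **kwargs)
    return image

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
out = execute([("crop", {"x0": 1, "y0": 1, "x1": 3, "y1": 3}),
               ("zoom", {"factor": 2})], img)
# out is the bottom-right 2x2 region, upscaled to 4x4
```

Restricting execution to a fixed operation table is one simple way to keep model-generated programs safe to run.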