Quit Emailing Yourself

# reinforcement-learning → machine-learning

15 links tagged with all of: reinforcement-learning + machine-learning


Related tags: ai (3), evaluation (3), fine-tuning (2), model-training (2), optimization (2), language-models (2), large-language-models (2), spurious-rewards (1), feedback (1), self-adaptation (1), tree-search (1), agent-deployment (1), tooling (1), reasoning (1), reward-design (1)

Links

Defining Reinforcement Learning Down

The article explains reinforcement learning through a psychological lens, focusing on feedback mechanisms in both humans and computers. It outlines how computer programs learn by receiving scores and updating their responses, and it emphasizes a specific approach called Reformist RL, which simplifies implementation for generative models.

Saved by tldr-importer · Last saved February 14, 2026 · 3 min read

Tags: reinforcement-learning, generative-models, optimization, machine-learning, feedback
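The loop described in the "Defining Reinforcement Learning Down" summary (produce a response, receive a score, update) can be illustrated with a minimal sketch. It is a generic REINFORCE-style update on a softmax policy over a fixed set of candidate responses, not the article's Reformist RL. The scores, learning rate, baseline rate, and the `train` function are assumptions for illustration.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(scores, steps=2000, lr=0.1, seed=0):
    """Learn a preference over candidate responses from scalar scores.

    scores: hypothetical fixed score for each candidate response.
    Returns the final probability of choosing each response.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(scores)
    baseline = 0.0  # running average score, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a response according to the current policy.
        i = rng.choices(range(len(scores)), weights=probs)[0]
        reward = scores[i]
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # REINFORCE: the gradient of log softmax w.r.t. the logits is one_hot(i) - probs.
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)

probs = train([0.1, 0.9, 0.3])
print([round(p, 3) for p in probs])
```

After training, most of the probability mass sits on the highest-scored response (index 1). The baseline does not change the expected update; it only makes individual updates less noisy.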

INTELLECT-3: A 100B+ MoE trained with large-scale RL

INTELLECT-3 is a Mixture-of-Experts model with over 100 billion parameters, trained using a custom reinforcement learning framework. It outperforms larger models across various benchmarks in math, code, and reasoning. The training infrastructure and datasets are open-sourced for public use and research.

Saved by tldr-importer · Last saved February 14, 2026 · 5 min read

Tags: reinforcement-learning, open-source, machine-learning, model-training, ai

WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks

This article introduces WebGym, an extensive open-source environment for training visual web agents using nearly 300,000 tasks from real websites. It details a reinforcement learning approach that improves agent performance, achieving a notable increase in success rates on unseen tasks compared to other models.

Saved by tldr-importer · Last saved February 14, 2026 · 2 min read

Tags: web-agents, reinforcement-learning, machine-learning, tasks, evaluation

WarpGrep: Fast, Parallel Code Retrieval with RL | Morph

This article discusses WarpGrep, a model designed for efficient code search. It highlights how WarpGrep uses reinforcement learning for quick and parallel code retrieval, achieving results comparable to leading models in a fraction of the time.

Saved by tldr-importer · Last saved February 14, 2026 · 1 min read

Tags: code, retrieval, reinforcement-learning, parallel-processing, machine-learning
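The WarpGrep summary mentions parallel code retrieval. The sketch below shows only the "parallel" half of that idea: several search patterns run concurrently over an in-memory set of files and their hits are merged. It involves no reinforcement learning and is not WarpGrep's implementation. The `search` and `parallel_search` functions, the file contents, and the patterns are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def search(files, pattern):
    """Return (path, line_number, line) for every line matching pattern."""
    regex = re.compile(pattern)
    hits = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path, lineno, line.strip()))
    return hits

def parallel_search(files, patterns, max_workers=4):
    """Run one search per pattern concurrently; map each pattern to its hits."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda p: search(files, p), patterns)
        return dict(zip(patterns, results))

# Hypothetical repository contents.
files = {
    "auth.py": "def login(user):\n    return check_password(user)\n",
    "db.py": "def connect():\n    pass\n",
}
print(parallel_search(files, [r"def \w+", r"check_password"]))
```

Threads gain little for pure-Python regex work because of the GIL; in a real agent the concurrent calls would more plausibly be I/O-bound tool invocations (for example, separate `grep` processes), which is where issuing searches in parallel saves wall-clock time.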

Agentic Rubrics as Contextual Verifiers for SWE Agents

This article presents Agentic Rubrics, a method for verifying software engineering agents without executing code. By using a context-grounded checklist created by an expert agent, candidate patches are scored efficiently, providing a more interpretable alternative to traditional verification methods. The results show significant improvements in scoring compared to existing baselines.

Saved by tldr-importer · Last saved February 14, 2026 · 2 min read

Tags: software-engineering, verification, machine-learning, reinforcement-learning, agentic-rubrics
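The Agentic Rubrics summary describes scoring candidate patches against a checklist without executing code. A minimal sketch of the scoring step, assuming each rubric item carries a weight and a yes/no verdict, might look like this. In the paper an agent builds the rubric and judges each item; here the per-item `check` predicates are simple string tests standing in for that judgment. `RubricItem`, `score_patch`, and the example rubric are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RubricItem:
    description: str
    weight: float
    check: Callable[[str], bool]  # hypothetical stand-in for an agent's per-item verdict

def score_patch(patch: str, rubric: list[RubricItem]) -> tuple[float, list[str]]:
    """Return a normalized score in [0, 1] and the descriptions of failed items."""
    total = sum(item.weight for item in rubric)
    earned = 0.0
    failed = []
    for item in rubric:
        if item.check(patch):
            earned += item.weight
        else:
            failed.append(item.description)
    return earned / total, failed

# Hypothetical rubric for a bug fix in parse_date.
rubric = [
    RubricItem("Touches parse_date in utils.py", 2.0,
               lambda p: "utils.py" in p and "parse_date" in p),
    RubricItem("Handles None input", 1.0, lambda p: "is None" in p),
    RubricItem("Adds a regression test", 1.0, lambda p: "def test_" in p),
]
patch = "--- a/utils.py\n+++ b/utils.py\n def parse_date(s):\n+    if s is None:\n+        return None\n"
print(score_patch(patch, rubric))
```

Returning the failed items alongside the score is what makes this kind of verifier interpretable: a reviewer can see which checklist entries a patch missed, not just a number.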

GitHub - xhyumiracle/Awesome-AgenticLLM-RL-Papers

The repository serves as a comprehensive resource for the survey paper "The Landscape of Agentic Reinforcement Learning for LLMs: A Survey," detailing various reinforcement learning methods and their applications to large language models (LLMs). It includes tables summarizing methodologies, objectives, and key mechanisms, alongside links to relevant papers and resources in the field of AI.

Saved by tldr-importer · Last saved October 29, 2025 · 7 min read

Tags: reinforcement-learning, large-language-models, agentic-llm, research-survey, machine-learning

GitHub - deepmodeling/CrystalFormer: Space Group Informed Transformer for Crystalline Materials Generation

CrystalFormer is a transformer-based autoregressive model tailored for generating crystalline materials while adhering to space group symmetry, enhancing data and computational efficiency. It allows for conditional generation through a structured framework, which includes reinforcement learning and Markov chain Monte Carlo methods. The model supports various functionalities such as generating specific crystal structures and evaluating their validity and novelty.

Saved by tldr-importer · Last saved October 29, 2025 · 6 min read

Tags: crystal-structure, machine-learning, generative-modeling, reinforcement-learning, space-group

Self-Adapting Language Models

Large language models (LLMs) typically cannot adapt their weights dynamically to new tasks or knowledge. The Self-Adapting LLMs (SEAL) framework addresses this limitation by allowing models to generate their own finetuning data and directives for self-adaptation through a reinforcement learning approach, resulting in persistent weight updates and improved performance in knowledge incorporation and few-shot generalization tasks.

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

Tags: self-adaptation, machine-learning, language-models, reinforcement-learning, fine-tuning

TreeRL: LLM Reinforcement Learning with On-Policy Tree Search

TreeRL is a novel reinforcement learning framework that integrates on-policy tree search to enhance the training of language models. By incorporating intermediate supervision and optimizing search efficiency, TreeRL addresses issues common in traditional reinforcement learning methods, such as distribution mismatch and reward hacking. Experimental results show that TreeRL outperforms existing methods in math and code reasoning tasks, showcasing the effectiveness of tree search in this domain.

Saved by tldr-importer · Last saved October 29, 2025 · 1 min read

Tags: reinforcement-learning, tree-search, language-models, machine-learning, optimization

Fulcrum Research

Fulcrum Research is developing tools to enhance human oversight in a future where AI agents perform tasks such as software development and research. Their goal is to create infrastructure for safely deploying these agents, focusing on improving machine learning evaluations and environments. They invite collaboration from those working on reinforcement learning and agent deployment.

Saved by tldr-importer · Last saved October 29, 2025 · 1 min read

Tags: ai, machine-learning, tooling, reinforcement-learning, agent-deployment

[no-title]

The article discusses the process of reinforcement learning fine-tuning, detailing how to enhance model performance through specific training techniques. It emphasizes the importance of tailored approaches to improve the adaptability and efficiency of models in various applications. The information is aimed at practitioners looking to leverage reinforcement learning for real-world tasks.

Saved by tldr-importer · Last saved October 29, 2025 · 1 min read

Tags: reinforcement-learning, fine-tuning, model-training, machine-learning, ai

WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

WavReward is a novel reward feedback model designed to evaluate spoken dialogue systems by assessing both their intelligence quotient (IQ) and emotional quotient (EQ) through audio language models. It introduces a specialized evaluator using multi-sample feedback and reinforcement learning, along with the ChatReward-30K dataset, significantly outperforming existing evaluation models in accuracy and subjective testing across various spoken dialogue scenarios.

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

Tags: spoken-dialogue, evaluation, audio-models, reinforcement-learning, machine-learning

Notion

The article discusses the concept of spurious rewards in reinforcement learning systems, emphasizing the need to rethink training signals for more effective learning outcomes. It highlights the potential pitfalls of relying on misleading rewards that can skew the training process and suggests strategies for improving reward design.

Saved by tldr-importer · Last saved October 29, 2025 · 1 min read

Tags: reinforcement-learning, training-signals, spurious-rewards, reward-design, machine-learning
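To make the spurious-reward concern from the summary above concrete, the sketch below contrasts a reward that checks only the answer format with one that checks the answer itself. It is an illustration of the general pitfall, not the method from the linked article. The `\boxed{...}` answer format and all function names are assumptions.

```python
import re

def extract_answer(completion):
    """Pull the final answer from a hypothetical '\\boxed{...}' format."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    return match.group(1).strip() if match else None

def format_reward(completion):
    # Rewards only the presence of the expected format: a potentially spurious signal.
    return 1.0 if extract_answer(completion) is not None else 0.0

def correctness_reward(completion, reference):
    # Rewards only a final answer that matches the reference exactly.
    answer = extract_answer(completion)
    return 1.0 if answer == reference else 0.0

wrong = "The answer is \\boxed{41}"
print(format_reward(wrong), correctness_reward(wrong, "42"))  # 1.0 0.0
```

A policy trained on `format_reward` alone can raise its reward without getting any more answers right, which is the kind of misleading training signal the article warns about.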

JudgeLRM: Large Reasoning Models as a Judge

JudgeLRM introduces an approach to using large language models (LLMs) as evaluators, particularly for complex reasoning tasks. Trained with reinforcement learning using judge-wise rewards, JudgeLRM models significantly outperform judges trained with supervised fine-tuning (SFT) as well as current leading models on tasks that require deep reasoning.

Saved by tldr-importer · Last saved October 29, 2025 · 1 min read

Tags: large-language-models, reasoning, reinforcement-learning, evaluation, machine-learning

Thyme: Think Beyond Images

Thyme introduces an approach to image processing in which the model autonomously generates and executes code to solve complex visual reasoning tasks. It uses a two-stage training strategy that combines supervised fine-tuning with reinforcement learning based on the GRPO-ATS algorithm, improving performance in high-resolution perception.

Saved by tldr-importer · Last saved October 29, 2025 · 1 min read

Tags: thyme, image-processing, visual-reasoning, machine-learning, reinforcement-learning
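The Thyme summary names GRPO-ATS, a variant of GRPO (Group Relative Policy Optimization). The sketch below shows only the group-relative advantage that standard GRPO is built on: several responses are sampled for one prompt, and each reward is normalized against the group's mean and standard deviation. The ATS-specific modifications are not shown. The function name and the `eps` value are assumptions, and implementations differ on whether they use the population or sample standard deviation.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward within its sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled responses to the same prompt, scored 1 (correct) or 0 (incorrect).
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

With binary rewards, correct samples receive a positive advantage and incorrect ones a negative advantage. When every sample in a group receives the same reward, all advantages are zero and that group contributes no gradient, so GRPO needs no separate value network to supply a baseline.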