Quit Emailing Yourself

# reinforcement-learning → generative-models

3 links tagged with all of: reinforcement-learning + generative-models

Links

Black-Box On-Policy Distillation of Large Language Models

This article introduces Generative Adversarial Distillation (GAD), a method for training student models using only teacher-generated text. Unlike traditional knowledge distillation, GAD sets up a two-player game between a generator and a discriminator, so the student can learn without access to the teacher's output probabilities. The reported results show that models trained with GAD achieve performance comparable to their larger teacher models.

Saved by tldr-importer · Last saved February 14, 2026 · 2 min read

Tags: knowledge-distillation, generative-models, reinforcement-learning, llms, gpt-5

Defining Reinforcement Learning Down

The article explains reinforcement learning through a psychological lens, focusing on feedback mechanisms in both humans and computers. It outlines how computer programs learn by receiving scores and updating their responses, and it emphasizes a specific approach called Reformist RL, which simplifies implementation for generative models.

Saved by tldr-importer · Last saved February 14, 2026 · 3 min read

Tags: reinforcement-learning, generative-models, optimization, machine-learning, feedback

Inference-Time Scaling for Generalist Reward Modeling

The paper explores how to improve reward modeling in reinforcement learning for large language models, focusing on inference-time scalability. It introduces Self-Principled Critique Tuning (SPCT) to improve generative reward modeling and proposes a meta reward model to optimize performance during inference. Empirical results show that SPCT significantly improves the quality and scalability of reward models compared to existing methods.

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

Tags: reinforcement-learning, reward-modeling, large-language-models, inference-scaling, generative-models