2 links tagged with all of: dataset + reinforcement-learning

Links

Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text

The article presents Golden Goose, a method for synthesizing unlimited Reinforcement Learning with Verifiable Rewards (RLVR) tasks from unverifiable internet text. The authors use it to build a large-scale dataset, GooseReason-0.7M, which contains over 700,000 tasks across various domains. The approach improves model performance, even in areas such as cybersecurity where such data was previously unavailable.

Saved by tldr-importer · Last saved February 14, 2026 · 2 min read

Tags: reinforcement-learning, dataset, language-models, task-synthesis, cybersecurity

Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search

Mini-o3 is a system for tool-based visual reasoning that supports deep, multi-turn reasoning and achieves state-of-the-art performance on visual search tasks. It uses an over-turn masking strategy to manage response lengths during reinforcement learning, together with a dataset designed for exploratory reasoning. The code and models are open source to support reproducibility and further research.

Saved by tldr-importer · Last saved October 29, 2025 · 6 min read

Tags: visual-search, multimodal, reinforcement-learning, open-source, dataset