1 link tagged with all of: dataset + reinforcement-learning + task-synthesis
Click any tag below to further narrow down your results
Links
The article presents Golden Goose, a method to create unlimited Reinforcement Learning with Verifiable Rewards (RLVR) tasks by using unverifiable internet text. It describes how the authors developed a large-scale dataset, GooseReason-0.7M, which includes over 700,000 tasks across various domains. The approach successfully enhances model performance, even in areas like cybersecurity where prior data was unavailable.