Click any tag below to further narrow down your results
Links
This article discusses BaNEL, a new algorithm that improves generative models by training them using only negative reward samples. It addresses the challenges of reward sparsity and costly evaluations in complex problem-solving scenarios, demonstrating its effectiveness through various experiments.