Quit Emailing Yourself

1 link tagged with all of: benchmarks + reasoning-tasks + model-performance + power-sampling

Links

Reasoning with Sampling: Your Base Model is Smarter Than You Think

Power sampling from the base model, which leaves the model's weights unchanged, matches or surpasses RL post-training across a range of reasoning tasks, including MATH500, HumanEval, and GPQA Diamond. In-domain, on MATH500, its results are nearly equal to those of GRPO. Out-of-domain, particularly on HumanEval and AlpacaEval 2.0, power sampling outperforms GRPO.

Saved by tldr-importer · Last saved October 29, 2025 · 1 min read

Tags: power-sampling, rl-posttraining, reasoning-tasks, benchmarks, model-performance