The study presents Intuitor, a method utilizing Reinforcement Learning from Internal Feedback (RLIF) that allows large language models (LLMs) to learn using self-certainty as the sole reward signal, eliminating the need for external rewards or labeled data. Experiments show that Intuitor matches the performance of existing methods while achieving better generalization in tasks like code generation, indicating that intrinsic signals can effectively facilitate learning in autonomous AI systems.
reinforcement-learning ✓
intrinsic-feedback ✓
+ self-certainty
language-models ✓
unsupervised-learning ✓