DuPO introduces a dual learning-based preference optimization framework that generates annotation-free feedback, overcoming limitations of existing methods such as RLVR and traditional dual learning. By decomposing a task's input into known and unknown components and rewarding outputs according to how well the unknown part can be reconstructed from them, DuPO improves performance across diverse tasks, achieving significant gains in translation quality and mathematical reasoning accuracy. This positions the framework as a scalable, general approach for optimizing large language models (LLMs) without costly labels.
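
To make the mechanism concrete, here is a minimal Python sketch of the idea described above: candidate outputs are scored by how well a dual task reconstructs the unknown input component, and those scores are turned into preference pairs. The function names, the `dual_task` callable, and the string-similarity reward are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of DuPO-style annotation-free scoring (assumptions labeled below).
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def reconstruction_reward(
    dual_task: Callable[[str, str], str],  # assumed: maps (output, known part) -> reconstruction
    known: str,
    unknown: str,
    candidate_output: str,
) -> float:
    """Annotation-free reward: similarity between the dual task's
    reconstruction of the unknown input component and the true one.
    String similarity is a stand-in for whatever metric DuPO uses."""
    reconstructed = dual_task(candidate_output, known)
    return SequenceMatcher(None, reconstructed, unknown).ratio()


def build_preference_pairs(
    candidates: List[str],
    dual_task: Callable[[str, str], str],
    known: str,
    unknown: str,
) -> List[Tuple[str, str]]:
    """Rank candidates by reconstruction reward and emit (chosen, rejected)
    pairs that a preference optimizer could consume as training signal."""
    scored = sorted(
        candidates,
        key=lambda c: reconstruction_reward(dual_task, known, unknown, c),
        reverse=True,
    )
    # Simplest pairing scheme (an assumption): best candidate vs. each other.
    return [(scored[0], worse) for worse in scored[1:]]
```

In a translation setting, for example, the dual task would roughly correspond to back-translating each candidate translation and comparing the result against the original source sentence, so that no reference translation is ever needed.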