Large Language Models (LLMs) can significantly enhance data annotation but often produce incorrect labels when they are uncertain. This work proposes a candidate annotation paradigm that encourages LLMs to provide multiple possible labels for an instance, together with a teacher-student framework, CanDist, that distills these candidate annotations into unique labels for downstream tasks. Experiments demonstrate the effectiveness of this method across various text classification tasks.
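As a minimal sketch of what distilling candidate labels could look like, the PyTorch-style loss below restricts a teacher's distribution to the labels the LLM proposed and trains a student against it. The masking scheme, temperature, and function names are illustrative assumptions, not the CanDist authors' implementation.

```python
import torch
import torch.nn.functional as F


def candidate_distillation_loss(student_logits, teacher_logits, candidate_mask, temperature=2.0):
    """Hypothetical candidate-restricted distillation objective.

    student_logits, teacher_logits: (batch, num_classes) tensors.
    candidate_mask: (batch, num_classes), 1 where the LLM proposed the label.
    """
    # Restrict the teacher distribution to the LLM's candidate labels only.
    masked_teacher = teacher_logits.masked_fill(candidate_mask == 0, float("-inf"))
    teacher_probs = F.softmax(masked_teacher / temperature, dim=-1)

    # The student learns to concentrate probability on the teacher's
    # preferred candidate, yielding a single label at inference time.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2
```

This only illustrates the candidate-restricted objective; the actual teacher-student training schedule in CanDist may differ.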
Language models often generate false information, known as hallucinations, due to training methods that reward guessing over acknowledging uncertainty. The article discusses how evaluation procedures can incentivize this behavior and suggests that improving scoring systems to penalize confident errors could help reduce hallucinations in AI systems.
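To make the scoring argument concrete, here is a toy rule (an assumption for illustration, not the article's exact proposal) that rewards correct answers, gives zero credit for abstaining, and penalizes wrong answers so that confident guessing is no longer the best strategy.

```python
def score(prediction: str | None, gold: str, wrong_penalty: float = 2.0) -> float:
    """Toy scoring rule: abstention beats a confident wrong answer."""
    if prediction is None:        # model abstains ("I don't know")
        return 0.0
    if prediction == gold:        # correct answer
        return 1.0
    return -wrong_penalty         # confident error costs more than silence
```

Under this rule, guessing uniformly among four options has an expected score of 0.25 * 1 + 0.75 * (-2.0) = -1.25, which is worse than abstaining, whereas binary right-or-wrong scoring would reward the guess.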