Links
This article explores the difficulties developers face in maintaining consistent personalities for large language models (LLMs). It highlights instances where chatbots have deviated from their intended roles and describes ongoing research aimed at improving their behavior and reliability.
HateBenchSet is a dataset designed to benchmark hate speech detectors on content generated by various large language models (LLMs). It comprises 7,838 samples spanning 34 identity groups, of which 3,641 are labeled hate and 4,197 non-hate; the annotation was performed by the authors themselves to avoid exposing human subjects to harmful content. The dataset is intended to support research into LLM-driven hate campaigns and also includes predictions from several hate speech detectors.