HateBenchSet is a dataset for benchmarking hate speech detectors on content generated by large language models (LLMs). It comprises 7,838 samples covering 34 identity groups, of which 3,641 are labeled as hate and 4,197 as non-hate. The authors annotated the samples themselves to avoid exposing human subjects to harmful content. The dataset is intended to support research into LLM-driven hate campaigns and also includes the predictions of several hate speech detectors on each sample.
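As a rough illustration of how a detector's predictions might be scored against the dataset's labels, the sketch below computes the class balance from the counts stated above and then precision, recall, F1, and accuracy for a binary detector. The `rows` list, the label encoding (1 = hate, 0 = non-hate), and the function name `binary_metrics` are assumptions made for this example; they are placeholders and do not reflect the dataset's actual schema or contents.

```python
from collections import Counter

# Counts stated in the dataset description.
N_HATE = 3641
N_NON_HATE = 4197
N_TOTAL = N_HATE + N_NON_HATE  # 7,838 samples
hate_share = N_HATE / N_TOTAL  # fraction of samples labeled as hate

# Hypothetical (label, prediction) pairs: 1 = hate, 0 = non-hate.
# Placeholder values only, not real HateBenchSet records.
rows = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1), (0, 0)]


def binary_metrics(pairs):
    """Score binary predictions against labels, treating 1 (hate) as positive."""
    counts = Counter(pairs)
    tp = counts[(1, 1)]  # hate correctly flagged
    fn = counts[(1, 0)]  # hate missed by the detector
    fp = counts[(0, 1)]  # non-hate wrongly flagged
    tn = counts[(0, 0)]  # non-hate correctly passed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


metrics = binary_metrics(rows)
print(f"hate share: {hate_share:.3f}")
print(metrics)
```

Because the two classes are fairly close in size, accuracy is not badly skewed here, but reporting precision, recall, and F1 alongside it still shows whether a detector errs toward missed hate or false alarms.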