1 link tagged with all of: auditing + tools + safety-research + machine-learning
Click any tag below to further narrow down your results
Links
Petri is a tool designed for alignment auditing that facilitates rapid hypothesis testing by autonomously creating environments and conducting multi-turn audits using human-like messages. It allows researchers to evaluate models quickly and efficiently, surfacing concerning behaviors while emphasizing responsible usage to avoid harmful content generation. The tool supports local development and customization through API keys and offers command-line interface options for various model roles.