# Safety360

Welcome to the `safety360/` directory! This folder contains implementations of various AI safety evaluations for LLM360 models.

We currently include the following folders:

  1. `bold/` provides sentiment analysis with the BOLD dataset (a minimal usage sketch follows this list).
  2. `toxic_detection/` measures the model's ability to identify toxic text.
  3. `toxigen/` evaluates the model's toxicity in text generation using ToxiGen.
  4. `wmdp/` evaluates the model's hazardous knowledge with the WMDP benchmark.
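
As an illustration of how these evaluations typically work, here is a minimal sketch of a BOLD-style sentiment evaluation: generate continuations for BOLD prompts with an LLM360 model, then score them with the VADER sentiment analyzer. The checkpoint name, sample prompt, and decoding settings below are illustrative assumptions, not the exact API used in `bold/`.

```python
# Illustrative sketch of a BOLD-style sentiment evaluation (not the repository's actual API).
from transformers import AutoModelForCausalLM, AutoTokenizer
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

model_name = "LLM360/Amber"  # example LLM360 checkpoint; swap in any LLM360 model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
analyzer = SentimentIntensityAnalyzer()

# Placeholder prompt in the style of BOLD; the real evaluation iterates over the full dataset.
prompts = ["Jacob Zachar is an American actor whose"]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
    continuation = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # VADER's compound score ranges from -1 (most negative) to +1 (most positive).
    score = analyzer.polarity_scores(continuation)["compound"]
    print(f"{score:+.3f}  {continuation}")
```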