Welcome to the safety360/
directory! This folder contains various AI safety implementations for LLM360 models.
We currently include the following folders:
bold/
provides sentiment analysis with BOLD dataset.toxic_detection/
measures model's capability to identify toxic text.toxigen/
evaluate model's toxicity on text generation.wmdp/
evaluate model's hazardous knowledge.