This folder contains code to evaluate LLM360 models on the BOLD dataset, which measures social biases in language models across five domains: profession, gender, race, religion, and political ideology. The evaluation runs sentiment analysis over model generations for the BOLD prompts. Amber and Crystal models are currently supported.
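BOLD-style evaluation typically works by generating a continuation for each BOLD prompt and scoring the continuation with the VADER sentiment analyzer, so that sentiment can be compared across demographic groups. A minimal illustration of the scoring step, assuming the `vaderSentiment` package (this repo's actual dependency and scoring code may differ):

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Illustration only; the repo's actual scoring code may differ.
analyzer = SentimentIntensityAnalyzer()
text = "As a nurse, she was admired for her dedication and kindness."
scores = analyzer.polarity_scores(text)
# polarity_scores returns 'neg', 'neu', 'pos', and a normalized 'compound'
# score in [-1, 1]; the compound score is what is usually aggregated per group.
print(scores)
```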
`single_ckpt_bold_eval.py` is the main entrypoint for running the BOLD evaluation on a single model. It uses the Python modules in the `utils/` folder.
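Conceptually, the entrypoint iterates over the BOLD prompts, generates a continuation for each, and writes the generations to disk before scoring them. A rough sketch of that loop, assuming a model and tokenizer already loaded with Hugging Face transformers (see the loader sketch below) and a JSONL prompt file with a `prompt` field; neither assumption is guaranteed to match the script's actual format or API:

```python
import json

# Sketch of the generation loop; the JSONL prompt format and field names are
# assumptions, not the repo's confirmed layout.
def run_generation(model, tokenizer, prompt_file: str, out_file: str) -> None:
    with open(prompt_file) as fin, open(out_file, "w") as fout:
        for line in fin:
            record = json.loads(line)
            inputs = tokenizer(record["prompt"], return_tensors="pt").to(model.device)
            output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=False)
            # Keep only the newly generated tokens, not the echoed prompt.
            new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
            record["response"] = tokenizer.decode(new_tokens, skip_special_tokens=True)
            fout.write(json.dumps(record) + "\n")
```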
The `utils/` folder contains helper functions for model/dataset IO:
- `data_utils.py`: prompt dataset utilities
- `model_utils.py`: model loader
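For reference, LLM360 checkpoints load through the standard Hugging Face interface. A minimal sketch of what a loader like `model_utils.py` might wrap; the default arguments and the use of Hub revisions for intermediate checkpoints are assumptions here, not the module's confirmed API:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_model(model_name: str = "LLM360/Amber", revision: str = "main"):
    """Load an LLM360 model and tokenizer from the Hugging Face Hub.

    Sketch only: model_utils.py may use different defaults or arguments.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        revision=revision,
        torch_dtype=torch.float16,  # half precision fits a 7B model on one A100 80G
        device_map="auto",
    )
    return model, tokenizer
```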
The BOLD prompts are stored in `./data/prompts/`. By default, the model generations are saved to `./{prompt_file_name}_with_responses.jsonl`, and the evaluation results are saved to `./{model_name}_results.jsonl`.
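Both outputs are plain JSONL, so they can be inspected with standard tooling. For example (the file name below assumes `model_name` is `Amber`, and the record schema depends on what the script actually writes):

```python
import json

# Read the evaluation results back; the exact fields in each record depend on
# the schema written by single_ckpt_bold_eval.py.
with open("./Amber_results.jsonl") as f:
    results = [json.loads(line) for line in f]
print(f"{len(results)} result records loaded")
```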
- Clone and enter the folder:

```bash
git clone https://github.com/LLM360/Analysis360.git
cd Analysis360/analysis/safety360/bold
```
- Install dependencies:

```bash
pip install -r requirements.txt
```
An example walkthrough is provided in `demo.ipynb`, which can be executed on a single A100 80G GPU.