Unlearning

This folder contains implementations of machine unlearning methods for LLM360 models. Machine unlearning is a pre-deployment safety measure designed to remove hazardous knowledge from language models. The goal is a model that is safer by construction, because it lacks the knowledge that could be misused.

Overview

Here's a list of unlearning methods we have implemented so far.

Directory Structure

unlearn.py is the main entry point for running unlearning methods. It uses the Python modules in the methods/ and utils/ folders.

The methods/ folder contains the implementations for unlearning methods:

  • training.py: All training loop implementations
  • utils.py: Loss functions and other method-related utils

The utils/ folder contains helper functions for model/dataset IO:

  • data_utils.py: Dataloader for text datasets
  • model_utils.py: Model IO utils

By default, unlearned models are saved to the models/ folder. Please store all training datasets in the data/ folder.
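As a quick sanity check of the layout described above, the following minimal sketch reports which of the files and folders named in this section are absent. It is not part of the repository: the helper name missing_paths is hypothetical, and it assumes you run it from Analysis360/analysis/unlearning. The models/ folder may simply not exist yet if no model has been saved.

```python
from pathlib import Path

# Paths named in this README, relative to Analysis360/analysis/unlearning.
EXPECTED_FILES = [
    "unlearn.py",
    "methods/training.py",
    "methods/utils.py",
    "utils/data_utils.py",
    "utils/model_utils.py",
]
EXPECTED_DIRS = ["data", "models"]


def missing_paths(root="."):
    """Return the expected files and folders that are absent under root.

    Hypothetical helper for illustration only; not provided by the repo.
    """
    root = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    return missing


if __name__ == "__main__":
    for path in missing_paths():
        print(f"missing: {path}")
```

An empty result means every path mentioned in this section was found.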

Note

This project uses the bio-forget-corpus from the WMDP Benchmark for unlearning training. Access to this dataset requires a separate request. Please follow the instructions provided here to obtain the necessary permissions. By default, the dataloader is configured to load the dataset from data/bio_forget.jsonl.
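Once access has been granted and the file is in place, it can be useful to confirm that it parses before starting a run. The sketch below is a generic JSON Lines reader, not the repository's dataloader in utils/data_utils.py. The function name read_jsonl is hypothetical, and the sketch makes no assumption about which fields each record contains; it only assumes one JSON value per line.

```python
import json
from pathlib import Path


def read_jsonl(path="data/bio_forget.jsonl"):
    """Yield one parsed JSON value per non-empty line of a JSON Lines file.

    Hypothetical helper for illustration; the record schema is not assumed.
    """
    with Path(path).open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from err


if __name__ == "__main__":
    default_path = Path("data/bio_forget.jsonl")
    if default_path.exists():
        records = list(read_jsonl(default_path))
        print(f"loaded {len(records)} records from {default_path}")
    else:
        print(f"{default_path} not found; request access and place it in data/")
```

Reading lazily with a generator keeps memory use low for large corpora; wrap the call in list(...) only when the whole file is needed at once.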

Installation

  1. Clone and enter the repo:
    git clone https://github.com/LLM360/Analysis360.git
    cd Analysis360/analysis/unlearning
  2. Install dependencies:
    pip install -r requirements.txt
  3. To install lm-eval, please check the installation instructions in the metrics/harness folder.

Quick Start

Training and Evaluation

An example is provided in demo.ipynb, which can be run on a single A100 80GB GPU.