
In-Context Impersonation Reveals Large Language Models' Strengths and Biases


Description

This repository is the official implementation of the NeurIPS 2023 spotlight paper In-Context Impersonation Reveals Large Language Models' Strengths and Biases by Leonard Salewski1,2, Stephan Alaniz1,2, Isabel Rio-Torto3,4*, Eric Schulz2,5 and Zeynep Akata1,2. A preprint is available on arXiv, and a poster is available on the NeurIPS website and on the project website.

1 University of Tübingen, 2 Tübingen AI Center, 3 University of Porto, 4 INESC TEC, 5 Max Planck Institute for Biological Cybernetics. *Work done while at the University of Tübingen.

📌 Abstract

A schematic overview of the three tasks evaluated in our paper. For each task (multi-armed bandit, reasoning, and vision and language) we show a complete example prompt fed to the large language model, as well as example outputs and how they are evaluated.

In everyday conversations, humans can take on different roles and adapt their vocabulary to their chosen roles. We explore whether LLMs can take on, that is impersonate, different roles when they generate text in-context. We ask LLMs to assume different personas before solving vision and language tasks. We do this by prefixing the prompt with a persona that is associated either with a social identity or domain expertise. In a multi-armed bandit task, we find that LLMs pretending to be children of different ages recover human-like developmental stages of exploration. In a language-based reasoning task, we find that LLMs impersonating domain experts perform better than LLMs impersonating non-domain experts. Finally, we test whether LLMs' impersonations are complementary to visual information when describing different categories. We find that impersonation can improve performance: an LLM prompted to be a bird expert describes birds better than one prompted to be a car expert. However, impersonation can also uncover LLMs' biases: an LLM prompted to be a man describes cars better than one prompted to be a woman. These findings demonstrate that LLMs are capable of taking on diverse roles and that this in-context impersonation can be used to uncover their hidden strengths and biases.

🚀 Installation

Conda

We use conda exclusively to manage all dependencies.

# clone project
git clone https://github.com/ExplainableML/in-context-impersonation
cd in-context-impersonation

# create conda environment and install dependencies
conda env create -f environment.yaml -n in_context_impersonation

# activate conda environment
conda activate in_context_impersonation

# download models for spacy
python3 -m spacy download en_core_web_sm
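
As an optional sanity check (a minimal sketch, not part of the repository), you can verify that the spaCy model loads from the activated environment:

# check that the downloaded spaCy model can be loaded
python3 -c "import spacy; nlp = spacy.load('en_core_web_sm'); print(nlp('sanity check'))"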

⚡ How to run

In the paper we evaluate impersonation in three different settings. Before running them, the language models have to be set up and valid paths need to be configured.

Configuration

All experiments are configured with hydra. The main config file is configs/eval.yaml. All paths (e.g. for data, model weights, logging, and caching) can be configured in configs/paths/default.yaml.
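
Since hydra composes the configuration at runtime, individual values can also be overridden on the command line instead of editing the YAML files. The key names below (paths.data_dir, paths.log_dir) are only illustrative, assuming the usual Lightning-Hydra template layout; check configs/paths/default.yaml for the actual keys:

# hypothetical example: override path settings on the command line
python src/eval.py paths.data_dir=/path/to/data paths.log_dir=/path/to/logs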

Language Model Setup

Use the instructions below to set up the language models. By default, the experiments run with Vicuna; this can be changed by passing model.llm=chat_gpt to any of the commands below.

Vicuna

For Vicuna, please follow the instructions here to obtain HuggingFace-compatible weights. Afterwards, configure the path to the Vicuna weights in configs/model/llm/vicuna13b.yaml by adjusting the value of the model_path key.
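
As a sketch (assuming model_path is a top-level key in that file, and with a placeholder path for wherever the converted weights are stored), the value could be set like this:

# hypothetical example: point model_path at the converted Vicuna weights
sed -i 's|^model_path:.*|model_path: /path/to/vicuna-13b-hf|' configs/model/llm/vicuna13b.yaml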

ChatGPT

For ChatGPT, please obtain an OpenAI API key, create a .env file in the project root, and insert the key in the following format:

OPENAI_API_KEY="some_key"

Please note that calls made to the OpenAI API will incur costs billed to your account.

Experiments

The following commands show how to run the experiments for the three tasks studied in our paper. Note that in the code we sometimes use the term character interchangeably with persona.

Bandit Task

The following command can be used to run the bandit task:

python src/eval.py model=bandit_otf data=bandit

It uses configs/model/bandit_otf.yaml and configs/data/bandit.yaml for further configuration.
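
As mentioned above, the backing LLM can be swapped via the model.llm override, for example to run the bandit task with ChatGPT instead of the default Vicuna model:

# run the bandit task with ChatGPT instead of Vicuna
python src/eval.py model=bandit_otf data=bandit model.llm=chat_gpt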

Reasoning Task

The following command can be used to run one task of the MMLU reasoning experiment:

python src/eval.py model=text_otf data=mmlu data.dataset_partial.task=abstract_algebra

It uses configs/model/text_otf.yaml and configs/data/mmlu.yaml for further configuration.

For other MMLU tasks, simply replace abstract_algebra with the desired task name. Task names can be found here.
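
For example, running a different MMLU task only changes that one override (high_school_biology is one of the standard MMLU task names, used here purely as an illustration):

# the same reasoning experiment on a different MMLU task
python src/eval.py model=text_otf data=mmlu data.dataset_partial.task=high_school_biology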

Vision and Language Task

The following command can be used to run one task for the CUB dataset:

python src/eval.py model=clip_dotf data=cub

The following command can be used to run one task for the Stanford Cars dataset:

python src/eval.py model=clip_dotf data=stanford_cars

Further configuration (e.g. the list of personas) can be adjusted in configs/model/clip_dotf.yaml. The datasets can be configured in configs/data/cub.yaml and configs/data/stanford_cars.yaml, respectively.
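
To evaluate both datasets in one go, hydra's multirun mode can sweep over them. This is a sketch that assumes the standard hydra -m flag works with this repository's launcher:

# hypothetical sweep over both vision-and-language datasets with hydra multirun
python src/eval.py -m model=clip_dotf data=cub,stanford_cars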

📖 Citation

Please use the following BibTeX entry to cite our work:

@article{Salewski2023InContextIR,
  title   = {In-Context Impersonation Reveals Large Language Models' Strengths and Biases},
  author  = {Leonard Salewski and Stephan Alaniz and Isabel Rio-Torto and Eric Schulz and Zeynep Akata},
  journal = {ArXiv},
  year    = {2023},
  volume  = {abs/2305.14930},
}

You can also find our work on Google Scholar and Semantic Scholar.

Funding and Acknowledgments

The authors thank IMPRS-IS for supporting Leonard Salewski. This work was partially funded by the Portuguese Foundation for Science and Technology (FCT) under PhD grant 2020.07034.BD, the Max Planck Society, the Volkswagen Foundation, the BMBF Tübingen AI Center (FKZ: 01IS18039A), DFG (EXC number 2064/1 – Project number 390727645) and ERC (853489-DEXIM).

This repository is based on the Lightning-Hydra template.

Intended Use

The research software in this repository is designed for analyzing the impersonation capabilities of large language models, aiding in understanding their functionality and performance. It is meant for reproducing, understanding, or building on the insights of the associated paper. The software is not intended for production use, and its limitations should be carefully evaluated before applying it in such settings.

License

This repository is licensed under the MIT License.