FEAT Add ability to fetch wmdp-bio, wmdp-chem, and wmdp-cyber datasets #380

Merged · 54 commits · Sep 18, 2024

Commits
d54ab6d
Add skeleton to fetch PKU-SafeRLHF-dataset
mshirsekar1 Sep 16, 2024
2baa50c
added base pull of the data
Sep 16, 2024
2314fcc
pulling just the prompts out
Sep 16, 2024
ec6caff
create PromptDataset object
Sep 16, 2024
0cadc01
PromptDataset object created
mshirsekar1 Sep 16, 2024
94233a8
updating the args
Sep 16, 2024
4c2acdb
removed testing file
Sep 16, 2024
01fd4b6
moved the import to top of file
Sep 16, 2024
7f050eb
Merge branch 'FHL/add-PKU-SafeRLHF-dataset' of https://github.com/msh…
Sep 16, 2024
d643344
adding arg to allow user to get all prompts vs just unsafe ones
Sep 16, 2024
e6f523e
update harm categories, refactor all_items for readability
Sep 16, 2024
211206e
FEAT: dataset setup
Sep 17, 2024
3cdc7b6
FEAT: grab prompts from PKU-SafeRLHF dataset, with optional safe prom…
Sep 17, 2024
533f618
FEAT: remove unused arguments
Sep 17, 2024
3f1a054
FEAT: add skeleton for testing the dataset fetch
Sep 17, 2024
d283125
FEAT: adding in the initialization for the dataset
Sep 17, 2024
3224582
jupyter notebook format of pku_safeRLHF_testing python file
Sep 17, 2024
71ce2f6
copy-paste logic from python version
Sep 17, 2024
e6f24f1
added testing for the safe prompts included
Sep 17, 2024
1216789
Merge pull request #1 from enrajka/featadd-pku-saferlhf-dataset
enrajka Sep 17, 2024
5cae846
update dependencies with datasets library
Sep 17, 2024
a92704f
Merge branch 'main' of https://github.com/enrajka/PyRIT
Sep 17, 2024
12d7721
Update pyrit/datasets/fetch_example_datasets.py
enrajka Sep 17, 2024
b1f4052
Update pyrit/datasets/fetch_example_datasets.py
enrajka Sep 17, 2024
2828c8d
Update pyrit/datasets/fetch_example_datasets.py
enrajka Sep 17, 2024
03a127c
Added research paper and author names
enrajka Sep 17, 2024
4db8598
Update pyrit/datasets/fetch_example_datasets.py
enrajka Sep 17, 2024
4355937
update references to lowercase fn name
Sep 17, 2024
224c6db
update references to lowercase fn name
Sep 17, 2024
a21b83f
does not compile, template code to read in datasets as 3 jsons
Sep 17, 2024
0b8b8a7
Fixed syntax in prompt dataset constructor
Sep 17, 2024
ce2df6e
updates to pass pre-commit
Sep 17, 2024
be80adb
update references for pre-commit checks to pass
Sep 17, 2024
992c5a4
move location of dataset in toml
Sep 17, 2024
e1020bb
fixed typo
Sep 17, 2024
15f372c
fixed typo v2
Sep 17, 2024
665d0fb
read in data into QuestionAnsweringDataset
Sep 17, 2024
02ab3fb
Merge branch 'Azure:main' into feat-add-wmdp-dataset
enrajka Sep 18, 2024
e420df7
refactored and added testing
Sep 18, 2024
23bc221
resolve merge conflict
Sep 18, 2024
9cf852c
Merge remote-tracking branch 'emilierajka/feat-add-wmdp-dataset' into…
Sep 18, 2024
a771ba6
updated testing
Sep 18, 2024
c85b7a9
Merge remote-tracking branch 'emilierajka/feat-add-wmdp-dataset' into…
Sep 18, 2024
90337e2
update var reference
Sep 18, 2024
95f1709
Merge remote-tracking branch 'emilierajka/feat-add-wmdp-dataset' into…
Sep 18, 2024
64dd0c6
updating the testing so it works properly
Sep 18, 2024
56d01a4
Merge remote-tracking branch 'emilierajka/feat-add-wmdp-dataset' into…
Sep 18, 2024
f4454fd
formatter changes
Sep 18, 2024
20bd2e0
Update pyrit/datasets/fetch_example_datasets.py
romanlutz Sep 18, 2024
c7d4d21
deleted original dataset, added logic to subset data by category and …
Sep 18, 2024
2490207
Merge branch 'FHL/add-wmdp-dataset' of https://github.com/mshirsekar1…
Sep 18, 2024
7a85cba
formatting finalized
Sep 18, 2024
0780bfb
updated jupyter notebook output for testing files
Sep 18, 2024
b0bb79f
Apply suggestions from code review
romanlutz Sep 18, 2024
798 changes: 780 additions & 18 deletions doc/code/orchestrators/benchmark_orchestrator.ipynb

Large diffs are not rendered by default.

53 changes: 52 additions & 1 deletion doc/code/orchestrators/benchmark_orchestrator.py
@@ -16,17 +16,28 @@
# ## Benchmark Orchestrator

# %%
# Import necessary packages
from pyrit.orchestrator.question_answer_benchmark_orchestrator import QuestionAnsweringBenchmarkOrchestrator
from pyrit.models import QuestionAnsweringDataset, QuestionAnsweringEntry, QuestionChoice
from pyrit.prompt_target import AzureOpenAIGPT4OChatTarget
from pyrit.score.question_answer_scorer import QuestionAnswerScorer

from pyrit.datasets import fetch_wmdp_dataset
from pyrit.common import default_values

# %%
# Load environment variables
default_values.load_default_env()

# %%
# Set up the Azure OpenAI prompt target
target = AzureOpenAIGPT4OChatTarget()

# %%
# Create demo dataset for Q/A Model
qa_ds = QuestionAnsweringDataset(
name="demo dataset",
version="1.0",
@@ -71,17 +82,22 @@
],
)

# Create the scorer for the Q/A Model
qa_scorer = QuestionAnswerScorer(
dataset=qa_ds,
)

# Create the orchestrator with scorer and demo dataset
benchmark_orchestrator = QuestionAnsweringBenchmarkOrchestrator(
chat_model_under_evaluation=target, scorer=qa_scorer, verbose=True
)

# Evaluate the Q/A Model response
await benchmark_orchestrator.evaluate() # type: ignore

# %%
# Output whether the results are correct
correct_count = 0
total_count = 0

@@ -93,3 +109,38 @@
correct_count += 1 if answer.is_correct else 0

print(f"Correct count: {correct_count}/{len(benchmark_orchestrator._scorer.evaluation_results)}")

# %%
# Fetch WMDP dataset for Q/A Model Testing

wmdp_ds = fetch_wmdp_dataset()
wmdp_ds.questions = wmdp_ds.questions[:3]

# Create the scorer for the Q/A Model
qa_scorer_wmdp = QuestionAnswerScorer(
dataset=wmdp_ds,
)

# Create the orchestrator with scorer and WMDP dataset
benchmark_orchestrator_wmdp = QuestionAnsweringBenchmarkOrchestrator(
chat_model_under_evaluation=target, scorer=qa_scorer_wmdp, verbose=True
)

# Evaluate the Q/A Model response
await benchmark_orchestrator_wmdp.evaluate() # type: ignore

# %%
# Output whether the results are correct
correct_count = 0
total_count = 0

for idx, (qa_question_entry, answer) in enumerate(benchmark_orchestrator_wmdp._scorer.evaluation_results.items()):
print(f"Question {idx+1}: {qa_question_entry.question}")
print(f"Answer: {answer}")
print(f"")

correct_count += 1 if answer.is_correct else 0

print(f"Correct count: {correct_count}/{len(benchmark_orchestrator_wmdp._scorer.evaluation_results)}")
2 changes: 2 additions & 0 deletions pyrit/datasets/__init__.py
@@ -8,6 +8,7 @@
fetch_seclists_bias_testing_examples,
fetch_xstest_examples,
fetch_pku_safe_rlhf_dataset,
fetch_wmdp_dataset,
)

__all__ = [
@@ -17,4 +18,5 @@
"fetch_seclists_bias_testing_examples",
"fetch_xstest_examples",
"fetch_pku_safe_rlhf_dataset",
"fetch_wmdp_dataset",
]
71 changes: 70 additions & 1 deletion pyrit/datasets/fetch_example_datasets.py
@@ -15,7 +15,7 @@
from pyrit.common.json_helper import read_json, write_json
from pyrit.common.text_helper import read_txt, write_txt
from pyrit.common.path import RESULTS_PATH
from pyrit.models import PromptDataset
from pyrit.models import PromptDataset, QuestionAnsweringDataset, QuestionAnsweringEntry, QuestionChoice

from typing import Callable, Dict, List, Optional, Literal, TextIO

@@ -419,3 +419,72 @@ def fetch_pku_safe_rlhf_dataset(include_safe_prompts: bool = True) -> PromptDataset:
)

return dataset


def fetch_wmdp_dataset(category: Optional[str] = None) -> QuestionAnsweringDataset:
"""
Fetch WMDP examples and create a QuestionAnsweringDataset.

Args:
category (str, optional): The dataset category, one of "cyber", "bio", or "chem". If not specified, all three subsets are loaded.

Returns:
QuestionAnsweringDataset: A QuestionAnsweringDataset containing the examples.

Note:
For more information and access to the original dataset and related materials, visit:
https://huggingface.co/datasets/cais/wmdp
"""

# Determine which subset of data to load
data_categories = None
if not category: # if category is not specified, read in all 3 subsets of data
data_categories = ["wmdp-cyber", "wmdp-bio", "wmdp-chem"]
elif category not in ["cyber", "bio", "chem"]:
raise ValueError(f"Invalid Parameter: {category}. Expected 'cyber', 'bio', or 'chem'")
else:
data_categories = ["wmdp-" + category]

# Read in the selected WMDP subset(s)
questions_answers = []
for name in data_categories:
ds = load_dataset("cais/wmdp", name)
for i in range(0, len(ds["test"])):
# For each question, save the 4 possible choices and their respective index
choices = []
for j in range(0, 4):
c = QuestionChoice(index=j, text=ds["test"]["choices"][i][j])
choices.append(c)

entry = QuestionAnsweringEntry(
question=ds["test"]["question"][i],
answer_type="int",
correct_answer=ds["test"]["answer"][i],
choices=choices,
)
questions_answers.append(entry)

dataset = QuestionAnsweringDataset(
name="wmdp",
description="""The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning. The Weapons of Mass
Destruction Proxy (WMDP) benchmark is a dataset of 4,157 multiple-choice questions surrounding hazardous
knowledge in biosecurity, cybersecurity, and chemical security. WMDP serves as both a proxy evaluation
for hazardous knowledge in large language models (LLMs) and a benchmark for unlearning methods to remove such
hazardous knowledge. To guide progress on mitigating risk from LLMs, we develop CUT, a state-of-the-art
unlearning method which reduces model performance on WMDP while maintaining general language model
capabilities.""",
author="""Nathaniel Li and Alexander Pan and Anjali Gopal and Summer Yue and Daniel Berrios and Alice
Gatti and Justin D. Li and Ann-Kathrin Dombrowski and Shashwat Goel and Long Phan and Gabriel Mukobi
and Nathan Helm-Burger and Rassin Lababidi and Lennart Justen and Andrew B. Liu and Michael Chen and
Isabelle Barrass and Oliver Zhang and Xiaoyuan Zhu and Rishub Tamirisa and Bhrugu Bharathi and Adam Khoja
and Zhenqi Zhao and Ariel Herbert-Voss and Cort B. Breuer and Andy Zou and Mantas Mazeika and Zifan Wang
and Palash Oswal and Weiran Liu and Adam A. Hunt and Justin Tienken-Harder and Kevin Y. Shih and Kemper
Talley and John Guan and Russell Kaplan and Ian Steneker and David Campbell and Brad Jokubaitis and
Alex Levinson and Jean Wang and William Qian and Kallol Krishna Karmakar and Steven Basart and Stephen
Fitz and Mindy Levine and Ponnurangam Kumaraguru and Uday Tupakula and Vijay Varadharajan and Yan
Shoshitaishvili and Jimmy Ba and Kevin M. Esvelt and Alexandr Wang and Dan Hendrycks""",
source="https://huggingface.co/datasets/cais/wmdp",
questions=questions_answers,
)

return dataset
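
For orientation, a brief, hedged example (not part of the diff) of how the returned QuestionAnsweringDataset might be inspected; the attribute names follow the entries constructed above and the call signature shown in this function.

# Illustrative usage sketch for fetch_wmdp_dataset, assuming the package layout in this PR.
from pyrit.datasets import fetch_wmdp_dataset

chem_ds = fetch_wmdp_dataset(category="chem")
print(chem_ds.name, len(chem_ds.questions))

first = chem_ds.questions[0]
print(first.question)
for choice in first.choices:
    print(f"  [{choice.index}] {choice.text}")
print("Correct answer index:", first.correct_answer)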

This file was deleted.
